Someone drops a CSV of purchase orders on your desk and needs a spend summary by Monday. The skill here isn't the chart — it's knowing whether to trust it.
You have an export of Q1 purchase orders — one row per line item, with vendor, category, quantity, unit price, and line total. It came out of someone's accounting system and it is, like every real export, a little broken. You want: total spend, spend by category, top vendors, and one or two honest findings — a short report a manager can act on.
This is the daily-grind task Claude is quietly best at: parse a file, clean it, aggregate it, chart it, summarize it. No physics here. The twist is that the analysis has no closed-form answer to check against — so we need a different oracle.
Hand Claude the CSV and ask for a spend report. It parses, groups, totals, charts, and drafts the prose in one pass — the work that used to be an afternoon of pivot-table wrestling. Your job moves up a level: not making the report, but auditing it. A spreadsheet that adds up is not a correct analysis.
Download the dataset, hand it to Claude, and ask for: total spend, spend by category and by vendor, the month-over-month trend, and the three biggest line items. Then — this is the actual skill — demand that it reconcile the numbers before you believe them.
In the physics problems on this site, a wrong solver announces itself: energy isn't conserved, the closed form doesn't match. Data analysis has the same property if you set it up right. The oracle here is reconciliation — numbers that must agree, agreeing. The panel below runs the live dataset. It starts raw and broken; toggle the cleaning steps and watch the checks resolve.
The instructive one is the last check. The largest line in the raw data — ten RF connectors at $4,500 each — is a 10× price typo (they're $450). It passes the arithmetic check perfectly: 10 × $4,500 does equal $45,000. Every obvious test is green. But it single-handedly makes Electronics look like your top spend category, and the moment you ask "does that finding survive removing the one biggest line?" — the robustness rung — the answer flips to Machined Parts. That's the analysis equivalent of a solver that conserves energy and is still wrong.
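The robustness rung can be sketched in a few lines of plain JavaScript. This is a minimal illustration, not the panel's actual code: the field names (`item`, `category`, `lineTotal`) and every row except the RF connector line are made up for the example, and the helper names `topCategory` and `dropLargestLine` are hypothetical.

```javascript
// Hypothetical rows. Only the RF connector line mirrors the typo described above.
const rows = [
  { item: "RF connector", category: "Electronics", qty: 10, unitPrice: 4500, lineTotal: 45000 },
  { item: "Resistor kit", category: "Electronics", qty: 4, unitPrice: 250, lineTotal: 1000 },
  { item: "CNC housing", category: "Machined Parts", qty: 20, unitPrice: 900, lineTotal: 18000 },
  { item: "CNC bracket", category: "Machined Parts", qty: 50, unitPrice: 120, lineTotal: 6000 },
];

// Sum line totals per category and return the name of the largest one.
function topCategory(rows) {
  const totals = new Map();
  for (const r of rows) {
    totals.set(r.category, (totals.get(r.category) ?? 0) + r.lineTotal);
  }
  return [...totals.entries()].sort((a, b) => b[1] - a[1])[0][0];
}

// Return a copy of the rows without the single largest line (assumes rows is non-empty).
function dropLargestLine(rows) {
  const largest = rows.reduce((max, r) => (r.lineTotal > max.lineTotal ? r : max));
  return rows.filter((r) => r !== largest);
}

const before = topCategory(rows);
const after = topCategory(dropLargestLine(rows));
const robust = before === after; // false here: the headline depends on one row

console.log({ before, after, robust });
```

With these sample rows the headline flips from Electronics to Machined Parts once the largest line is removed, which is the signal to go look at that line by hand.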
The order matters: dedupe first, then normalize text keys (vendor names with inconsistent spelling won't group), then handle blanks explicitly (decide whether they're dropped or bucketed as "Uncategorized"; don't let the tool decide silently). Only then aggregate.
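The cleaning order above might look like this in plain JavaScript. Treat it as a sketch under assumptions: the rows and field names (`po`, `vendor`, `category`, `qty`, `unitPrice`, `lineTotal`) are invented for illustration, "duplicate" is taken to mean a row identical in every field, and blanks are bucketed rather than dropped.

```javascript
// Hypothetical raw rows; the real export's column names may differ.
const raw = [
  { po: "PO-1", vendor: "Acme Corp", category: "Fasteners", qty: 100, unitPrice: 0.5, lineTotal: 50 },
  { po: "PO-1", vendor: "Acme Corp", category: "Fasteners", qty: 100, unitPrice: 0.5, lineTotal: 50 }, // exact duplicate
  { po: "PO-2", vendor: " acme  corp", category: "fasteners", qty: 10, unitPrice: 2, lineTotal: 20 },
  { po: "PO-3", vendor: "Bolt & Co", category: "", qty: 1, unitPrice: 30, lineTotal: 30 },
];

// Step 1: drop rows that are identical in every field.
function dedupe(rows) {
  const seen = new Set();
  return rows.filter((r) => {
    const key = JSON.stringify(r); // relies on all rows listing fields in the same order
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Step 2: normalize text keys so spelling variants land in the same group.
const normKey = (s) => (s ?? "").trim().replace(/\s+/g, " ").toLowerCase();

// Step 3: bucket blanks explicitly instead of letting a group-by drop them.
function clean(rows) {
  return dedupe(rows).map((r) => ({
    ...r,
    vendor: normKey(r.vendor),
    category: normKey(r.category) || "uncategorized",
  }));
}

const cleaned = clean(raw);
console.log(cleaned);
```

Bucketing blanks as "uncategorized" is one of the two choices the text mentions; dropping them is equally valid as long as the report says so.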
Ask Claude to total the same money two independent ways (by category and by vendor) and confirm that both match the raw sum. This one check catches the most common and most invisible bug: a group-by that quietly excludes nulls. If either breakdown falls short of the raw sum, the shortfall is the dropped rows.
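Here is one way to express that reconciliation, as a hedged sketch rather than the panel's implementation. The rows and field names are hypothetical, and the `dropBlanks` option exists only to simulate the group-by bug so the check has something to catch.

```javascript
// Hypothetical rows; one has a null category.
const rows = [
  { vendor: "acme", category: "fasteners", lineTotal: 50 },
  { vendor: "acme", category: null, lineTotal: 30 },
  { vendor: "bolt", category: "electronics", lineTotal: 20 },
];

// Work in integer cents so float rounding cannot cause a false mismatch.
const toCents = (x) => Math.round(x * 100);
const isBlank = (v) => v == null || String(v).trim() === "";

// Group totals by a key. dropBlanks: true imitates a group-by that silently excludes nulls.
function sumBy(rows, key, { dropBlanks = false } = {}) {
  const totals = new Map();
  for (const r of rows) {
    const blank = isBlank(r[key]);
    if (blank && dropBlanks) continue;
    const k = blank ? "(blank)" : r[key];
    totals.set(k, (totals.get(k) ?? 0) + toCents(r.lineTotal));
  }
  return totals;
}

const grandTotal = (totals) => [...totals.values()].reduce((a, b) => a + b, 0);

// Total the same money three ways and report any shortfall, in cents.
function reconcile(rows, opts) {
  const raw = rows.reduce((a, r) => a + toCents(r.lineTotal), 0);
  const byCategory = grandTotal(sumBy(rows, "category", opts));
  const byVendor = grandTotal(sumBy(rows, "vendor", opts));
  return {
    raw,
    byCategory,
    byVendor,
    ok: raw === byCategory && raw === byVendor,
    gaps: { category: raw - byCategory, vendor: raw - byVendor },
  };
}

console.log(reconcile(rows));                       // all three totals agree
console.log(reconcile(rows, { dropBlanks: true })); // category breakdown is short by the null row
```

The useful output is the gap, not just the pass or fail flag: it tells you how much money the missing rows carry, which is usually enough to find them.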
The arithmetic check (line_total = qty × price) catches sloppy data entry, but it cannot catch a wrong number that's internally consistent — a price with an extra zero still multiplies correctly. That class of error only falls to the robustness check: perturb the input (drop the biggest row, the newest month, the top vendor) and see whether the conclusion holds.
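To make the limit of the arithmetic check concrete, here is a small sketch. The RF connector row is the typo from the text; the other two rows, the field names, and the helper `arithmeticMismatches` are hypothetical. The function only flags rows for review and does not correct them.

```javascript
const rows = [
  { item: "RF connector", qty: 10, unitPrice: 4500, lineTotal: 45000 }, // the 10x typo: internally consistent
  { item: "Hex bolt", qty: 200, unitPrice: 0.15, lineTotal: 30 },       // hypothetical, consistent
  { item: "Gasket", qty: 12, unitPrice: 3.5, lineTotal: 24 },           // hypothetical mismatch: 12 x 3.50 = 42
];

// Compare in integer cents to avoid float noise such as 0.1 + 0.2 !== 0.3.
const toCents = (x) => Math.round(x * 100);

// Return the rows where qty x unitPrice disagrees with lineTotal, with the expected value attached.
function arithmeticMismatches(rows) {
  return rows
    .map((r) => ({ ...r, expected: toCents(r.qty * r.unitPrice) / 100 }))
    .filter((r) => toCents(r.expected) !== toCents(r.lineTotal));
}

const flagged = arithmeticMismatches(rows);
console.log(flagged.map((r) => `${r.item}: recorded ${r.lineTotal}, expected ${r.expected}`));
```

Only the gasket row is flagged. The RF connector line passes cleanly, which is why this check has to be paired with a perturbation test before the headline finding is trusted.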
Two free levers worth setting. Turn the reasoning effort up for the hard part with Claude Code's /effort (see Feed it documents for the model and effort controls); a transcription wants it low, an analysis like this one wants it high. And end your prompt with an explicit self-check (“before you finish, reconcile every total against an independent count”), which is exactly why the prompt above asks Claude to verify itself. Naming the oracle is the highest-value line in the prompt. And keep the expensive model's context light: route transcription and formatting to cheaper tools (see Spend tokens well).
The panel above is the worked solution: it parses the same CSV, runs all four reconciliation rungs, and regenerates the findings text live as you clean. With every toggle on, the report is honest: Machined Parts leads (driven by genuine CNC work), not Electronics (an artifact of one typo). Note that Rung 3 stays amber even when everything else is green — that's correct. You flag the two line-total mismatches for a human; you don't silently rewrite accounting records.
Everything here is plain browser JavaScript — no library, no backend. The point isn't the code; it's the discipline of making the numbers prove themselves before you put them in front of a manager.
You have an analysis you trust. Now you either explore it or deliver it — see explore in HTML, deliver in PDF for the one-time setup. For this dataset, the two prompts: