Spend tokens well: cheap for mechanics, expensive for judgment

The future is unbounded inference. Today you work to a budget — so spend it on judgment, not on mechanical reformatting. Everything else on this trunk is really one idea wearing different hats: tee up the expensive model with as little in its context as you can get away with.

The whole idea in one line: the costly model's attention is the scarce resource. Hand every mechanical job — reading documents in, formatting documents out, cleaning data, boilerplate — to a cheaper model or a deterministic tool, and save the expensive model for the reasoning and the verification only it should do.

The one rule

Sort every task into two piles: mechanical (a faithful transform with a right answer — transcribe, reformat, parse, rename) and judgment (a call that needs reasoning — is this right, which option wins, what does the physics say). Mechanical work is cheap work; it does not need your best model. Judgment is what you're actually paying the premium for.

The cleanest example is a PDF. Formatting a report to PDF should never touch Opus: that's a job for pandoc + Typst, a deterministic converter-and-typesetter pair that costs zero tokens. Reading a datasheet PDF in shouldn't touch Opus either: that's a Sonnet worker transcribing it to text. Opus's only job in the whole loop is the engineering call in the middle.
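A minimal sketch of the zero-token formatting step. It assumes a recent pandoc 3.x (which accepts `--pdf-engine=typst`) and the `typst` CLI are both on PATH; the helper names `pandoc_typst_command` and `render_pdf` are hypothetical, not part of any tool.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_typst_command(src: Path, out: Path) -> list[str]:
    """Build the pandoc invocation that renders Markdown to PDF via Typst.

    Assumes pandoc 3.x, where --pdf-engine=typst is a valid engine.
    """
    return ["pandoc", str(src), "--pdf-engine=typst", "-o", str(out)]


def render_pdf(src: Path, out: Path) -> None:
    """Typeset src into out deterministically; no model is involved."""
    if shutil.which("pandoc") is None or shutil.which("typst") is None:
        raise RuntimeError("pandoc and typst must both be installed and on PATH")
    # check=True raises CalledProcessError if pandoc or typst fails
    subprocess.run(pandoc_typst_command(src, out), check=True)
```

Calling `render_pdf(Path("report.md"), Path("report.pdf"))` produces the same PDF every time for the same input, and the model never sees the layout work.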

Who should do what

Task	Hand it to	Why
Transcribe a PDF / drawing to text	Sonnet worker	Mechanical reading; isolates the heavy read from your main context.
Typeset Markdown → PDF	pandoc + Typst	Deterministic; costs zero tokens.
Clean / parse a CSV	Sonnet, or a script	Mechanical; once Opus writes the script, reruns are free.
Bulk rename, boilerplate, reformat	a script	Repeatable transforms shouldn't spend tokens at all.
Judge whether the answer is right (the oracle)	Opus	Judgment — the thing you're paying the premium for.
Weigh tradeoffs, set weights, pick the design	Opus	Judgment.
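As an illustration of the "once Opus writes the script, reruns are free" row, here is a minimal deterministic CSV cleaner. The cleaning rules (trim whitespace, drop blank rows, drop exact duplicates) and the function name `clean_csv` are illustrative assumptions; a real dataset will need its own rules.

```python
import csv
import io


def clean_csv(text: str) -> str:
    """Deterministic cleanup: trim cells, drop blank rows and exact duplicates."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    seen: set[tuple[str, ...]] = set()
    for row in reader:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip empty lines and rows whose cells are all blank
        key = tuple(cells)
        if key in seen:
            continue  # drop rows that duplicate an earlier row after trimming
        seen.add(key)
        writer.writerow(cells)
    return out.getvalue()
```

The function is idempotent, so running it again on its own output changes nothing; that is what makes it safe to rerun on every new export without spending a token.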

Try it: route the task

A task lands. Pick the cheapest model × effort combination that can actually do it. The goal isn't “use the best model,” it's “use the smallest one that still clears the bar.”

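The routing rule amounts to "filter to what clears the bar, then take the cheapest." The sketch below is a toy under loudly stated assumptions: the relative costs and 1-10 capability scores are invented for illustration (not measured, not real pricing), effort is modeled as a cost multiplier plus a small capability bump, and the names `Option`, `options` and `route` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Option:
    model: str
    effort: str
    cost: float      # hypothetical relative cost per task
    capability: int  # hypothetical 1-10 score of the hardest task it handles


# Invented numbers for illustration only; calibrate against your own runs.
MODELS = {"sonnet": (1.0, 5), "opus": (5.0, 8)}  # (relative cost, capability)
EFFORTS = {"low": (1.0, 0), "high": (3.0, 1)}    # (cost multiplier, capability bump)


def options() -> list[Option]:
    """Every model x effort combination with its toy cost and capability."""
    return [
        Option(m, e, mc * ec, mcap + ecap)
        for (m, (mc, mcap)), (e, (ec, ecap)) in product(MODELS.items(), EFFORTS.items())
    ]


def route(required: int) -> Option:
    """Return the cheapest model x effort whose capability clears the bar."""
    viable = [o for o in options() if o.capability >= required]
    if not viable:
        raise ValueError(f"no option reaches capability {required}")
    return min(viable, key=lambda o: o.cost)
```

With these made-up numbers, a task rated 6 routes to Sonnet at high effort rather than Opus at low effort, because the cheaper combination still clears the bar.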

Keep the expensive model's context light

The trunk you climbed to get here is, read another way, a token-economy checklist:

Transcribe documents once to lean text, then work from that; don't re-read a PDF every turn. (See Feed it documents.)
Run the big read in a separate Sonnet worker so the raw document never enters Opus's window at all. (See Feed it documents.)
Match the effort to the task: /effort low for transcription and formatting, high only where the reasoning is hard.
Write findings to NOTES.md so the next session reads a note instead of re-deriving them: persistent memory for a few hundred tokens. (See notes/ folder.)
Ship with deterministic tools, such as pandoc + Typst for PDF or a self-contained HTML file for exploring, rather than asking the model to hand-format. (See Shipping the work.)
Commit early and often so you can prune context and restart a session cheaply without losing work. (See Project + git.)
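The first checklist item can itself be made mechanical. A minimal sketch, assuming a content-hash cache on disk: `transcribe` is a hypothetical stand-in for whatever hands the bytes to a Sonnet worker and returns lean text, and both the function name `transcribe_once` and the `.transcripts` directory are arbitrary choices, not part of any tool.

```python
import hashlib
from pathlib import Path
from typing import Callable


def transcribe_once(
    pdf: Path,
    transcribe: Callable[[bytes], str],
    cache_dir: Path = Path(".transcripts"),
) -> str:
    """Return lean text for pdf, calling the transcriber only on a cache miss.

    The cache key is the SHA-256 of the file's bytes, so an edited PDF is
    transcribed again but an unchanged one never is.
    """
    data = pdf.read_bytes()
    key = hashlib.sha256(data).hexdigest()
    cached = cache_dir / f"{key}.txt"
    if cached.exists():
        return cached.read_text(encoding="utf-8")
    text = transcribe(data)  # hypothetical hand-off to a Sonnet worker
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_text(text, encoding="utf-8")
    return text
```

Keying on content rather than filename means a renamed datasheet is still a cache hit, while a revised one is read fresh.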

You can also just tell the expensive model to economize up front:

Prompt: “Before we start: transcribe the datasheets with a Sonnet worker and keep only the extracted YAML in context, render any PDF output with pandoc + Typst (not by hand), and write running findings to NOTES.md. Spend your reasoning on the sizing decision and its verification, not on reading or formatting.”

Why bother — for now

Honestly? Because today there's a budget. A lighter context is a longer effective session, a cheaper run, and — the part people miss — a sharper model: the expensive one reasons better when its window isn't clogged with page furniture and stale output it has to read past every turn. The day inference is unbounded, some of this stops mattering. Until then, routing the mechanical work elsewhere is free leverage, and it's most of what the other foundation pages are quietly teaching.

Where this fits

This is the capstone of the trunk and the habit every problem branch inherits: do the cheap things cheaply so the expensive model has room to do the one thing only it can. From here, pick a branch — the tech tree is back home.