Feed it documents: transcribe once, for tokens

A datasheet PDF dragged into every turn of a conversation is read, and paid for, every turn. Transcribe it once into clean text — Markdown for prose, YAML for structured data — and work from the lean copy. Cheaper, faster, and easier to check.

The whole idea in one line: raw PDFs are big, noisy, and re-read on every message. Spend one cheap pass converting the parts you need into a small text file in data/, then point Claude at that. Run the transcription in a second terminal on the cheaper Sonnet model, so the heavy read never touches your main session.

Why not just hand Claude the PDF

A PDF is a layout format, not a text format. Pulling one into context drags along page furniture, repeated headers, OCR noise, and — for a scanned drawing — image data. A 40-page datasheet can be tens of thousands of tokens, and if it sits in the conversation it's re-read on every single turn. You usually need about thirty numbers off it. Paying for the other 39 pages, repeatedly, is the waste.

Open a second worker on Sonnet

The cleanest way to keep that heavy read out of your main session isn't a clever prompt — it's a second window. Open another terminal, start Claude there on the cheaper Sonnet model, and give it one job: read the document and write a faithful transcription into data/. Your engineering session — on Opus — never sees the raw PDF.

This is where Claude Code's slash commands and model selection earn their keep. Launch the worker straight onto Sonnet:

claude --model sonnet
a second terminal

— or switch a session that's already open with the /model command:

/model sonnet
in Claude Code

Then turn the reasoning effort down. Transcription is a fidelity task, not a thinking task — you want the model copying digits, not pondering them, and cranking up effort just spends tokens and time for no extra accuracy:

/effort low
in the Sonnet worker

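Two wins, and they're the whole point: the heavy read runs on the cheaper Sonnet model, and the raw PDF never enters your main session's context.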

You don't need a programmatic multi-agent system for this — two terminals is the technique, and it's the right amount of machinery for a one-off read. Hand the worker its instructions:

Prompt: Transcribe data/bearing-6004.pdf into data/bearing-6004.yaml. Capture every dimension, load rating, and limiting speed as structured key/value fields with units. Transcribe faithfully (do not summarize or round) and flag anything illegible rather than guessing.

Make it a one-word command

Once you've done this a couple of times, save the instructions as a project skill — a custom slash command — so the worker pins the right model and effort for you. Drop a file at .claude/skills/transcribe-pdf/SKILL.md:

---
name: transcribe-pdf
description: Transcribe a PDF into faithful Markdown or YAML
model: sonnet
effort: low
disable-model-invocation: true
allowed-tools: Read, Write
---

Transcribe the document at $ARGUMENTS into data/.
Use YAML for specs and tables (named fields + units), Markdown for prose.
Transcribe faithfully — do not summarize or round. Carry a `source:`
reference (file + page). Flag anything illegible rather than guessing.
.claude/skills/transcribe-pdf/SKILL.md

Now the whole job is one line in the Sonnet window — model and effort already set, every time:

/transcribe-pdf data/bearing-6004.pdf
in the Sonnet worker

Markdown or YAML?

Pick the target format by what the document is:

| | Markdown (.md) | YAML (.yaml) |
|---|---|---|
| Best for | Prose: standards, manuals, procedures, narrative reports. | Structure: datasheet specs, BOMs, parameter tables, key/value data. |
| Keeps | Headings, paragraphs, lists, the occasional table. | Named fields, units, nesting (machine-queryable). |
| You then | Read it, quote it, cite a section. | Load it, compute on it, diff it across revisions. |

A datasheet becomes a handful of fields you can compute against:

part: 6004
type: deep_groove_ball_bearing
bore_mm: 20
outer_dia_mm: 42
width_mm: 12
dynamic_load_C_kN: 9.36
static_load_C0_kN: 5.00
limiting_speed_rpm: 30000
source: { doc: bearing-6004.pdf, page: 1 }
data/bearing-6004.yaml

Now “compute L10 life for this bearing” reads a few named fields (the dynamic load rating, plus the limiting speed as a sanity check) instead of re-parsing a PDF, and the source line tells you exactly where to go verify it. A spec manual, by contrast, stays prose:

# Acceptance Test Procedure — CW-2 Skid

## 3.2 Vibration limit
Overall velocity shall not exceed **4.5 mm/s RMS** measured at the
bearing housing per ISO 10816-3 for rigid mounting...
data/atp-section3.md

The transcription is a cache, not the truth. A model can drop a digit or misread a merged table cell. Keep the original PDF in data/, carry a source reference in the transcription, and spot-check any number you're betting the design on against the page it came from. Same habit as the rest of this site: the document is the oracle; the transcription is a convenience.
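To make the "compute against named fields" idea concrete, here is a minimal sketch of the L10 calculation run on the bearing fields shown earlier. It assumes the standard basic rating life formula for ball bearings, L10h = 10^6 / (60 n) × (C/P)^3. The applied load and shaft speed are hypothetical operating values, not datasheet values, and the function name l10_life_hours is illustrative. The fields are written as a dict so the sketch has no dependencies; in a real project you would load the YAML file instead, for example with a YAML library.

```python
# Fields mirrored from data/bearing-6004.yaml. A dict literal keeps this sketch
# dependency-free; loading the real file with a YAML parser is the assumed
# next step in an actual project.
BEARING = {
    "part": 6004,
    "dynamic_load_C_kN": 9.36,
    "limiting_speed_rpm": 30000,
    "source": {"doc": "bearing-6004.pdf", "page": 1},
}


def l10_life_hours(bearing, load_kN, speed_rpm):
    """Basic rating life in hours: L10h = 1e6 / (60 * n) * (C / P) ** 3."""
    # Refuse to compute on a record that cannot be traced back to a page.
    source = bearing.get("source") or {}
    if "doc" not in source or "page" not in source:
        raise ValueError("transcription has no source reference to verify against")
    if load_kN <= 0 or speed_rpm <= 0:
        raise ValueError("load and speed must be positive")
    if speed_rpm > bearing["limiting_speed_rpm"]:
        raise ValueError("speed exceeds the transcribed limiting speed")
    # Exponent 3 applies to ball bearings; the result is in millions of revolutions.
    million_revs = (bearing["dynamic_load_C_kN"] / load_kN) ** 3
    return million_revs * 1e6 / (60.0 * speed_rpm)


if __name__ == "__main__":
    # Hypothetical duty point: 1.2 kN equivalent load at 3000 rpm.
    hours = l10_life_hours(BEARING, load_kN=1.2, speed_rpm=3000)
    src = BEARING["source"]
    print(f"L10h = {hours:.0f} h (verify C against {src['doc']}, page {src['page']})")
```

The source check is a design choice, not a requirement of the formula: it enforces the spot-check habit by making an untraceable number fail loudly instead of flowing into a design calculation.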

Keep the context lean

Transcribing is half the job; feeding the lean copy well is the other half. Three habits keep a long session sharp:

And the effort knob cuts both ways: transcription wants /effort low, but the analysis it feeds usually wants it high. Turn it up when the reasoning — not the reading — is the hard part.

What it saves

Rough arithmetic: a datasheet page is ~500–800 tokens of usable text, more if scanned. Forty pages is therefore roughly 20,000–32,000 tokens, and re-reading that across a dozen turns comes to 240,000–384,000 token-reads. The YAML of the thirty numbers you actually use is a few hundred tokens, read once and cached in a tiny file. You're not just saving money: a lean, named dataset is something you can git diff when the vendor issues a revision, which a PDF blob never lets you do.
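The same back-of-envelope estimate can be written out as a short script. This is a sketch using the figures quoted above (40 pages, 500 to 800 tokens per page, a dozen turns); the 300-token size for the lean YAML file is an assumed stand-in for "a few hundred tokens", and the lean file is conservatively counted as re-read on every turn.

```python
PAGES = 40
TOKENS_PER_PAGE = (500, 800)   # low and high estimate per page of usable text
TURNS = 12                     # "a dozen turns"
LEAN_FILE_TOKENS = 300         # assumption: "a few hundred tokens" of YAML


def token_reads(tokens_in_context, turns):
    """Total tokens read when the same content is re-read on every turn."""
    return tokens_in_context * turns


pdf_low, pdf_high = (token_reads(PAGES * per_page, TURNS) for per_page in TOKENS_PER_PAGE)
lean = token_reads(LEAN_FILE_TOKENS, TURNS)

if __name__ == "__main__":
    print(f"PDF in context:  {pdf_low:,} to {pdf_high:,} token-reads")
    print(f"Lean YAML file:  {lean:,} token-reads")
    print(f"Reduction:       roughly {pdf_low // lean}x to {pdf_high // lean}x")
```

Swap in your own page count and turn count; the ratio matters more than the exact figures, since per-page token counts vary widely between clean text and scanned pages.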

Where this fits

This is the third rung of the trunk: after you've installed Claude and set up a project with git, getting your source documents into clean, diffable text is what makes the problem branches efficient. The bearing and tolerance-stack problems run entirely on numbers that, in real life, you'd transcribe off a datasheet or a drawing exactly this way.