I Tracked 100 Million Tokens of Claude Code Usage. 99.4% Were Wasted on Reading.
A developer recently published the results of tracking their Claude Code usage for an entire month. The numbers are staggering:
- 100 million tokens consumed in total
- 99.4% were input tokens — meaning Claude was reading, not writing
- For every token of output, roughly 165 tokens were read (99.4 / 0.6 ≈ 165.7)
- That is a read-to-write ratio of about 165:1
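The ratio follows directly from the input share. A minimal sketch, assuming the 100 million total and the 99.4% input share reported above (the exact token counts are derived from those two figures, not taken from the original report):

```python
# Token counts derived from the figures above: 100M total, 99.4% input.
total_tokens = 100_000_000
input_tokens = 99_400_000
output_tokens = total_tokens - input_tokens  # the remaining 0.6%

# Tokens read for every token written.
read_per_write = input_tokens / output_tokens

print(f"output tokens: {output_tokens:,}")
print(f"read per token written: {read_per_write:.1f}")
```

The result lands between 165 and 166, which is why both figures appear when the ratio is quoted.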
This isn’t an outlier. Independent measurements across multiple codebases tell the same story. A study tracking 42 executions across FastAPI’s source code (~800 Python files) found that 70% of all tokens were waste — consumed by file reading and navigation, not reasoning or code generation.
Another benchmark found that 76% of tokens were consumed specifically by file read operations. A separate measurement on ripgrep’s codebase showed that a single code investigation consumed 20,580 tokens across 5+ tool calls — and 87% of that could have been eliminated with structural understanding.
Where Your Tokens Actually Go
Here’s the breakdown of a typical Claude Code session on a real codebase:
Context window allocation (200K tokens):
- System prompt + tool schemas: ~20K tokens (10%)
- File reading and navigation: ~120K–160K tokens (60–80%)
- Conversation history: ~10K–20K tokens (5–10%)
- Compaction buffer (reserved, unusable): ~33K tokens (16.5%)
- Actual reasoning and code writing: ~10K–30K tokens (5–15%)
You’re paying for 200K tokens of context. Your AI gets to think with 10K–30K of them. The rest is overhead, file reading, and reserved buffers.
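The percentages in the breakdown above are each category's token estimate divided by the 200K window. A small sketch that reproduces them, using only the estimates listed above (the dictionary layout and function name are illustrative):

```python
CONTEXT_WINDOW = 200_000

# (low, high) token estimates per category, copied from the list above.
budget = {
    "system prompt + tool schemas": (20_000, 20_000),
    "file reading and navigation": (120_000, 160_000),
    "conversation history": (10_000, 20_000),
    "compaction buffer": (33_000, 33_000),
    "reasoning and code writing": (10_000, 30_000),
}


def share(tokens: int) -> float:
    """Percentage of the context window taken by `tokens`."""
    return 100 * tokens / CONTEXT_WINDOW


for name, (low, high) in budget.items():
    if low == high:
        print(f"{name}: {share(low):.1f}%")
    else:
        print(f"{name}: {share(low):.1f}% to {share(high):.1f}%")
```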
The Cost Math
At Opus 4.6 pricing ($15/M input tokens), let’s calculate:
- One complex query consuming 150K input tokens = $2.25 per query
- A developer running 10 queries per day = $22.50/day
- 50 developers = $1,125/day, or ~$25K/month over about 22 working days
- Annual token cost: ~$300K — mostly spent on re-reading code
At Sonnet 4.6 pricing ($3/M input tokens), it’s cheaper but the same ratio:
- One complex query: ~$0.45
- 50 developers, 10 queries/day: ~$60K/year
And this calculation assumes each query is independent. In practice, context accumulates across turns, so later queries in a session cost far more than earlier ones.
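The arithmetic above can be reproduced with a few lines. This is a rough sketch: the prices, query size, query count, and team size come from the figures above, while the 22 working days per month is an assumption chosen because it matches the monthly and annual totals quoted.

```python
WORKING_DAYS_PER_MONTH = 22  # assumption, not stated in the figures above


def query_cost(input_tokens: int, price_per_million: float) -> float:
    """Dollar cost of the input side of a single query."""
    return input_tokens / 1_000_000 * price_per_million


def team_cost(price_per_million: float,
              input_tokens: int = 150_000,
              queries_per_day: int = 10,
              developers: int = 50) -> dict:
    """Per-query, daily, monthly, and annual input cost for a team."""
    per_query = query_cost(input_tokens, price_per_million)
    per_day = per_query * queries_per_day * developers
    per_month = per_day * WORKING_DAYS_PER_MONTH
    return {
        "per_query": per_query,
        "team_per_day": per_day,
        "team_per_month": per_month,
        "team_per_year": per_month * 12,
    }


# The two input prices quoted above, in dollars per million tokens.
for price in (15.0, 3.0):
    costs = team_cost(price)
    print(f"${price:.0f}/M:", {k: round(v, 2) for k, v in costs.items()})
```

Like the figures above, this treats every query as independent and ignores output tokens, so it is a floor rather than a forecast.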
Why It Gets Worse Over Time
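The cost per prompt grows with every turn within a session, because each new prompt re-sends everything that came before it. Per-prompt cost climbs roughly linearly, which means the cumulative cost of a session grows closer to quadratically. Here’s why: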
- Turn 1: Your question + file reads = ~50K tokens total
- Turn 5: Everything from turns 1–4 + new file reads = ~100K tokens total
- Turn 10: Everything from turns 1–9 + new reads = ~160K tokens total
- Turn 15: Context fills up. Compaction fires. Agent starts over with a lossy summary.
- Turn 16: Agent re-reads files to recover lost details. Context fills again.
By turn 15, each prompt is re-processing the full conversation history plus all accumulated codebase reads. The last message in a session costs 3–5× more than the first message.
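To see how the per-turn figures add up over a session, here is a small simulation. It assumes context grows by a fixed amount each turn; the 12K-per-turn step is a rough fit to the ~50K, ~100K, and ~160K figures above, not a measured value.

```python
def context_at_turn(turn: int,
                    first_turn_tokens: int = 50_000,
                    growth_per_turn: int = 12_000) -> int:
    """Input tokens processed at a given turn, under a fixed-step growth assumption."""
    return first_turn_tokens + growth_per_turn * (turn - 1)


turns = range(1, 11)
per_turn = [context_at_turn(t) for t in turns]

# Every turn re-processes the whole context, so the session total is the sum.
session_total = sum(per_turn)
last_vs_first = per_turn[-1] / per_turn[0]

for t, tokens in zip(turns, per_turn):
    print(f"turn {t:2d}: {tokens:,} input tokens")
print(f"session total: {session_total:,} input tokens")
print(f"last turn vs first turn: {last_vs_first:.2f}x")
```

Under these assumptions a ten-turn session processes over a million input tokens, even though no single turn exceeds the 200K window.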
The Re-Reading Loop
The most expensive aspect isn’t any single session — it’s the re-reading loop across sessions:
Every morning, every developer opens Claude Code. The agent has zero memory from yesterday. It re-reads the same files it read yesterday. And the day before. And last week.
5 developers on one team = 5× the same re-discovery cost, every day. The same files, the same dependencies, the same architecture — rediscovered from scratch by every developer in every session.
Over a month, each developer on a 50-developer team might consume 100 million tokens, and 99.4% of those tokens go to input. Reading. Not writing.
The Structural Fix
The 165:1 read-to-write ratio isn’t a Claude problem. It’s an architecture problem. Every AI coding agent (Cursor, Copilot, Codex) has the same fundamental issue: no persistent map, so it reads everything from scratch every time.
ByteBell’s Private Code Context replaces this brute-force reading with a pre-computed code intelligence graph, reducing token consumption by 50–70% and bringing the effective cost per query down to $0.04–0.08, because the AI queries structured metadata instead of re-reading your entire codebase on every prompt. Learn more at bytebell.ai
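To make the idea of querying structured metadata more concrete, here is a toy sketch of a symbol index built with Python's standard `ast` module. It is an illustration only and not ByteBell's implementation: the function names, the sample file, and the index layout are all hypothetical. The point is that once an index records where each function lives, an agent can fetch a few lines instead of the whole file.

```python
import ast


def index_source(path: str, source: str) -> dict[str, tuple[str, int, int]]:
    """Map each function or class name to (path, first_line, last_line)."""
    index = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            index[node.name] = (path, node.lineno, node.end_lineno)
    return index


def read_symbol(source: str, entry: tuple[str, int, int]) -> str:
    """Return only the lines that belong to one indexed symbol."""
    _, first, last = entry
    return "\n".join(source.splitlines()[first - 1:last])


# Hypothetical file contents, standing in for a real module on disk.
SAMPLE = '''\
import os

def load_config(path):
    with open(path) as fh:
        return fh.read()

def unrelated_helper():
    return os.getcwd()
'''

index = index_source("config.py", SAMPLE)
snippet = read_symbol(SAMPLE, index["load_config"])
print(index["load_config"])
print(snippet)
```

A real system would persist the index across sessions and track calls and imports as well as definitions; this sketch only shows the lookup step.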
