I Tracked 100 Million Tokens of Claude Code Usage. 99.4% Were Wasted on Reading.

Mar 12, 2026
Tags: Claude Code token usage, AI coding cost, token waste, input vs output tokens, Claude Code expensive, AI coding agent efficiency

A developer tracked every token Claude Code consumed for a month. The result: 99.4% were input tokens. For every 1 token written, 166 were consumed reading. Here's what that means for your bill.



A developer recently published the results of tracking their Claude Code usage for an entire month. The numbers are staggering:

100 million tokens consumed in total
99.4% were input tokens — meaning Claude was reading, not writing
For every 1 token of output, approximately 166 tokens were read
A read-to-write ratio of roughly 166:1 (99.4 ÷ 0.6 ≈ 165.7)
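
The ratio follows directly from the input share, and a few lines of Python make the arithmetic explicit. This is a minimal sketch: the function name `read_write_ratio` is hypothetical, and the only inputs are the 100 million total and the 99.4% input share reported above.

```python
def read_write_ratio(input_share: float) -> float:
    """Input tokens consumed per output token, given the input share of all tokens."""
    if not 0.0 < input_share < 1.0:
        raise ValueError("input_share must be strictly between 0 and 1")
    return input_share / (1.0 - input_share)


total_tokens = 100_000_000   # figure reported in the article
input_share = 0.994          # 99.4% input tokens

input_tokens = round(total_tokens * input_share)
output_tokens = total_tokens - input_tokens
ratio = read_write_ratio(input_share)

print(f"input={input_tokens:,} output={output_tokens:,} ratio={ratio:.1f}:1")
# input=99,400,000 output=600,000 ratio=165.7:1
```

In other words, only about 600,000 of the 100 million tokens were generated output.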

This isn’t an outlier. Independent measurements across multiple codebases tell the same story. A study tracking 42 executions across FastAPI’s source code (~800 Python files) found that 70% of all tokens were waste — consumed by file reading and navigation, not reasoning or code generation.

Another benchmark found that 76% of tokens were consumed specifically by file read operations. A separate measurement on ripgrep’s codebase showed that a single code investigation consumed 20,580 tokens across 5+ tool calls — and 87% of that could have been eliminated with structural understanding.

Where Your Tokens Actually Go

Here’s the breakdown of a typical Claude Code session on a real codebase:

Context window allocation (200K tokens):

System prompt + tool schemas: ~20K tokens (10%)
File reading and navigation: ~120K–160K tokens (60–80%)
Conversation history: ~10K–20K tokens (5–10%)
Compaction buffer (reserved, unusable): ~33K tokens (16.5%)
Actual reasoning and code writing: ~10K–30K tokens (5–15%)

You’re paying for 200K tokens of context. Your AI gets to think with 10K–30K of them. The rest is overhead, file reading, and reserved buffers.
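
The allocation above is a budget, so the space left for reasoning is whatever remains after the other line items. The sketch below uses the approximate figures from the breakdown; the function name `reasoning_budget` and its parameters are hypothetical, and real sessions will vary.

```python
CONTEXT_WINDOW = 200_000  # tokens, as in the breakdown above


def reasoning_budget(file_reads: int, history: int,
                     system_overhead: int = 20_000,
                     compaction_buffer: int = 33_000) -> int:
    """Tokens left for reasoning and code writing.

    A negative result means the other allocations already exceed the window.
    """
    used = system_overhead + compaction_buffer + file_reads + history
    return CONTEXT_WINDOW - used


# Low end of the file-reading and history ranges
light = reasoning_budget(file_reads=120_000, history=10_000)

# High end of both ranges at once
heavy = reasoning_budget(file_reads=160_000, history=20_000)

print(light)  # 17000
print(heavy)  # -33000
```

The negative second result shows that the upper ends of the ranges cannot all hold at the same time in a 200K window: once file reading approaches 160K, something else has to give.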

The Cost Math

At Opus 4.6 pricing ($15/M input tokens), let’s calculate:

One complex query consuming 150K input tokens = $2.25 per query
A developer running 10 queries per day = $22.50/day
50 developers = $1,125/day = ~$25K/month (at about 22 working days per month)
Annual token cost: ~$300K — mostly spent on re-reading code

At Sonnet 4.6 pricing ($3/M input tokens), it’s cheaper but the same ratio:

One complex query: ~$0.45
50 developers, 10 queries/day: ~$5K/month = ~$60K/year

And this calculation assumes each query is independent. In practice, context accumulates across turns, so later queries in a session cost far more than earlier ones.
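
The figures above can be reproduced with a small calculator. This is a sketch under stated assumptions: the function `input_cost` is hypothetical, the two prices are the per-million input rates quoted in this section, and 22 working days per month is an assumption chosen because it reproduces the monthly totals given above.

```python
def input_cost(price_per_million: float,
               tokens_per_query: int = 150_000,
               queries_per_day: int = 10,
               developers: int = 50,
               working_days_per_month: int = 22) -> dict[str, float]:
    """Input-token spend in USD at several time scales."""
    per_query = tokens_per_query * price_per_million / 1_000_000
    team_per_day = per_query * queries_per_day * developers
    team_per_month = team_per_day * working_days_per_month
    return {
        "per_query": per_query,
        "team_per_day": team_per_day,
        "team_per_month": team_per_month,
        "team_per_year": team_per_month * 12,
    }


for label, price in [("$15/M input", 15.0), ("$3/M input", 3.0)]:
    cost = input_cost(price)
    print(label, {name: round(value, 2) for name, value in cost.items()})
```

At $15/M this gives $2.25 per query, $1,125 per team-day, $24,750 per month, and $297,000 per year. At $3/M the same workload comes to $0.45, $225, $4,950, and $59,400.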

Why It Gets Worse Over Time

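Within a session, the cost per prompt does not stay flat. It climbs with every turn, because each prompt re-processes everything that came before it. Here’s how a typical session unfolds: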

Turn 1: Your question + file reads = ~50K tokens total
Turn 5: Everything from turns 1–4 + new file reads = ~100K tokens total
Turn 10: Everything from turns 1–9 + new reads = ~160K tokens total
Turn 15: Context fills up. Compaction fires. Agent starts over with a lossy summary.
Turn 16: Agent re-reads files to recover lost details. Context fills again.

By turn 15, each prompt is re-processing the full conversation history plus all accumulated codebase reads. The last message in a session costs 3–5× more than the first message.
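
A simple linear model is enough to see how this adds up. In the sketch below, the 50K starting size comes from the turn list above, while the 12K growth per turn is an assumption picked to roughly match the 100K and 160K figures for turns 5 and 10. It also assumes every turn is billed at the full input rate, with no caching discount.

```python
def context_per_turn(turns: int,
                     first_turn: int = 50_000,
                     growth_per_turn: int = 12_000) -> list[int]:
    """Input tokens processed on each turn, assuming steady linear growth."""
    return [first_turn + growth_per_turn * i for i in range(turns)]


sizes = context_per_turn(10)
cumulative = sum(sizes)                        # total input tokens over the session
last_vs_first = sizes[-1] / sizes[0]           # how much pricier turn 10 is than turn 1
session_cost = cumulative * 15.0 / 1_000_000   # USD at $15/M input

print(sizes[0], sizes[4], sizes[9])  # 50000 98000 158000
print(cumulative)                    # 1040000
print(round(last_vs_first, 2))       # 3.16
print(round(session_cost, 2))        # 15.6
```

Under these assumptions, a session whose context never exceeds about 160K tokens still processes over a million input tokens in total, because each turn pays again for the turns before it.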

The Re-Reading Loop

The most expensive aspect isn’t any single session — it’s the re-reading loop across sessions:

Every morning, every developer opens Claude Code. The agent has zero memory from yesterday. It re-reads the same files it read yesterday. And the day before. And last week.

5 developers on one team = 5× the same re-discovery cost, every day. The same files, the same dependencies, the same architecture — rediscovered from scratch by every developer in every session.

If every developer on a 50-person team matches the 100 million tokens per month measured above, the team consumes 5 billion tokens a month, and 99.4% of those tokens go to input. Reading. Not writing.

The Structural Fix

The roughly 166:1 read-to-write ratio isn’t a Claude problem. It’s an architecture problem. Every AI coding agent (Cursor, Copilot, Codex) has the same fundamental issue: with no persistent map of the codebase, it reads everything from scratch every time.

ByteBell’s Smart Context Refresh replaces this brute-force reading with a pre-computed code intelligence graph, reducing token consumption by 50–70% and bringing the effective cost per query from $2–30 down to $0.04–0.08, because the AI queries structured metadata instead of re-reading your entire codebase on every prompt. Learn more at bytebell.ai.
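
To make the idea of a persistent map concrete, here is a minimal sketch of one possible building block: a summary cache keyed by content hash, so an unchanged file is processed in full only once. This is not ByteBell’s implementation. The class `SummaryCache` and the function `summarize` are hypothetical, the summarizer is a crude stand-in that keeps only top-level `def` and `class` lines, and the cache lives in memory, so persisting it across sessions would require writing it to disk.

```python
import hashlib


def summarize(content: str) -> str:
    """Stand-in summarizer: keep only top-level def/class lines."""
    keep = [line for line in content.splitlines()
            if line.startswith(("def ", "class "))]
    return "\n".join(keep)


class SummaryCache:
    """Hypothetical per-file summary cache keyed by SHA-256 of the content."""

    def __init__(self) -> None:
        self._by_hash: dict[str, str] = {}
        self.full_reads = 0  # how many times a file had to be processed in full

    def summary_for(self, content: str) -> str:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest not in self._by_hash:
            # Unseen or changed content: pay the full read once
            self.full_reads += 1
            self._by_hash[digest] = summarize(content)
        return self._by_hash[digest]


cache = SummaryCache()
source = "def a():\n    return 1\n\nclass B:\n    pass\n"

first = cache.summary_for(source)
second = cache.summary_for(source)            # served from the cache
cache.summary_for(source + "\ndef c():\n    pass\n")  # changed file, re-read

print(first == second, cache.full_reads)  # True 2
```

The design choice is that the expensive step is tied to file content, not to the session, so repeated questions about unchanged code reuse the compact summary.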