When Claude Code shows “Context left until auto-compact: 0%”, it’s about to summarize everything and throw away the details. Here’s exactly what gets lost and why it matters.

You’re 45 minutes into debugging a payment webhook. Claude has the stack trace, the relevant files, the narrowed hypothesis. Then you see the warning:
“Context left until auto-compact: 0%”
Seconds later, Claude summarizes everything, forgets which files it modified, and starts re-reading code it already analyzed. Your debugging session just got reset to zero.
If you’ve experienced this, you know the feeling. The agent was on a roll — precise, fast, hitting the right files. After compaction, it becomes vague, refers to things in general terms, and asks to re-read files it spent 20 minutes analyzing.
This article explains exactly what happens during auto-compaction, what information gets destroyed, and why the death spiral that follows makes each subsequent compaction cycle worse than the last.
Claude Code monitors your context window continuously. The context window is the total amount of information the model can hold in active memory at any time — currently 200K tokens for most plans, or up to 1M tokens on the latest models.
Auto-compaction fires when usage hits approximately 83% of the window — roughly 167,000 tokens on a 200K window. This threshold was raised from ~77% in mid-2025, giving users about 12,000 additional tokens per cycle.
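The threshold arithmetic is easy to check. A quick back-of-envelope sketch, using the approximate ratios above (Anthropic does not publish an exact figure):

```python
# Back-of-envelope math for the auto-compact threshold described above.
# Both ratios are approximate, per the article.
WINDOW = 200_000     # tokens in a standard context window
NEW_RATIO = 0.83     # current auto-compact trigger (approximate)
OLD_RATIO = 0.77     # pre-mid-2025 trigger (approximate)

trigger = round(WINDOW * NEW_RATIO)          # 166,000 — in line with "roughly 167,000"
gained = trigger - round(WINDOW * OLD_RATIO)  # headroom added by the mid-2025 change
print(trigger, gained)  # 166000 12000
```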
Everything in the context window contributes to this limit: your messages, Claude’s responses, every file that was read, every grep output, every command result, every tool schema, and the system prompt. It all accumulates. Nothing evaporates between turns.
The typical session fills the context window in this order:
Before you type anything, the system prompt and tool schemas already consume ~20K–40K tokens.
Your first question adds ~200 tokens. Trivial.
Claude’s investigation adds 40K–100K tokens: file reads, grep output, and command results.
Your follow-up question and Claude’s second investigation: Everything from round 1 is still in context. Round 2 stacks on top. By turn 10–15, the window is full.
On a large codebase with many files, the window can fill in 15–20 minutes.
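A toy simulation shows how quickly those rounds stack up. The per-turn cost below is an assumed average chosen to match the article’s rough figures, not a measured value:

```python
# Toy model of context accumulation across turns. All token costs are
# illustrative assumptions based on the rough figures in the article.
THRESHOLD = 166_000   # approximate auto-compact trigger on a 200K window
used = 30_000         # midpoint of the ~20K–40K consumed before you type
per_turn = 12_200     # assumed: ~200-token question + ~12K of reads/searches

turn = 0
while used < THRESHOLD:
    turn += 1
    used += per_turn

print(turn)  # 12 — compaction fires around turn 12 under these assumptions
```

Heavier investigation turns shift the number down; a lighter codebase shifts it up, which is why sessions land in the turn-10-to-15 range.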
When compaction fires, Claude summarizes the entire conversation and replaces the full history with that summary.
The compression is brutal: 167,000 tokens collapse into 2,000–4,000 — a ratio of roughly 40:1 to 80:1.
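The ratio follows directly from the figures quoted above:

```python
# Verifying the compression ratio using the article's figures.
before = 167_000                       # tokens at the compaction threshold
after_low, after_high = 2_000, 4_000   # size range of the generated summary

print(before / after_high)  # 41.75 — the "40:1" end of the range
print(before / after_low)   # 83.5  — the "80:1" end of the range
```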
File paths and line numbers:
Before: Error at src/api/webhooks/stripe.ts:98 — TypeError: Cannot read property 'retryCount' of undefined
After: “Found bug in Stripe webhook retry logic”
The agent no longer knows the file path, the line number, or the exact error.
Exact error messages and test results:
Before: FAIL test/webhooks.test.ts:156 — Expected: { status: 'retry', count: 3 } — Received: TypeError
After: “Test at webhooks test file is failing”
Debugging hypothesis chains:
Before: “Hypothesis 1: retryCount undefined because customer is new → CONFIRMED. Hypothesis 2: metadata field missing → REJECTED. Hypothesis 3: Schema migration missed default → INVESTIGATING. Checked migration at db/migrations/20240115_add_retry.sql”
After: “Investigated retry count bug. Found it’s related to missing defaults.”
Multi-file relationship maps:
Before: A detailed trace through 5 files showing exact request flow with line numbers
After: “Traced auth flow through gateway, auth service, and billing service”
After compaction, the agent needs to continue working. But it no longer has the specific details. So it re-reads the files it already read — consuming tokens, filling the context window again. Within 15–20 minutes, compaction fires a second time.
Compaction 1: Full conversation → summary. General picture preserved, specifics lost.
Compaction 2: Summary + new work → summary of a summary. The agent is now working from a summary that was already missing details, and that summary gets compressed again.
Compaction 3: Summary of summary + new work → triple-compressed summary. By now, early decisions, exact file paths, and specific error messages from the beginning of the session are essentially gone.
Compaction 4+: The agent is generating code based on fragments of fragments. It may contradict its own earlier decisions, re-introduce bugs it already fixed, or attempt edits in locations that no longer match reality.
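The compounding loss across cycles can be sketched with a toy retention model. The 20% retention figure is purely an illustrative assumption, not a measured property of any summarizer:

```python
# Toy model: if each compaction preserves only a fraction of the specific
# details it is given, loss compounds multiplicatively across cycles.
# RETENTION is an illustrative assumption, not a measured value.
RETENTION = 0.20

detail = 1.0
for cycle in range(1, 5):
    detail *= RETENTION
    print(f"after compaction {cycle}: {detail:.4%} of specifics survive")
```

Even with a generous retention assumption, four cycles leave well under 1% of the original specifics, which is why the agent starts contradicting its own earlier work.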
Research points the same way: quality degradation begins at around 70% context utilization — well before compaction fires — and after three or four compaction cycles, critical context is effectively unrecoverable. One industry analysis attributed 65% of enterprise AI failures in 2025 to exactly this pattern of context drift and memory loss.
Run /compact manually, with instructions, before auto-compact fires — for example: “/compact preserve the file paths I modified, current test failures, and the auth refactoring plan”. A targeted summary like this preserves far more than the automatic one.
Commit and clear at logical stopping points. Finish a subtask, commit your changes, run /clear to start a fresh session.
Start new sessions for new tasks. Don’t chain tasks in one conversation.
Manual compaction is a band-aid on a fundamentally broken loop: read files → fill context → compact → lose info → re-read → fill → compact. The problem isn’t that compaction is bad at summarizing — it’s that 60–80% of your context window is filled with raw file contents that shouldn’t be there in the first place. ByteBell’s Smart Context Refresh eliminates the compaction death spiral entirely: it replaces brute-force file reading with pre-computed graph metadata that uses 3–5% of the context window, keeping your agent well below the compaction threshold for the entire session with zero information loss. And because the graph persists between sessions, your AI never re-reads your codebase from scratch again. Learn more at bytebell.ai.