Context Rot: Why Your AI Gets Worse the Longer You Talk to It

A 2025 study tested 18 frontier models, and every single one gets worse as input length increases. This phenomenon, context rot, is why your AI coding assistant degrades mid-session.


You’ve noticed the pattern. At the start of a session, Claude is sharp — precise file references, correct code, clear reasoning. Thirty minutes later, it’s vague, repetitive, and occasionally wrong. By the end of a long session, it’s suggesting changes to files it already modified, forgetting constraints it established earlier, and generating code that contradicts its own earlier output.

You’re not imagining it. This is a measurable, research-documented phenomenon called context rot.

What Research Shows

Chroma’s 2025 study systematically tested 18 frontier models and found that every single one — not some, not most, ALL of them — gets worse as input length increases.

The degradation isn’t about hitting the context window limit. It happens well before that. A model with a 200K token window can exhibit significant degradation at 50K tokens. A 1M token window shows degradation at 300K–500K tokens.

One study found that agent performance degrades by 40% beyond 50K tokens due to attention dilution. Another found roughly 2% effectiveness loss per 100K tokens added to context.

At the 1M-token scale, the degradation is dramatic.
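The reported numbers can be turned into a rough back-of-the-envelope estimate. A minimal sketch, assuming the “roughly 2% per 100K tokens” figure holds linearly (a simplification; the 40%-beyond-50K finding suggests the real curve is far from linear):

```python
def effectiveness(tokens: int, loss_per_100k: float = 0.02) -> float:
    """Toy linear model: ~2% effectiveness lost per 100K tokens of context.

    This illustrates the reported figure only; real degradation curves
    are task- and model-dependent, and are not linear.
    """
    return max(0.0, 1.0 - loss_per_100k * (tokens / 100_000))

for t in (50_000, 200_000, 500_000, 1_000_000):
    print(f"{t:>9,} tokens -> ~{effectiveness(t):.0%} of baseline effectiveness")
# Even this optimistic linear estimate leaves only ~80% of baseline at
# 1M tokens -- before accounting for attention dilution on top.
```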

Why It Happens: The Attention Budget

Transformer-based models use a mechanism called self-attention. When generating the next word, the model looks back at the entire input and asks: “which parts matter most for what I’m about to generate?”

The catch: the relevance scores must add up to 100%. With 10K tokens, it’s easy to give meaningful attention to the important parts. With 200K tokens, that same 100% spreads thinner. With 1M tokens, each token gets, on average, one millionth of the attention budget.

At 10K tokens, the model tracks 100 million pairwise relationships. At 1M tokens, it tracks 1 trillion. The signal doesn’t get louder as context grows — the noise floor rises.
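The budget arithmetic above is easy to verify directly. A minimal sketch (pure Python, no model involved), assuming for illustration a uniform attention distribution so that each token’s share is exactly 1/n:

```python
def attention_budget(n_tokens: int) -> dict:
    """Illustrate how the attention budget dilutes as context grows.

    Assumes (for illustration only) uniform relevance scores, so each
    token's softmax weight is exactly 1/n. Real models are not uniform,
    but the total budget still sums to 1, so the average share is 1/n.
    """
    return {
        "tokens": n_tokens,
        "avg_share_per_token": 1 / n_tokens,
        "pairwise_scores": n_tokens ** 2,
    }

for n in (10_000, 200_000, 1_000_000):
    s = attention_budget(n)
    print(f"{s['tokens']:>9,} tokens -> avg share {s['avg_share_per_token']:.1e}, "
          f"{s['pairwise_scores']:,} pairwise scores")
```

At 10K tokens the pairwise count is 100 million; at 1M tokens it is one trillion, matching the figures above.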

A counter-intuitive finding from the research: models actually performed better on shuffled haystacks than on logically coherent documents. Structural coherence in the context appears to create more plausible-seeming distractors that confuse the attention mechanism.

How Context Rot Manifests in Coding

In a typical Claude Code session, context rot shows up as:

Gradually less precise answers: Early in the session, Claude cites exact file paths and line numbers. After 30 minutes, it refers to “the auth file” or “the middleware.”

Contradicting earlier decisions: The agent may suggest an approach that directly conflicts with a decision it made 15 turns ago — because the early decision is now buried in context noise.

Re-reading files it already read: As earlier context gets attention-diluted, the agent may search for files it already has in context — wasting more tokens and filling the window further.

Generating plausible but incorrect code: The agent knows enough to generate something that looks right, but it’s missing specific details (exact variable names, interface types, configuration values) that were clear earlier in the session.

Ignoring stated constraints: If you said “don’t modify the database schema” in turn 3, by turn 20 the agent may propose schema changes — not maliciously, but because that instruction has been diluted by 150K tokens of subsequent context.

The Compaction Trap

The “solution” to context rot — compaction — actually makes things worse in a different way. Compaction reduces the context size (which helps attention), but it’s lossy summarization (which destroys specifics).

You’re trading one problem (attention dilution) for another (information loss). Neither preserves the precise, detailed understanding the agent had at the start of the session.
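To make the trade-off concrete, here is a toy compaction routine (all names hypothetical; this is not how any real agent implements it): it keeps recent turns verbatim and collapses older ones, which shrinks the context but discards the exact identifiers and constraints those turns contained.

```python
def compact(messages: list[str], keep_recent: int = 3) -> list[str]:
    """Toy compaction: keep the newest turns verbatim, collapse the rest.

    The 'summary' here just counts what it discarded -- a stand-in for an
    LLM-written summary. Either way, exact file paths, line numbers, and
    stated constraints in the old turns are gone after compaction.
    """
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = f"[summary of {len(old)} earlier turns; specifics lost]"
    return [summary] + recent

history = [f"turn {i}: edited file_{i}.py at line {i * 10}" for i in range(1, 11)]
compacted = compact(history)
print(len(history), "->", len(compacted))  # prints: 10 -> 4
```

The context is smaller, so attention dilutes less, but the agent can no longer recover which files it edited or at which lines: information loss in exchange for attention relief.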

65% of enterprise AI failures in 2025 were attributed to context drift or memory loss during multi-step reasoning. The Cloud Security Alliance formalized “Cognitive Degradation Resilience” as a distinct property in late 2025 — recognizing that context rot is a category of failure separate from traditional reliability issues.

What You Can Do

Keep sessions short and focused. One task per session. Don’t chain 15 tasks.

State critical constraints at the end of your prompts, not just the beginning. Information at the start and end of context gets more attention than information in the middle.

Monitor context usage. Once past 50%, quality is already declining. Past 70%, it’s measurably degraded.

The Structural Fix

Context rot is proportional to context size. If 60–80% of the context is raw file contents, the rot accelerates, because the model’s attention is spread across thousands of lines of code it doesn’t need. ByteBell’s Smart Context Refresh keeps context lean by replacing raw file dumps with pre-computed graph metadata at 3–5% of the token cost. The model operates in the high-accuracy zone (under 50% context utilization) for the entire session, with attention focused on high-signal structured data instead of diluted across raw source files. Learn more at bytebell.ai