The “Lost in the Middle” Problem
The Book Report Analogy
Remember writing book reports in school? You’d read the first chapter carefully (it’s the beginning!), skim through the middle, and then read the last chapter carefully (you need to know the ending!). The middle chapters? A blur.
AI models do the same thing — and researchers proved it with hard data.
In 2023, Liu et al. published a landmark paper showing that the language models they tested, both open and closed, exhibit the same pattern: strong recall at the beginning and end of the context, poor recall in the middle. They called it the “Lost in the Middle” problem, and it’s not a bug that will be patched. It’s a mathematical property of how attention works.
The Experiment
The researchers designed a simple test:
- Give the model 20 documents
- Hide the answer to a question in one specific document
- Vary which document position (1st through 20th) contains the answer
- Measure accuracy at each position
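The setup above can be sketched as a small prompt generator. This is a hypothetical harness, not the researchers' code. The function name `build_trial_prompts`, the `Document [i]` labels, and the prompt wording are all assumptions. Calling a model and scoring its answer are left out. The generator produces one prompt per answer position, so accuracy can be tallied position by position.

```python
def build_trial_prompts(question: str, answer_doc: str,
                        distractors: list[str]) -> list[str]:
    """Return one prompt per answer position (hypothetical harness).

    The answer document is slotted into every position in turn; the
    distractor documents fill the remaining slots in a fixed order.
    """
    n_documents = len(distractors) + 1
    prompts = []
    for answer_pos in range(n_documents):
        # Insert the answer document at index answer_pos
        docs = distractors[:answer_pos] + [answer_doc] + distractors[answer_pos:]
        context = "\n\n".join(
            f"Document [{i + 1}]: {doc}" for i, doc in enumerate(docs)
        )
        prompts.append(f"{context}\n\nQuestion: {question}\nAnswer:")
    return prompts


# Usage: 1 answer document + 19 distractors gives 20 prompts, one per position.
distractors = [f"Unrelated passage number {k}." for k in range(19)]
prompts = build_trial_prompts(
    "What is the capital of France?",
    "Paris is the capital of France.",
    distractors,
)
print(len(prompts))
print(prompts[0].splitlines()[0])
```

Each prompt differs only in where the answer document sits. That isolates the effect of position.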
The results formed a clear U-shaped curve:
| Answer Position | Accuracy |
|---|---|
| Position 1 (start) | ~85% |
| Position 5 | ~65% |
| Position 10 (middle) | ~45% |
| Position 15 | ~60% |
| Position 20 (end) | ~80% |
The model found the answer 85% of the time when it was first, but only 45% of the time when it was in the middle. That’s almost a coin flip.
Why This Happens: The Softmax Math
The root cause is the softmax function that normalizes attention weights:
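$$\alpha_i = \frac{e^{s_i}}{\sum_{j=1}^{n} e^{s_j}}$$

where $s_i$ is the raw attention score for the token at position $i$, $\alpha_i$ is its normalized attention weight, and $n$ is the number of tokens in the context.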
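A toy calculation shows how the denominator dilutes attention. For illustration only, assume that one relevant token scores $\Delta = 3$ higher than every other token, and that the remaining $n - 1$ tokens share the same score. Its weight is then $e^{\Delta} / (e^{\Delta} + n - 1)$. The sketch below evaluates this weight with a numerically stable softmax. The helper `relevant_token_weight` is hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()


def relevant_token_weight(n: int, delta: float = 3.0) -> float:
    """Weight on one token that outscores n - 1 equal competitors by delta."""
    scores = np.zeros(n)
    scores[0] = delta  # the single relevant token
    return float(softmax(scores)[0])


for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n={n:>7,}  weight on relevant token = {relevant_token_weight(n):.4f}")
```

The weight falls from about 0.69 at n = 10 to about 0.0002 at n = 100,000, even though the relevant token keeps the highest score throughout.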
Positional Bias in Attention Scores
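In practice, attention scores aren’t purely content-based. Position encodings (like RoPE) inject a positional signal. RoPE does this by rotating query and key vectors rather than by adding a term. As a simplification, though, we can split the raw score $s_i$ for position $i$ into a content term $c_i$ and a positional bias $b_i$:

$$s_i = c_i + b_i$$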
The positional bias typically decays with distance from the current token, which amounts to a recency bias: recent tokens get a boost. There is also a primacy effect at the start of the context. Under causal masking, the earliest tokens are visible to every later position, and models tend to concentrate extra attention on them. Together, the two effects create the U-shaped pattern.
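A toy version of the additive model $s_i = c_i + b_i$ makes this concrete. It is a sketch under stated assumptions. The parabola in `u_shaped_bias`, the 21-token context, and the content margin of 2 are illustrative choices rather than values from any real model. The relevant token's content score stays fixed while its position moves.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()


def u_shaped_bias(n: int, strength: float = 1.0) -> np.ndarray:
    """Toy positional bias b_i: `strength` at both ends, 0 in the middle.

    This parabola is an assumption standing in for primacy + recency;
    real models do not use this formula.
    """
    x = np.arange(n) / (n - 1)  # positions mapped to [0, 1]
    return strength * (2 * x - 1) ** 2


def weight_at_position(pos: int, n: int = 21, content_margin: float = 2.0,
                       strength: float = 1.0) -> float:
    """Attention weight on the relevant token when it sits at `pos`.

    Scores follow s_i = c_i + b_i: every token has content score 0 except
    the relevant one, which gets `content_margin`.
    """
    content = np.zeros(n)
    content[pos] = content_margin
    scores = content + u_shaped_bias(n, strength)
    return float(softmax(scores)[pos])


n = 21
for pos in (0, 5, 10, 15, 20):
    print(f"position {pos + 1:>2}: weight = {weight_at_position(pos, n):.3f}")
```

The toy bias is symmetric, so positions 1 and 21 receive identical weight. The relevant token gets the least attention in the middle, even though its content score never changes.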
The Entropy Connection
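We can measure how “spread out” or “focused” the attention distribution is using Shannon entropy, where $\alpha_i$ is the attention weight on position $i$:

$$H = -\sum_{i=1}^{n} \alpha_i \log_2 \alpha_i$$

Maximum entropy occurs with uniform attention ($\alpha_i = 1/n$ for every position):

$$H_{\max} = \log_2 n$$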
As context grows, entropy tends toward its maximum. Attention becomes more uniform and therefore uniformly diluted.
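The sketch below tracks normalized entropy $H / H_{\max}$ for a toy setup: one token scoring 3 above $n - 1$ equal competitors (an illustrative assumption). It defines its own `softmax` and `entropy` helpers so it runs standalone. The `normalized_entropy` helper is hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()


def entropy(probs: np.ndarray) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    probs = probs[probs > 0]
    return float(-np.sum(probs * np.log2(probs)))


def normalized_entropy(n: int, delta: float = 3.0) -> float:
    """H / H_max for one token scoring `delta` above n - 1 equal competitors."""
    scores = np.zeros(n)
    scores[0] = delta
    return entropy(softmax(scores)) / np.log2(n)


for n in (10, 100, 1_000, 10_000):
    print(f"n={n:>6,}  H/H_max = {normalized_entropy(n):.3f}")
```

The ratio climbs from roughly 0.56 at n = 10 to above 0.99 at n = 1,000. With enough competitors, even a clearly relevant token barely moves the distribution away from uniform.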
```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def entropy(probs):
    """Shannon entropy of a probability distribution."""
    probs = probs[probs > 0]  # Avoid log(0)
    return -np.sum(probs * np.log2(probs))

def simulate_lost_in_middle(n_documents=20, n_trials=1000):
    """
    Simulate the 'lost in the middle' effect.
    Model attention with positional bias: tokens at the start
    and end receive higher base scores.
    """
    positions = np.arange(n_documents)
    results = np.zeros(n_documents)
    for trial in range(n_trials):
        for answer_pos in range(n_documents):
            # Generate random relevance scores
            scores = np.random.randn(n_documents) * 0.5
            # The answer document has higher relevance
            scores[answer_pos] += 3.0
            # Add U-shaped positional bias
            # Higher at start (primacy) and end (recency)
            normalized_pos = positions / (n_documents - 1)  # 0 to 1
            positional_bias = 1.5 * (2 * (normalized_pos - 0.5) ** 2)
            scores += positional_bias
            # Compute attention weights
            attn = softmax(scores)
            # "Found" if the answer position has highest attention
            if np.argmax(attn) == answer_pos:
                results[answer_pos] += 1
    accuracy = results / n_trials * 100
    return accuracy

accuracy = simulate_lost_in_middle()
print("Position Accuracy")
print("-" * 25)
for i, acc in enumerate(accuracy):
    bar = "█" * int(acc / 2)
    print(f" {i+1:>2} {acc:5.1f}% {bar}")
```

The Impact at Scale
The lost-in-the-middle effect gets worse with longer contexts:
At 4K context (20 documents × 200 tokens each):
- Middle position accuracy: ~55%
- The U-shape is mild
At 128K context (hundreds of documents):
- Middle position accuracy: ~30%
- The U-shape is severe
At 1M context (thousands of documents):
- Middle position accuracy: ~15–20%
- The model essentially can’t find information in the middle
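The mathematical reason is that as the context length $n$ grows, the softmax denominator grows, and the positional bias signal becomes relatively stronger compared to the content relevance signal. Consider an answer at position $a$ whose score splits into content relevance $c_a$ and positional bias $b_a$:

$$\alpha_a = \frac{e^{c_a + b_a}}{\sum_{j=1}^{n} e^{c_j + b_j}}$$

Every extra token adds a term to the denominator, so a fixed content advantage buys a smaller share of attention. An answer in the middle, where $b_a$ is smallest, is the first to lose out.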
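Turning the equation around shows how slowly the content signal can keep up. Take a simplified case, which is an assumption: the other $n - 1$ tokens all score the same, and the answer beats each of them by a margin $\delta = (c_a - c_j) + (b_a - b_j)$, a content gap plus a positional gap. Solving $e^{\delta} / (e^{\delta} + n - 1) = t$ gives $\delta = \ln\big(t (n - 1) / (1 - t)\big)$. The hypothetical helpers below evaluate this at the context sizes discussed above, treating $n$ as a token count.

```python
import math


def relevant_weight(delta: float, n: int) -> float:
    """Softmax weight on one token scoring `delta` above n - 1 equal competitors."""
    return math.exp(delta) / (math.exp(delta) + n - 1)


def required_margin(n: int, target: float = 0.5) -> float:
    """Smallest margin delta with relevant_weight(delta, n) >= target.

    Solving e^d / (e^d + n - 1) = t for d gives d = ln(t * (n - 1) / (1 - t)).
    """
    return math.log(target * (n - 1) / (1 - target))


for n in (4_000, 128_000, 1_000_000):
    d = required_margin(n)
    print(f"n={n:>9,}  margin for 50% weight = {d:.2f}  (check: {relevant_weight(d, n):.3f})")
```

The required margin grows only logarithmically with $n$: about 8.3 at 4K tokens, 11.8 at 128K, and 13.8 at 1M. Still, it keeps growing. The positional gap $b_a - b_j$ enters the margin directly. For an answer in the middle competing against tokens at the edges, that gap is negative, so it eats into whatever content advantage the answer has.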
Practical Implications for Prompt Design
Rule 1: Put Critical Information First
System prompts, instructions, and key constraints should go at the very beginning of your context:
```
✅ GOOD:
[Critical instructions]
[Context document 1]
[Context document 2]
[User question]

❌ BAD:
[Context document 1]
[Context document 2]
[Critical instructions]   ← Lost in the middle!
[More context]
[User question]
```

Rule 2: Repeat Key Instructions at the End
For long contexts, repeat your most important instructions near the end:
```
[System prompt with instructions]
[Long context: documents, code, conversation history]
[Reminder: "Remember to follow the formatting rules from the system prompt"]
[User question]
```

Rule 3: Use RAG Instead of Context Stuffing
Instead of putting 50 documents into context and hoping the model finds the right one, use retrieval to select the 3–5 most relevant documents:
```python
def smart_context_construction(
    question: str,
    documents: list[str],
    retriever,
    max_docs: int = 5
) -> str:
    """
    Use retrieval to select relevant documents
    instead of stuffing everything in context.

    Assumes `retriever` exposes search(query, top_k=...) and returns
    documents sorted from most to least relevant. `documents` is the
    corpus the retriever indexes; it is not read directly here.
    """
    # Retrieve only relevant documents
    relevant_docs = retriever.search(question, top_k=max_docs)

    # Place most relevant at start and end (U-shaped optimization)
    if len(relevant_docs) >= 3:
        # Most relevant first, second-most relevant last,
        # the less relevant remainder in the middle
        reordered = [relevant_docs[0]]
        reordered.extend(relevant_docs[2:])
        reordered.append(relevant_docs[1])
    else:
        # With one or two documents, ranked order already uses the edges
        reordered = relevant_docs

    context = "\n\n".join([
        f"Document {i+1}:\n{doc}"
        for i, doc in enumerate(reordered)
    ])
    return context
```

Rule 4: Structure Long Documents with Headers
Clear headers and section markers help the model’s attention “anchor” to structure:
```
## Section 1: API Authentication
[content about auth]

## Section 2: Rate Limits
[content about rate limits]

## Section 3: Error Handling    ← Even in the middle,
[content about errors]            headers help the model
                                  locate this section
```

Comparison Across Models
The lost-in-the-middle effect is universal but varies in severity:
| Model | Start Accuracy | Middle Accuracy | End Accuracy |
|---|---|---|---|
| GPT-4 | 87% | 52% | 83% |
| Claude 3 | 89% | 58% | 85% |
| Llama 3 70B | 82% | 43% | 78% |
| Gemini 1.5 | 85% | 55% | 82% |
No model is immune. The effect is a structural property of softmax attention, not a training limitation. Better training can reduce the severity but cannot eliminate it.
The Fundamental Insight
The lost-in-the-middle problem reveals a deep truth about AI context management: having information available is not the same as being able to use it effectively.
A model with 1M tokens of context capacity and your answer buried in the middle might perform worse than a model with 4K tokens of context that contains only the relevant information.
This is why the future of AI isn’t just bigger context windows — it’s smarter context selection.
ByteBell helps engineering teams solve exactly this problem. Instead of stuffing everything into the context window, ByteBell’s Private Code Context retrieves only what matters — keeping your AI sharp, fast, and accurate. Learn more at bytebell.ai