What Is a Context Window? The Simple Explanation Everyone Gets Wrong

The context window is your AI's short-term memory. It's not storage — it's a fixed-size desk. Everything the AI is thinking about right now must fit on this desk.

Let’s Start Simple: The Desk Analogy

Imagine you’re sitting at a desk during an exam. Your desk has a fixed size — say, 3 feet by 2 feet. You can only spread out as many papers, notes, and reference sheets as physically fit on that desk. If you need to look at a new document but the desk is full, something has to come off.

That desk is the context window.

Every AI model — ChatGPT, Claude, Gemini — has a fixed-size desk. Everything the AI is “thinking about” at any moment must fit on that desk. Your messages, the AI’s responses, any files you’ve uploaded, any instructions it’s been given — all of it shares this one desk.

When people say a model has a “200K context window,” they mean the desk can hold 200,000 tokens at once. When they say “1M context window,” the desk holds 1,000,000 tokens.

But here’s what most people get wrong: a bigger desk doesn’t mean the AI writes better. It just means more stuff can sit on the desk at once.

What Is a Token?

A token is the basic unit of text that AI models read. It’s not quite a word, not quite a character — it’s somewhere in between.

Rule of thumb: 1 token ≈ 4 characters ≈ ¾ of a word.

Here are some examples:

| Text | Token count |
|---|---|
| “Hello” | 1 token |
| “Good morning” | 2 tokens |
| “Unconstitutional” | 4 tokens |
| https://www.example.com | 7 tokens |
| A typical English sentence | 15–25 tokens |
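Given the 4-characters-per-token rule of thumb, a quick estimator can be sketched in a few lines. This is only a heuristic, not a real tokenizer — exact counts require a library like tiktoken:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters/token rule of thumb."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello"))         # 5 chars → ~1 token
print(estimate_tokens("Good morning"))  # 12 chars → ~3 tokens
```

Real tokenizers will disagree on individual strings (as the table above shows, “Good morning” is actually 2 tokens), but the estimate is close enough for budgeting.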

The Math

Let’s do the arithmetic for different context window sizes:

Characters ≈ Tokens × 4

Words ≈ Tokens × 0.75

For a 200K token context window:

200,000 × 4 = 800,000 characters

200,000 × 0.75 = 150,000 words

That’s roughly two full novels’ worth of text (the average novel is about 75,000 words).

For a 1M token context window:

1,000,000 × 0.75 = 750,000 words ≈ 10 novels
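The conversions above can be run directly. The 4 chars/token and 0.75 words/token factors are the rules of thumb from earlier, and the 75,000-word novel is the average used in this article:

```python
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75
WORDS_PER_NOVEL = 75_000  # average novel length used in this article

for window in (200_000, 1_000_000):
    chars = window * CHARS_PER_TOKEN
    words = int(window * WORDS_PER_TOKEN)
    novels = words // WORDS_PER_NOVEL
    print(f"{window:>9,} tokens ≈ {chars:>9,} chars ≈ {words:>7,} words ≈ {novels} novels")
```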

Sounds enormous, right? Here’s the catch.

“Fits in Memory” ≠ “Can Use Effectively”

Just because 10 novels fit on the desk doesn’t mean the AI can read and understand all 10 equally well.

Research on long-context models has repeatedly found a U-shaped attention curve, sometimes called the “lost in the middle” effect. Models pay the most attention to:

  1. The beginning of the context (your first messages, system instructions)
  2. The end of the context (your most recent messages)

Everything in the middle? It gets less attention. The more tokens you stuff in, the worse this middle-blindness gets.

Think of it this way: if your desk has 3 papers, you can glance at all 3 easily. If your desk has 300 papers stacked up, good luck finding the one you need from the middle of the stack.

The 60–70% Rule

A practical guideline used by AI engineers:

Effective capacity ≈ 0.6 × Total context window

For a 200K context window:

Effective capacity ≈ 0.6 × 200,000 = 120,000 tokens

Beyond ~120K tokens, the model’s quality starts to degrade noticeably. It misses things. It contradicts earlier statements. It “forgets” your instructions.
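As a sketch, the guideline translates into a one-line helper. The 0.6 factor is the heuristic from above, not an exact number, and the right value varies by model and task:

```python
def effective_capacity(context_window: int, factor: float = 0.6) -> int:
    """Heuristic: usable tokens before quality noticeably degrades."""
    return int(context_window * factor)

for window in (32_000, 128_000, 200_000, 1_000_000):
    print(f"{window:>9,} token window → plan for ~{effective_capacity(window):,} tokens")
```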

A Quick Code Example

Want to see how many tokens your text uses? Here’s a simple Python script:

import tiktoken

# Use the cl100k_base encoding (GPT-4's tokenizer; other models use
# different but broadly similar encodings)
encoder = tiktoken.get_encoding("cl100k_base")

# Some example texts
texts = [
    "Hello",
    "Good morning, how are you?",
    "Unconstitutional",
    "The quick brown fox jumps over the lazy dog.",
]

for text in texts:
    tokens = encoder.encode(text)
    print(f"'{text}' → {len(tokens)} tokens")
    print(f"  Characters: {len(text)}, Ratio: {len(text)/len(tokens):.1f} chars/token")
    print()

Output:

'Hello' → 1 tokens
  Characters: 5, Ratio: 5.0 chars/token

'Good morning, how are you?' → 6 tokens
  Characters: 26, Ratio: 4.3 chars/token

'Unconstitutional' → 4 tokens
  Characters: 16, Ratio: 4.0 chars/token

'The quick brown fox jumps over the lazy dog.' → 10 tokens
  Characters: 44, Ratio: 4.4 chars/token

Notice how the ratio hovers around 4 characters per token — exactly our rule of thumb.

Context Window Is NOT Storage

This is the single biggest misconception. People think the context window is like a hard drive — a place where information is permanently stored. It’s not.

The context window is more like RAM (Random Access Memory) on your computer:

| Property | Hard drive (storage) | RAM (context window) |
|---|---|---|
| Persistence | Keeps data when power is off | Erased when the session ends |
| Size | Terabytes | Limited (gigabytes) |
| Speed | Slower | Fast |
| Purpose | Long-term storage | Active working memory |

When your AI conversation ends, the context window is wiped clean. Next conversation? Fresh desk. No memory of what happened before.

How the Context Window Fills Up

Here’s what actually goes into the context window during a conversation:

┌──────────────────────────────────────────┐
│          CONTEXT WINDOW (200K)           │
│                                          │
│  System Prompt:           ~2,000 tokens  │
│  Tool Definitions:        ~5,000 tokens  │
│  Your Message #1:           ~500 tokens  │
│  AI Response #1:          ~1,000 tokens  │
│  Your Message #2:           ~300 tokens  │
│  AI Response #2:          ~2,000 tokens  │
│  Uploaded PDF:           ~30,000 tokens  │
│  Your Message #3:           ~200 tokens  │
│  ...                                     │
│                                          │
│  USED:      ~41,000 / 200,000 tokens     │
│  REMAINING: ~159,000 tokens              │
└──────────────────────────────────────────┘

Notice how the uploaded PDF consumed 30,000 tokens — 15% of the entire window — in one shot.
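The bookkeeping behind the diagram above can be sketched as a simple sum. The per-component token counts here are the illustrative numbers from the diagram, not measurements:

```python
CONTEXT_WINDOW = 200_000

# Illustrative token counts, matching the diagram above
components = {
    "system prompt": 2_000,
    "tool definitions": 5_000,
    "message #1": 500,
    "response #1": 1_000,
    "message #2": 300,
    "response #2": 2_000,
    "uploaded PDF": 30_000,
    "message #3": 200,
}

used = sum(components.values())
print(f"USED: {used:,} / {CONTEXT_WINDOW:,} tokens ({used / CONTEXT_WINDOW:.1%})")
print(f"REMAINING: {CONTEXT_WINDOW - used:,} tokens")
```

A real system would get these counts from a tokenizer rather than hard-coding them, but the budget arithmetic is the same.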

The Key Numbers to Remember

| Context size | Words | Equivalent | Models |
|---|---|---|---|
| 4K tokens | 3,000 | 6 pages | Older GPT-3.5 |
| 32K tokens | 24,000 | 48 pages | GPT-4 (early) |
| 128K tokens | 96,000 | ~1 novel | GPT-4 Turbo |
| 200K tokens | 150,000 | 2 novels | Claude 3.5 |
| 1M tokens | 750,000 | 10 novels | Gemini 1.5 Pro |
| 10M tokens | 7,500,000 | 100 novels | Future models |

What This Means for You

  1. Be concise. Every word you send consumes tokens. Don’t paste your entire codebase when you need help with one function.

  2. Put important stuff first or last. Due to the U-shaped attention curve, critical instructions belong at the beginning or end of your prompt — not buried in the middle.

  3. Don’t trust the number on the box. A “200K context window” doesn’t mean the AI can effectively use 200K tokens. Real effective capacity is 60–70% of that.

  4. Longer conversations get worse. By turn 20 of a conversation, the AI has accumulated so many tokens that quality starts to slip. Start fresh for new topics.

  5. Files eat tokens fast. A 50-page PDF can consume 30,000–50,000 tokens. Upload strategically.
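One common way to act on points 4 and 5 is to trim the oldest turns once a token budget is exceeded. A minimal sketch, using the rough 4-chars/token estimate and an illustrative budget (real systems typically keep the system prompt pinned and use an exact tokenizer):

```python
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough rule of thumb, not a real tokenizer

def trim_history(messages: list[str], budget: int) -> list[str]:
    """Drop the oldest messages until the remainder fits the token budget."""
    kept = list(messages)
    while kept and sum(estimate_tokens(m) for m in kept) > budget:
        kept.pop(0)  # evict oldest first; a real system would pin the system prompt
    return kept

history = ["old question " * 50, "old answer " * 50, "current question?"]
print(len(trim_history(history, budget=200)))  # the oldest turn gets dropped
```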

The context window is the single most important constraint in AI systems today. Understanding it — really understanding it, beyond the marketing numbers — is the first step to using AI effectively.


ByteBell helps engineering teams solve exactly this problem. Instead of stuffing everything into the context window, ByteBell’s Smart Context Refresh retrieves only what matters — keeping your AI sharp, fast, and accurate. Learn more at bytebell.ai