A 20-turn conversation costs quadratically more than you'd expect. Here are the exact formulas for multi-turn conversation costs across providers.
Imagine a taxi where the meter doesn’t just measure distance — it measures total distance traveled since the trip started. Each block costs more than the last because the meter includes all previous blocks. By block 20, you’re paying for the cumulative distance of all 20 blocks, not just the 20th one.
This is exactly how multi-turn AI conversations work. Each turn carries all previous context, making later turns progressively more expensive — and the total cost quadratic in the number of turns.
The cost of a single API call is:

```
cost = (n_in / 1,000,000) × p_in + (n_out / 1,000,000) × p_out
```

Where:

- `n_in` — input tokens sent with the request
- `n_out` — output tokens generated by the model
- `p_in`, `p_out` — prices in dollars per million tokens

Simple enough. But in a multi-turn conversation, `n_in` is not just your messages — it’s the entire conversation history.
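For a single call the arithmetic is trivial — for example, 2,200 input tokens and 800 output tokens at the Sonnet prices used throughout this post ($3/M in, $15/M out):

```python
def call_cost(n_in: int, n_out: int, p_in: float = 3.0, p_out: float = 15.0) -> float:
    """Cost in dollars of one API call; prices are $ per million tokens."""
    return n_in / 1e6 * p_in + n_out / 1e6 * p_out

print(f"${call_cost(2_200, 800):.4f}")  # → $0.0186
```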
In a conversation where each turn adds `k = u + a` tokens (`u` user-message tokens plus `a` AI-response tokens) on top of a system prompt of `S` tokens:

| Turn | Input Tokens | Cumulative Input |
|---|---|---|
| 1 | `S + u` | `S + u` |
| 2 | `S + k + u` | `2(S + u) + k` |
| 3 | `S + 2k + u` | `3(S + u) + 3k` |
| … | … | … |
| `n` | `S + (n−1)k + u` | `n(S + u) + k·n(n−1)/2` |

Total input tokens across `n` turns:

```
total_input = n(S + u) + k · n(n−1)/2
```
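As a quick sanity check of the closed form (using the default numbers assumed throughout: `S = 2000`, `u = 200`, `a = 800`, so `k = 1000`):

```python
# Closed form: total_input = n*(S + u) + k * n*(n-1) / 2
S, u, a = 2000, 200, 800   # system prompt, user msg, AI response tokens
k = u + a                  # tokens added to the history per turn

def total_input_closed_form(n: int) -> int:
    return n * (S + u) + k * n * (n - 1) // 2

def total_input_simulated(n: int) -> int:
    # Turn t sends: system prompt + (t-1) prior turns + current user message
    return sum(S + (t - 1) * k + u for t in range(1, n + 1))

for n in (1, 5, 20, 100):
    assert total_input_closed_form(n) == total_input_simulated(n)
print(total_input_closed_form(20))  # → 234000
```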
This is quadratic in `n`! A 20-turn conversation doesn’t cost 20× what a single turn costs — let’s compute exactly how much more:
```python
def multi_turn_cost(
    n_turns: int,
    tokens_per_user_msg: int = 200,
    tokens_per_ai_response: int = 800,
    system_prompt_tokens: int = 2000,
    price_input: float = 3.0,    # $ per M tokens
    price_output: float = 15.0,  # $ per M tokens
) -> dict:
    """
    Calculate the total cost of a multi-turn conversation.

    Each turn's input includes:
    - System prompt (constant)
    - All previous messages (grows linearly with the turn number)
    - Current user message
    """
    tokens_per_turn = tokens_per_user_msg + tokens_per_ai_response
    total_input = 0
    total_output = 0
    for turn in range(1, n_turns + 1):
        # Input: system prompt + all previous turns + current message
        input_tokens = (system_prompt_tokens
                        + (turn - 1) * tokens_per_turn
                        + tokens_per_user_msg)
        output_tokens = tokens_per_ai_response
        total_input += input_tokens
        total_output += output_tokens
    input_cost = total_input / 1e6 * price_input
    output_cost = total_output / 1e6 * price_output
    return {
        "turns": n_turns,
        "total_input_tokens": total_input,
        "total_output_tokens": total_output,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
        "cost_per_turn": (input_cost + output_cost) / n_turns,
        # What n independent single-turn calls would have cost
        "naive_expectation": n_turns * (
            (system_prompt_tokens + tokens_per_user_msg) / 1e6 * price_input
            + tokens_per_ai_response / 1e6 * price_output
        ),
    }

# Show cost scaling
print(f"{'Turns':>6} {'Input Tokens':>14} {'Total Cost':>12} "
      f"{'Naive Expected':>15} {'Actual/Naive':>12}")
print("=" * 65)
for n in [1, 5, 10, 20, 50, 100]:
    result = multi_turn_cost(n)
    ratio = result["total_cost"] / result["naive_expectation"]
    print(f"{n:>6} {result['total_input_tokens']:>14,} "
          f"${result['total_cost']:>11.4f} "
          f"${result['naive_expectation']:>14.4f} {ratio:>11.1f}×")
```

Output:

```
 Turns   Input Tokens   Total Cost  Naive Expected Actual/Naive
=================================================================
     1          2,200 $     0.0186 $        0.0186         1.0×
     5         21,000 $     0.1230 $        0.0930         1.3×
    10         67,000 $     0.3210 $        0.1860         1.7×
    20        234,000 $     0.9420 $        0.3720         2.5×
    50      1,335,000 $     4.6050 $        0.9300         5.0×
   100      5,170,000 $    16.7100 $        1.8600         9.0×
```

A 100-turn conversation costs about 9× more than you’d naively expect!
Current API pricing (approximate — always check your provider’s latest price list):
| Provider | Model | Input ($/M) | Output ($/M) |
|---|---|---|---|
| Anthropic | Claude Sonnet | $3.00 | $15.00 |
| Anthropic | Claude Haiku | $0.25 | $1.25 |
| OpenAI | GPT-4o | $2.50 | $10.00 |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 |
| Google | Gemini 1.5 Pro | $1.25 | $5.00 |
```python
providers = [
    ("Claude Sonnet", 3.00, 15.00),
    ("Claude Haiku", 0.25, 1.25),
    ("GPT-4o", 2.50, 10.00),
    ("GPT-4o-mini", 0.15, 0.60),
    ("Gemini 1.5 Pro", 1.25, 5.00),
]

# Cost for a 20-turn conversation
print("\n20-Turn Conversation Cost Comparison:")
print(f"{'Model':<20} {'Input Cost':>12} {'Output Cost':>12} {'Total':>12}")
print("=" * 60)
for name, p_in, p_out in providers:
    result = multi_turn_cost(20, price_input=p_in, price_output=p_out)
    print(f"{name:<20} ${result['input_cost']:>11.4f} "
          f"${result['output_cost']:>11.4f} ${result['total_cost']:>11.4f}")
```

Cache the system prompt and early conversation to avoid re-processing:
With Anthropic’s caching (cache reads are billed at 0.1× the base input price):

```python
def cost_with_caching(
    n_turns: int,
    cached_prefix_tokens: int = 5000,  # System prompt + tools
    **kwargs,
) -> dict:
    """Cost with prompt caching (~90% savings on the cached prefix)."""
    result = multi_turn_cost(n_turns, **kwargs)
    # Savings: cached tokens are read at 10% price on all but the first turn
    # (simplified: ignores the one-time cache-write surcharge)
    cache_savings = ((n_turns - 1) * cached_prefix_tokens / 1e6
                     * kwargs.get("price_input", 3.0) * 0.9)
    result["total_cost_cached"] = result["total_cost"] - cache_savings
    result["cache_savings"] = cache_savings
    return result
```

Only keep the last `N` turns instead of the full history. This caps input tokens at a constant regardless of conversation length.
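A minimal sliding-window sketch, assuming the common chat `messages` format; `truncate_history` and `keep_last_n` are illustrative names, not a library API:

```python
def truncate_history(messages: list[dict], keep_last_n: int = 5) -> list[dict]:
    """Keep the system message plus only the last N user/assistant exchanges.

    Assumes messages use the common chat format:
    [{"role": "system" | "user" | "assistant", "content": str}, ...]
    """
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    # Each exchange = one user message + one assistant reply (2 messages)
    return system + dialogue[-2 * keep_last_n:]

# Example: 20 exchanges, keep only the last 5
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(20):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = truncate_history(history, keep_last_n=5)
print(len(trimmed))  # → 11 (1 system message + 5 exchanges × 2 messages)
```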
Periodically summarize older turns into a compact summary:
Typical compression: 10:1 to 20:1, reducing accumulated context significantly.
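A sketch of how summarization changes the cost curve — the `summarize_every` and `compression_ratio` values below are illustrative assumptions, not measurements:

```python
def cost_with_summarization(
    n_turns: int,
    summarize_every: int = 10,    # compress history after this many turns
    compression_ratio: int = 15,  # assumed 15:1 compression
    tokens_per_user_msg: int = 200,
    tokens_per_ai_response: int = 800,
    system_prompt_tokens: int = 2000,
    price_input: float = 3.0,     # $ per M tokens
    price_output: float = 15.0,   # $ per M tokens
) -> float:
    """Total cost when older turns are periodically replaced by a summary."""
    tokens_per_turn = tokens_per_user_msg + tokens_per_ai_response
    context = 0        # accumulated conversation tokens carried forward
    total_cost = 0.0
    for turn in range(1, n_turns + 1):
        input_tokens = system_prompt_tokens + context + tokens_per_user_msg
        total_cost += (input_tokens / 1e6 * price_input
                       + tokens_per_ai_response / 1e6 * price_output)
        context += tokens_per_turn
        if turn % summarize_every == 0:
            # Replace accumulated history with a compact summary
            context //= compression_ratio
    return total_cost

# summarize_every larger than n_turns == never summarize (quadratic baseline)
full = cost_with_summarization(100, summarize_every=10**9)
compressed = cost_with_summarization(100)
print(f"no summarization: ${full:.2f}, summarized every 10 turns: ${compressed:.2f}")
```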
Let’s trace a realistic 20-turn coding conversation:

```python
def detailed_conversation_trace():
    """Detailed cost trace of a 20-turn coding conversation."""
    system_prompt = 3000  # tokens
    turns = [
        ("User asks about a bug", 150, 500),
        ("User shares code", 800, 600),
        ("User asks for explanation", 100, 1200),
        ("User requests changes", 200, 800),
        ("User reports error", 300, 400),
        ("User shares error log", 500, 600),
        ("User asks follow-up", 100, 800),
        ("User requests refactor", 150, 1500),
        ("User asks about testing", 100, 1000),
        ("User shares test results", 400, 600),
        ("User asks for optimization", 100, 1200),
        ("User shares benchmark", 300, 500),
        ("User requests documentation", 100, 800),
        ("User asks about deployment", 150, 600),
        ("User shares config", 600, 400),
        ("User reports new issue", 200, 800),
        ("User asks for review", 100, 1500),
        ("User requests final changes", 150, 600),
        ("User asks about CI/CD", 100, 800),
        ("User says thanks", 50, 200),
    ]
    cumulative = system_prompt
    total_input = 0
    total_output = 0
    price_in = 3.0 / 1e6
    price_out = 15.0 / 1e6
    print(f"{'Turn':>4} {'Description':<30} {'Input':>8} {'Output':>8} "
          f"{'Turn $':>8} {'Running $':>10}")
    print("=" * 75)
    running_cost = 0
    for i, (desc, user_tok, ai_tok) in enumerate(turns, 1):
        input_tokens = cumulative + user_tok
        turn_cost = input_tokens * price_in + ai_tok * price_out
        running_cost += turn_cost
        total_input += input_tokens
        total_output += ai_tok
        cumulative += user_tok + ai_tok
        print(f"{i:>4} {desc:<30} {input_tokens:>8,} {ai_tok:>8} "
              f"${turn_cost:>7.4f} ${running_cost:>9.4f}")
    print("=" * 75)
    print(f"{'':>4} {'TOTAL':<30} {total_input:>8,} {total_output:>8} "
          f"{'':>8} ${running_cost:>9.4f}")

detailed_conversation_trace()
```

The total cost of this realistic 20-turn conversation comes to roughly $1 at Sonnet pricing — much more than most users expect from “just chatting,” and it multiplies across every user and session.
- **Multi-turn costs grow quadratically.** Turn `n` processes all previous context, so total cost scales as O(n²).
- **The cost you feel ≠ the cost you pay.** A 20-turn conversation feels like 20 simple queries but costs 2-3× more due to context accumulation.
- **Prompt caching is essential.** For production systems with repeated prefixes, caching saves 50-90% on input costs.
- **Strategic truncation saves money.** Keeping only the last 5-10 turns instead of the full history caps per-turn input costs at a constant.
- **Know your provider’s pricing.** The difference between Sonnet and Haiku can be 10×, and between Sonnet and GPT-4o-mini can be 20×.
ByteBell helps engineering teams solve exactly this problem. Instead of stuffing everything into the context window, ByteBell’s Smart Context Refresh retrieves only what matters — keeping your AI sharp, fast, and accurate. Learn more at bytebell.ai