The Cost Curve: How Context Length Affects Your API Bill (With Formulas)

A 20-turn conversation costs quadratically more than you'd expect. Here are the exact formulas for multi-turn conversation costs across providers.

The Taxi Meter Analogy

Imagine a taxi where the meter doesn’t just measure distance — it measures total distance traveled since the trip started. Each block costs more than the last because the meter includes all previous blocks. By block 20, you’re paying for the cumulative distance of all 20 blocks, not just the 20th one.

This is exactly how multi-turn AI conversations work. Each turn carries all previous context, so later turns cost progressively more and the total cost grows quadratically with conversation length.

The Basic Cost Formula

\text{Cost} = \frac{T_{\text{input}}}{10^6} \times P_{\text{in}} + \frac{T_{\text{output}}}{10^6} \times P_{\text{out}}

Where:

- $T_{\text{input}}$ is the number of input (prompt) tokens
- $T_{\text{output}}$ is the number of output (completion) tokens
- $P_{\text{in}}$ and $P_{\text{out}}$ are the prices in dollars per million tokens

Simple enough. But in a multi-turn conversation, $T_{\text{input}}$ is not just your latest message — it’s the entire conversation history.
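The formula drops straight into code. A minimal helper, with Claude Sonnet's prices from the comparison table below as defaults (dollars per million tokens):

```python
def call_cost(input_tokens: int, output_tokens: int,
              price_in: float = 3.0, price_out: float = 15.0) -> float:
    """Dollar cost of a single API call; prices are $ per million tokens."""
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out

# A 2,000-token prompt with a 500-token response at Sonnet pricing:
print(round(call_cost(2000, 500), 4))  # 0.0135
```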

Multi-Turn Accumulation

In a conversation where each turn adds $t$ tokens (user message + AI response):

Turn   Input Tokens   Cumulative
1      t              t
2      2t             3t
3      3t             6t
n      nt             n(n+1)/2 · t

Total input tokens across $n$ turns:

T_{\text{input}}^{\text{total}} = \sum_{k=1}^{n} k \cdot t = t \cdot \frac{n(n+1)}{2}

This is quadratic in $n$! A 20-turn conversation doesn’t cost 20× a single turn — it costs:

\frac{20 \times 21}{2} = 210 \text{ turn-equivalents}
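A quick sanity check that the closed form matches the explicit turn-by-turn sum, using an assumed $t$ of 1,000 tokens per turn:

```python
t, n = 1_000, 20  # assumed tokens added per turn, and number of turns

# Explicit sum: turn k re-sends the k*t tokens accumulated so far
explicit = sum(k * t for k in range(1, n + 1))

# Closed form: t * n(n+1)/2
closed = t * n * (n + 1) // 2

assert explicit == closed == 210_000  # 210 turn-equivalents of 1,000 tokens
```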

def multi_turn_cost(
    n_turns: int,
    tokens_per_user_msg: int = 200,
    tokens_per_ai_response: int = 800,
    system_prompt_tokens: int = 2000,
    price_input: float = 3.0,    # $ per M tokens
    price_output: float = 15.0,  # $ per M tokens
) -> dict:
    """
    Calculate the total cost of a multi-turn conversation.

    Each turn's input includes:
    - System prompt (constant)
    - All previous messages (grows linearly)
    - Current user message
    """
    tokens_per_turn = tokens_per_user_msg + tokens_per_ai_response
    total_input = 0
    total_output = 0

    for turn in range(1, n_turns + 1):
        # Input: system prompt + all previous turns + current message
        input_tokens = system_prompt_tokens + (turn - 1) * tokens_per_turn + tokens_per_user_msg
        output_tokens = tokens_per_ai_response

        total_input += input_tokens
        total_output += output_tokens

    input_cost = total_input / 1e6 * price_input
    output_cost = total_output / 1e6 * price_output

    return {
        "turns": n_turns,
        "total_input_tokens": total_input,
        "total_output_tokens": total_output,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
        "cost_per_turn": (input_cost + output_cost) / n_turns,
        "naive_expectation": n_turns * (
            (system_prompt_tokens + tokens_per_user_msg) / 1e6 * price_input +
            tokens_per_ai_response / 1e6 * price_output
        ),
    }

# Show cost scaling
print(f"{'Turns':>6} {'Input Tokens':>14} {'Total Cost':>12} "
      f"{'Naive Expected':>15} {'Actual/Naive':>12}")
print("=" * 65)

for n in [1, 5, 10, 20, 50, 100]:
    result = multi_turn_cost(n)
    ratio = result["total_cost"] / result["naive_expectation"]
    print(f"{n:>6} {result['total_input_tokens']:>14,} "
          f"${result['total_cost']:>11.4f} "
          f"${result['naive_expectation']:>14.4f} {ratio:>11.1f}×")

Output:

 Turns  Input Tokens    Total Cost  Naive Expected  Actual/Naive
=================================================================
     1          2,200      $0.0186         $0.0186          1.0×
     5         21,000      $0.1230         $0.0930          1.3×
    10         67,000      $0.3210         $0.1860          1.7×
    20        234,000      $0.9420         $0.3720          2.5×
    50      1,335,000      $4.6050         $0.9300          5.0×
   100      5,170,000     $16.7100         $1.8600          9.0×

A 100-turn conversation costs 9× more than you’d naively expect!

Provider Cost Comparison

Approximate API pricing (rates change frequently; check each provider’s current pricing page):

Provider    Model            Input ($/M)   Output ($/M)
Anthropic   Claude Sonnet    $3.00         $15.00
Anthropic   Claude Haiku     $0.25         $1.25
OpenAI      GPT-4o           $2.50         $10.00
OpenAI      GPT-4o-mini      $0.15         $0.60
Google      Gemini 1.5 Pro   $1.25         $5.00

providers = [
    ("Claude Sonnet", 3.00, 15.00),
    ("Claude Haiku", 0.25, 1.25),
    ("GPT-4o", 2.50, 10.00),
    ("GPT-4o-mini", 0.15, 0.60),
    ("Gemini 1.5 Pro", 1.25, 5.00),
]

# Cost for a 20-turn conversation
print(f"\n20-Turn Conversation Cost Comparison:")
print(f"{'Model':<20} {'Input Cost':>12} {'Output Cost':>12} {'Total':>12}")
print("=" * 60)

for name, p_in, p_out in providers:
    result = multi_turn_cost(20, price_input=p_in, price_output=p_out)
    print(f"{name:<20} ${result['input_cost']:>11.4f} "
          f"${result['output_cost']:>11.4f} ${result['total_cost']:>11.4f}")

Cost Optimization Strategies

1. Prompt Caching

Cache the system prompt and early conversation to avoid re-processing:

\text{Savings} = T_{\text{cached}} \times (n - 1) \times (P_{\text{in}} - P_{\text{cache\_read}})

(The first turn writes the cache; the remaining $n - 1$ turns read it.) With Anthropic’s caching, where reads cost 0.1× the base input price, the savings on each warm turn are:

\text{Savings per turn} = T_{\text{cached}} \times 0.9 \times P_{\text{in}}

def cost_with_caching(
    n_turns: int,
    cached_prefix_tokens: int = 5000,  # System prompt + tools
    **kwargs
) -> dict:
    """Cost with prompt caching (90% savings on cached prefix)."""
    result = multi_turn_cost(n_turns, **kwargs)

    # Savings: cached tokens are read at 10% price on all but first turn
    cache_savings = (n_turns - 1) * cached_prefix_tokens / 1e6 * kwargs.get("price_input", 3.0) * 0.9

    result["total_cost_cached"] = result["total_cost"] - cache_savings
    result["cache_savings"] = cache_savings

    return result

2. Conversation Truncation

Only keep the last $k$ turns instead of the full history:

T_{\text{input}}(k) = T_{\text{system}} + k \times T_{\text{per\_turn}} + T_{\text{current}}

This caps input tokens at a constant regardless of conversation length.
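A minimal truncation sketch, assuming OpenAI/Anthropic-style message dicts with a `role` field (the function name and message shape are illustrative, not any provider's API):

```python
def truncate_history(messages: list[dict], keep_turns: int = 10) -> list[dict]:
    """Keep system messages plus the last `keep_turns` user/assistant exchanges."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    # Each turn is one user message plus one assistant reply
    return system + dialogue[-2 * keep_turns:]

# A 50-turn conversation shrinks to the system prompt + the last 10 turns:
convo = [{"role": "system", "content": "You are helpful."}]
for i in range(50):
    convo += [{"role": "user", "content": f"q{i}"},
              {"role": "assistant", "content": f"a{i}"}]
print(len(truncate_history(convo)))  # 21 messages: 1 system + 10 turns
```

The trade-off: the model loses access to anything outside the window, so truncation works best when older turns are no longer relevant.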

3. Summarization

Periodically summarize older turns into a compact summary:

T_{\text{summary}} \ll T_{\text{full\_history}}

Typical compression: 10:1 to 20:1, reducing accumulated context significantly.
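A back-of-envelope model of the token savings. The 10-turn cadence and 15:1 compression ratio here are illustrative assumptions, and the cost of the summarization calls themselves is not modeled:

```python
def input_tokens_with_summary(n_turns: int, t: int = 1_000,
                              system: int = 2_000,
                              every: int = 10, ratio: int = 15) -> int:
    """Total input tokens when history is compressed every `every` turns."""
    total = 0
    history = 0   # uncompressed recent turns carried into each request
    summary = 0   # compressed summary of all older turns
    for turn in range(1, n_turns + 1):
        total += system + summary + history
        history += t
        if turn % every == 0:          # fold recent turns into the summary
            summary += history // ratio
            history = 0
    return total

# Baseline: full history, where turn k carries (k-1)*t tokens of context
full = sum(2_000 + (k - 1) * 1_000 for k in range(1, 21))
print(input_tokens_with_summary(20), "vs", full)  # 136660 vs 230000
```

Even over just 20 turns the modeled history cost drops by roughly 40%, and the gap widens as conversations get longer, because the summary grows far more slowly than the raw transcript.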

The “20 Questions” Cost Breakdown

Let’s trace a realistic 20-turn coding conversation:

def detailed_conversation_trace():
    """Detailed cost trace of a 20-turn coding conversation."""
    system_prompt = 3000  # tokens
    turns = [
        ("User asks about a bug", 150, 500),
        ("User shares code", 800, 600),
        ("User asks for explanation", 100, 1200),
        ("User requests changes", 200, 800),
        ("User reports error", 300, 400),
        ("User shares error log", 500, 600),
        ("User asks follow-up", 100, 800),
        ("User requests refactor", 150, 1500),
        ("User asks about testing", 100, 1000),
        ("User shares test results", 400, 600),
        ("User asks for optimization", 100, 1200),
        ("User shares benchmark", 300, 500),
        ("User requests documentation", 100, 800),
        ("User asks about deployment", 150, 600),
        ("User shares config", 600, 400),
        ("User reports new issue", 200, 800),
        ("User asks for review", 100, 1500),
        ("User requests final changes", 150, 600),
        ("User asks about CI/CD", 100, 800),
        ("User says thanks", 50, 200),
    ]

    cumulative = system_prompt
    total_input = 0
    total_output = 0
    price_in = 3.0 / 1e6
    price_out = 15.0 / 1e6

    print(f"{'Turn':>4} {'Description':<30} {'Input':>8} {'Output':>8} "
          f"{'Turn $':>8} {'Running $':>10}")
    print("=" * 75)

    running_cost = 0
    for i, (desc, user_tok, ai_tok) in enumerate(turns, 1):
        input_tokens = cumulative + user_tok
        turn_cost = input_tokens * price_in + ai_tok * price_out
        running_cost += turn_cost
        total_input += input_tokens
        total_output += ai_tok

        cumulative += user_tok + ai_tok

        print(f"{i:>4} {desc:<30} {input_tokens:>8,} {ai_tok:>8} "
              f"${turn_cost:>7.4f} ${running_cost:>9.4f}")

    print("=" * 75)
    print(f"{'':>4} {'TOTAL':<30} {total_input:>8,} {total_output:>8} "
          f"{'':>8} ${running_cost:>9.4f}")

detailed_conversation_trace()

This realistic 20-turn conversation totals roughly $1 at Sonnet pricing, far more than most users expect from “just chatting,” and heavier conversations (longer code, bigger logs) scale it up quickly.

Key Takeaways

  1. Multi-turn costs grow quadratically. Turn $n$ processes all previous context, so total cost scales as $O(n^2)$.

  2. The cost you feel ≠ the cost you pay. A 20-turn conversation feels like 20 simple queries but costs 2-3× more due to accumulation.

  3. Prompt caching is essential. For production systems with repeated prefixes, caching saves 50-90% on input costs.

  4. Strategic truncation saves money. Keeping only the last 5-10 turns instead of the full history caps costs at a constant.

  5. Know your provider’s pricing. The difference between Sonnet and Haiku can be 10×, and between Sonnet and GPT-4o-mini can be 20×.


ByteBell helps engineering teams solve exactly this problem. Instead of stuffing everything into the context window, ByteBell’s Smart Context Refresh retrieves only what matters — keeping your AI sharp, fast, and accurate. Learn more at bytebell.ai