
Managing Context: Reducing Hallucinations and Run Costs

Long sessions fill the context window, burying your instructions under noise. The model forgets your conventions, hallucinates file contents, and every reply costs more.


What is Context?

Every message re-sends the entire conversation to the API — your rules, all previous messages, every tool output. This bundle is the context window, measured in tokens (~0.75 words each). All Claude models support 200,000 tokens (~1,500 pages).
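The ~0.75 words-per-token figure gives a quick back-of-the-envelope conversion. A minimal sketch (the ratio is a rule of thumb; real tokenizers vary by language and content):

```python
# Rough token estimate from a word count, using the ~0.75 words/token
# rule of thumb. This is a ballpark check, not an exact tokenizer.
def estimate_tokens(word_count: int) -> int:
    return round(word_count / 0.75)

# A 200,000-token window holds roughly 150,000 words (~1,500 pages):
print(estimate_tokens(150_000))  # -> 200000
```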

What fills it:

[Figure: context window, used space vs. free space]


Context Rot and Cost

Context rot sets in past roughly 70% of the window: early instructions get buried, the model attends to recent noise instead of your original rules, and hallucinations increase.

Cost scales linearly — every message re-sends the full window. 50 messages at 150k tokens on Sonnet = $22.50 in input costs alone.
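That $22.50 figure falls straight out of the arithmetic. A minimal sketch, assuming a steady context size per message (in practice the window grows as the session runs, so this is a lower bound):

```python
# Cumulative input cost of a session: every message re-sends the full
# context, so cost = messages * context tokens * price per token.
def session_input_cost(messages: int, context_tokens: int,
                       usd_per_mtok: float) -> float:
    return messages * context_tokens / 1_000_000 * usd_per_mtok

# 50 messages at a 150k-token context on Sonnet ($3/MTok input):
print(session_input_cost(50, 150_000, 3.00))  # -> 22.5
```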

Cost per message vs. context size

| Model | $/MTok input | Context window | @ 50k | @ 150k | @ max |
|---|---|---|---|---|---|
| GPT-4o mini | $0.15 | 128k | $0.01 | $0.02 | $0.02 |
| Kimi K2.5 ★ | $0.60 | 256k | $0.03 | $0.09 | $0.15 |
| Claude Haiku 4.5 | $1.00 | 200k | $0.05 | $0.15 | $0.20 |
| Gemini 1.5 Pro | $1.25 | 2M | $0.06 | $0.19 | $2.50 |
| GPT-4o | $2.50 | 128k | $0.13 | $0.38 | $0.32 |
| Claude Sonnet 4.6 | $3.00 | 200k | $0.15 | $0.45 | $0.60 |
| Claude Opus 4.6 | $5.00 | 200k | $0.25 | $0.75 | $1.00 |

★ Kimi K2.5 is the model used in this course (via OpenRouter). Even so, 50 messages × 150k tokens = 7.5M input tokens, or $4.50. Context management is a billing issue, not just a quality issue.

For up-to-date pricing and context window specs across all models, see models.dev.
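The per-message figures in the table above are just price × tokens. A minimal sketch of that calculation, rounded to whole cents the way the table is:

```python
# Per-message input cost at a given context size:
# price per million tokens * tokens sent, rounded to cents.
def cost_per_message(usd_per_mtok: float, context_tokens: int) -> float:
    return round(usd_per_mtok * context_tokens / 1_000_000, 2)

print(cost_per_message(3.00, 150_000))  # Sonnet @ 150k -> 0.45
print(cost_per_message(0.60, 256_000))  # Kimi @ max    -> 0.15
```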

Check your context at any time with /context. Here’s real output from the session used to write this tutorial (reading files, running web searches, generating charts):

❯ /context

  Context Usage
  claude-sonnet-4-6 · 56k/200k tokens (28%)

  Estimated usage by category
  System prompt:   3.5k tokens  (1.7%)
  System tools:     21k tokens (10.5%)
  MCP tools:       4.7k tokens  (2.3%)
  Skills:           <1k tokens  (0.0%)
  Messages:        28.2k tokens (14.1%)
  Compact buffer:    3k tokens  (1.5%)
  ─────────────────────────────────────
  Free space:       140k tokens (69.8%)

A few things stand out here: system tools (10.5%) are Claude Code’s built-in tool definitions, re-sent with every message even when unused. MCP tools add another 2.3% just for being registered, regardless of use. Both are fixed overhead you can’t compress away: /compact and /clear trim the Messages category, while these definitions are re-sent on every request.
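To put a number on that fixed overhead, sum the definition categories from the /context output above (a back-of-the-envelope check; Skills is under 1k and ignored here):

```python
# Fixed per-message overhead from the sample /context output:
# system prompt + system tools + MCP tool definitions, in thousands
# of tokens, as a share of the 200k window.
fixed_k = 3.5 + 21 + 4.7   # k tokens re-sent every message
window_k = 200
print(f"{fixed_k / window_k:.1%}")  # -> 14.6%
```

Nearly 15% of the window is spent before a single message is exchanged.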


The Solutions: /clear and /compact

/clear resets the conversation entirely — zero history, zero tool outputs, back to just the system prompt. Use it between unrelated tasks or when the session has gone off track. It’s instant and free.

/compact summarises the entire conversation into a structured digest (~5k tokens) and discards the raw history — typically a 97% reduction. Unlike /clear it preserves continuity: decisions, file states, and completed work survive in the summary. Use it when you want to keep going on the same task with a lighter context.
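The "typically a 97% reduction" claim checks out for the numbers used in this section: compacting a 150k-token history into a ~5k summary.

```python
# Reduction from compacting a 150k-token history into a ~5k summary.
before, after = 150_000, 5_000
reduction = (before - after) / before
print(f"{reduction:.0%}")  # -> 97%
```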

| | /clear | /compact |
|---|---|---|
| History after | Gone | Summarised (~5k tokens) |
| When to use | Between tasks, fresh start | Mid-task, keep continuity |
| Cost | Free | One summary call |

Auto-compact warning: The automatic trigger at 80% only saves titles and brief excerpts — not full content. Don’t rely on it. Use /compact manually.


Further Reading