Does prompt caching actually save you money?
Cache economics vary sharply by provider. Anthropic charges a 25% premium on cache writes, so caching saves money only once your hit rate clears break-even. OpenAI and Google charge no write premium, so every cache hit is a net saving.
Does prompt caching actually save you money on your workload — or does it cost you more?
- Anthropic charges a 25% write premium, so caching can lose money below a hit rate of roughly 22%
- OpenAI and Google have no write premium, so caching is pure upside at any hit rate above zero
- See the exact break-even hit rate for your chosen model before enabling caching in production
- Compare cache ROI across all 8 cache-capable models in one click
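The 22% figure follows from simple arithmetic. The sketch below assumes each cacheable token is billed at a write multiplier on a miss and a read multiplier on a hit, both relative to the base input price. The 0.10 read multiplier (which reproduces the roughly 22% break-even) and the 0.50 read multiplier for a no-premium provider are illustrative assumptions, not figures stated on this page.

```python
def break_even_hit_rate(write_mult: float, read_mult: float) -> float:
    """Minimum hit rate at which caching stops costing more than not caching.

    Per cacheable token, relative to the base input price:
      cost with cache = h * read_mult + (1 - h) * write_mult
      cost without    = 1.0
    Setting the two equal and solving for h gives
      h = (write_mult - 1) / (write_mult - read_mult).
    """
    if not (read_mult < 1.0 <= write_mult):
        raise ValueError("expected read_mult < 1.0 <= write_mult")
    return (write_mult - 1.0) / (write_mult - read_mult)


# Assumed multipliers: 1.25x write (the 25% premium), 0.10x read.
print(f"{break_even_hit_rate(1.25, 0.10):.1%}")  # prints 21.7%
# No write premium (1.00x write); the 0.50x read discount is hypothetical.
print(f"{break_even_hit_rate(1.00, 0.50):.1%}")  # prints 0.0%
```

With no write premium the numerator is zero, which is why any hit rate above zero saves money for those providers.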
The calculator takes the inputs below and reports the outputs that follow them, so you can apply it to your own AI workloads.
- Model: pick a cache-capable model
- Requests per month: total monthly call volume
- Input tokens / request: average input size
- Output tokens / request: average output size
- Cacheable portion of input: static-prefix share of input
- Cache hit rate: how often the cache is warm
- No-cache cost: monthly cost with caching off
- With-cache cost: monthly cost with caching on
- Monthly savings: dollars saved (or lost) per month
- Break-even hit rate: minimum hit rate to save
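The inputs and outputs listed above can be related with a short cost model. This is a rough sketch rather than the calculator's actual implementation. It assumes the hit rate applies uniformly to the cacheable tokens, that misses are billed at the cache-write price, and that output tokens cost the same either way. The `Workload` and `Pricing` names and all dollar figures are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_month: int
    input_tokens: int        # average input tokens per request
    output_tokens: int       # average output tokens per request
    cacheable_share: float   # static-prefix share of input, 0..1
    hit_rate: float          # share of cacheable tokens served from a warm cache, 0..1


@dataclass
class Pricing:
    # Dollars per million tokens; the values used below are placeholders.
    base_input: float
    cache_write: float
    cache_read: float
    output: float


def monthly_costs(w: Workload, p: Pricing) -> dict[str, float]:
    m_in = w.requests_per_month * w.input_tokens / 1e6    # million input tokens
    m_out = w.requests_per_month * w.output_tokens / 1e6  # million output tokens
    output_cost = m_out * p.output

    no_cache = m_in * p.base_input + output_cost

    static = m_in * w.cacheable_share   # tokens eligible for caching
    dynamic = m_in - static             # tokens always billed at the base price
    with_cache = (
        dynamic * p.base_input
        + static * w.hit_rate * p.cache_read
        + static * (1 - w.hit_rate) * p.cache_write
        + output_cost
    )
    return {"no_cache": no_cache, "with_cache": with_cache,
            "savings": no_cache - with_cache}


# Hypothetical prices: 1.25x write and 0.10x read on a $3.00 base input price.
pricing = Pricing(base_input=3.00, cache_write=3.75, cache_read=0.30, output=15.00)
warm = Workload(100_000, 10_000, 500, cacheable_share=0.8, hit_rate=0.90)
cold = Workload(100_000, 10_000, 500, cacheable_share=0.8, hit_rate=0.10)
for name, wl in (("warm", warm), ("cold", cold)):
    r = monthly_costs(wl, pricing)
    print(f"{name}: no cache ${r['no_cache']:,.0f}, "
          f"with cache ${r['with_cache']:,.0f}, savings ${r['savings']:,.0f}")
```

Under these placeholder prices the warm workload saves money and the cold one loses it, which is the behaviour the break-even hit rate is meant to flag.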
- See the exact hit rate caching must clear before you flip it on in production
- Real monthly and annual dollars at your hit rate, not a vendor headline number
- Anthropic vs OpenAI vs Google cache economics side by side on the same workload
- MCP available so agentic workflows can pull cache ROI programmatically
👇 Now try the calculator below with your own AI workloads
Start with a preset, then adjust.
Same workload, different models. Green row = biggest savings, gold = your current model.
| Model | Base / cached input | No cache | With cache | Savings |
|---|---|---|---|---|
- If you're above break-even — ship caching on the static prefix first (system prompt + tool definitions + few-shot examples). That's ~80% of the savings for ~20% of the effort.
- If you're below break-even on Anthropic — raise the hit rate (consolidate calls, use the 1-hour extended cache) or move that workload to a no-write-premium provider where any hit rate saves.
- Verify before you bank it: measure the real hit rate in production for 1-2 weeks. The first request after each TTL expiry is a cache miss and pays the full input price (plus the write premium on Anthropic), so steady-state numbers can differ from the estimate.
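One way to measure the real hit rate is to log the token usage returned with each response and compute a token-weighted ratio. The sketch below assumes usage records shaped like the `usage` object of Anthropic's Messages API (`cache_read_input_tokens`, `cache_creation_input_tokens`). Other providers report cached tokens under different field names, so check your provider's response schema before relying on it. The sample records are made up.

```python
def observed_hit_rate(usage_records: list[dict]) -> float:
    """Token-weighted cache hit rate over a batch of logged usage records."""
    # `or 0` guards against fields that are present but null in the logs.
    read = sum(u.get("cache_read_input_tokens", 0) or 0 for u in usage_records)
    written = sum(u.get("cache_creation_input_tokens", 0) or 0 for u in usage_records)
    total = read + written
    return read / total if total else 0.0


# Three logged responses: one cold write followed by two warm reads.
logs = [
    {"input_tokens": 400, "cache_creation_input_tokens": 8000, "cache_read_input_tokens": 0},
    {"input_tokens": 350, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 8000},
    {"input_tokens": 500, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 8000},
]
print(f"{observed_hit_rate(logs):.1%}")  # prints 66.7%
```

Comparing this measured rate against the break-even hit rate for your model shows whether the estimated savings hold up in production.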