Route simple queries to Haiku. Keep Opus for the hard ones.
Most production workloads don't need a frontier model for every query. Route easy queries to a cheap tier (Haiku 4.5, GPT-5-mini, or Gemini 3.1 Flash) and reserve a premium tier (Opus 4.8, GPT-5.5, or Gemini 3.1 Pro) for the hard ones. A 70/25/5 split across cheap, mid, and premium tiers typically saves 40-70% with no quality loss, provided queries are classified correctly. See exactly what you'd save.
- Typical production workloads see a 40-70% cost reduction, with no quality loss if queries are routed correctly
- Pick from 6 preset mixes (RAG, support, coding, research, content, balanced) or build your own
- Accounts for classifier overhead (LLM-as-judge, embedding, or none), because the classifier itself costs something
- Compare your mix against every preset, plus the sanity-check extremes (all-cheap, all-premium)
Here are the calculator's inputs and outputs, and how to use it for your AI workloads.
- Monthly requests: Total monthly call volume
- Input tokens / request: Average input size
- Output tokens / request: Average output size
- Tier split: Cheap / mid / premium query mix
- Baseline model: The single model you compare against
- Routing method: How queries get assigned to a tier
- Baseline cost: Cost with one model for everything
- Routed cost: Cost of your tier mix
- Monthly savings: Dollars saved per month
- Classifier overhead: Cost of the routing decision
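To make the relationship between these inputs and outputs concrete, here is a minimal sketch of the arithmetic such a calculator might perform. It is not the calculator's actual implementation: the function names, the flat per-request classifier cost, and every price below are hypothetical placeholders, so substitute your provider's current rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens. Values used below are placeholders, not list prices."""
    input_per_mtok: float
    output_per_mtok: float


# Hypothetical tier prices, for illustration only.
PRICES = {
    "cheap": Price(1.0, 5.0),
    "mid": Price(3.0, 15.0),
    "premium": Price(5.0, 25.0),
}


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the given token counts."""
    return (input_tokens * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000


def routing_savings(monthly_requests, input_tokens, output_tokens,
                    split, prices, baseline, classifier_cost_per_request=0.0):
    """Return baseline cost, routed cost, classifier overhead, and net savings.

    `split` maps tier name -> share of traffic and must sum to 1.
    `classifier_cost_per_request` is an assumed flat cost of the routing
    decision (0.0 models the "none" routing method).
    """
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("tier split must sum to 1")

    baseline_cost = monthly_requests * request_cost(baseline, input_tokens, output_tokens)
    routed_cost = sum(
        monthly_requests * share * request_cost(prices[tier], input_tokens, output_tokens)
        for tier, share in split.items()
    )
    overhead = monthly_requests * classifier_cost_per_request
    savings = baseline_cost - routed_cost - overhead  # net of routing overhead
    return {
        "baseline_cost": baseline_cost,
        "routed_cost": routed_cost,
        "classifier_overhead": overhead,
        "monthly_savings": savings,
        "savings_pct": savings / baseline_cost if baseline_cost else 0.0,
    }


result = routing_savings(
    monthly_requests=1_000_000,
    input_tokens=1_000,
    output_tokens=500,
    split={"cheap": 0.70, "mid": 0.25, "premium": 0.05},
    prices=PRICES,
    baseline=PRICES["premium"],
    classifier_cost_per_request=0.0001,  # assumed cost of an LLM-as-judge call
)
for key, value in result.items():
    print(f"{key}: {value:,.4f}")
```

Subtracting the classifier overhead before reporting savings is what keeps the number net rather than gross; with the "none" routing method the overhead term is simply zero.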
- Test cheap/mid/premium ratios; many workloads hit 40-60% savings around an 80/15/5 mix
- See the exact dollar gap vs running one model for everything, alongside every preset and the extremes
- Routing overhead is netted out, so the savings shown are net, not gross
- MCP is available so agentic workflows can pull routing economics programmatically
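Comparing a mix against the extremes can be sketched in a few lines. The example below is an illustration only: the per-request costs are hypothetical placeholders, classifier overhead is ignored for brevity, and the only mixes included are the two quoted on this page plus the all-cheap and all-premium sanity checks (the six presets' actual ratios are not reproduced here).

```python
# Placeholder per-request costs in USD for each tier (hypothetical, not list prices).
COST_PER_REQUEST = {"cheap": 0.0035, "mid": 0.0105, "premium": 0.0175}

# Mixes as (cheap, mid, premium) shares of traffic.
STRATEGIES = {
    "all-cheap": (1.00, 0.00, 0.00),
    "80/15/5": (0.80, 0.15, 0.05),
    "70/25/5": (0.70, 0.25, 0.05),
    "all-premium": (0.00, 0.00, 1.00),
}


def monthly_cost(mix, monthly_requests):
    """Blend the per-request tier costs by traffic share, then scale by volume."""
    cheap, mid, premium = mix
    blended = (cheap * COST_PER_REQUEST["cheap"]
               + mid * COST_PER_REQUEST["mid"]
               + premium * COST_PER_REQUEST["premium"])
    return monthly_requests * blended


def compare(monthly_requests, baseline_tier="premium"):
    """Return (name, cost, dollars saved, fraction saved) rows, cheapest first."""
    baseline = monthly_requests * COST_PER_REQUEST[baseline_tier]
    rows = []
    for name, mix in STRATEGIES.items():
        cost = monthly_cost(mix, monthly_requests)
        rows.append((name, cost, baseline - cost, (baseline - cost) / baseline))
    return sorted(rows, key=lambda row: row[1])


for name, cost, saved, pct in compare(1_000_000):
    print(f"{name:<12} ${cost:>10,.0f}  saves ${saved:>10,.0f} ({pct:.0%})")
```

The all-cheap row is a lower bound on cost, not a recommendation: it only makes sense if the cheap model answers every query at acceptable quality.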
👇 Now try the calculator below with your own AI workloads
Start with a preset, then tune the mix.
All strategies at your workload. Green = biggest savings, gold = your current mix.
| Strategy | Mix | Monthly | vs baseline | Savings |
|---|---|---|---|---|
- Validate the cheap tier before you ship: run an offline eval on your simple-query slice; the savings only hold if the cheap model answers those queries at acceptable quality.
- Add an escalation path: route low-confidence answers up to premium so misroutes don't turn into retries and complaints. Watch the cheap-tier escalation rate; a rate above ~10% means the classifier needs tuning.
- Then stack the other levers: prompt caching and batch processing compound with routing rather than simply adding, often taking another 20-40% off the routed number. As an illustration, a 60% routing saving followed by a further 30% cut leaves 0.4 × 0.7 = 28% of the baseline bill, or 72% off in total, not 90%.
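The escalation path and its monitoring can be sketched as follows. This is one possible shape, not a prescribed design: `EscalatingRouter`, the 0.7 confidence cutoff, and the assumption that the cheap model returns a confidence score alongside its answer are all hypothetical. Only the ~10% escalation-rate alarm comes from the guidance above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class EscalatingRouter:
    # Hypothetical interfaces: the cheap model returns (answer, confidence in 0..1),
    # the premium model returns just an answer.
    cheap: Callable[[str], Tuple[str, float]]
    premium: Callable[[str], str]
    min_confidence: float = 0.7  # assumed cutoff; tune against your own eval set
    cheap_calls: int = 0
    escalations: int = 0

    def answer(self, query: str) -> str:
        """Try the cheap tier first; escalate to premium on low confidence."""
        self.cheap_calls += 1
        text, confidence = self.cheap(query)
        if confidence >= self.min_confidence:
            return text
        self.escalations += 1
        return self.premium(query)

    @property
    def escalation_rate(self) -> float:
        """Share of cheap-tier calls that had to be re-answered by premium."""
        return self.escalations / self.cheap_calls if self.cheap_calls else 0.0

    def needs_tuning(self, max_rate: float = 0.10) -> bool:
        """True when the escalation rate exceeds the ~10% guideline."""
        return self.escalation_rate > max_rate


# Stub models standing in for real API calls.
def stub_cheap(query: str) -> Tuple[str, float]:
    confidence = 0.9 if len(query) < 20 else 0.3
    return f"cheap: {query}", confidence


def stub_premium(query: str) -> str:
    return f"premium: {query}"


router = EscalatingRouter(cheap=stub_cheap, premium=stub_premium)
for q in ["reset password", "compare these two contracts clause by clause"]:
    print(router.answer(q))
print(f"escalation rate: {router.escalation_rate:.0%}, needs tuning: {router.needs_tuning()}")
```

Note that every escalated query pays for both a cheap and a premium call, so a high escalation rate erodes the savings as well as signalling a classifier problem.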