Multi-Model Router · for AI engineers & FinOps

Route simple queries to Haiku. Keep Opus for the hard ones.

Most production workloads don't need a frontier model for every query. A 70/25/5 split between cheap/mid/premium typically saves 40-70% with no quality loss - provided you classify correctly. See exactly what you'd save.

Pricing verified: 2026-06-05 · 106 models · Classifier overhead included

What this calculator does

Route easy queries to Haiku 4.5 / GPT-5-mini / Gemini 3.1 Flash; reserve Opus 4.8 / GPT-5.5 / Gemini 3.1 Pro for the hard ones. See exactly how much you save.

Why use it

Typical production workloads see 40-70% cost reduction — no quality loss if routed correctly
Pick from 6 preset mixes (RAG, support, coding, research, content, balanced) or build your own
Includes classifier overhead — LLM-as-judge / embedding / none — because the classifier itself costs something
Compare your mix against every preset + sanity-check extremes (all-cheap, all-premium)


Below are the inputs you provide, the outputs you get, and the ways you can use the results for your AI workloads.

📥 Inputs you provide

Monthly requests: Total monthly call volume
Input tokens / request: Average input size
Output tokens / request: Average output size
Tier split: Cheap / mid / premium query mix
Baseline model: The single model you compare against
Routing method: How queries get assigned to a tier

📤 Outputs you get

Baseline cost: Cost with one model for everything
Routed cost: Cost of your tier mix
Monthly savings: Dollars saved per month
Classifier overhead: Cost of the routing decision

🎯 Use your results to

🔀

Tune your tier split

Test cheap/mid/premium ratios; many workloads hit 40-60% savings around an 80/15/5 mix

🆚

Compare against baseline

Exact dollar gap vs running one model for everything, plus every preset and the extremes

⚙️

Price the classifier

Routing overhead netted out so the savings are honest, not gross

🔌

Integrate with your agents

MCP available so agentic workflows can pull routing economics programmatically

👇 Now try the calculator below with your own AI workloads

Enter every field

Monthly requests sets the absolute scale. Input and output tokens per request apply the same averages to every tier, so tune them to your real traffic. Then set the 3-tier split: cheap % (simple queries: FAQ, retrieval Q&A, autocomplete), mid % (standard reasoning: explanations, synthesis), premium % (hard reasoning: architecture, edge cases). The three should sum to 100; the sliders auto-adjust, and a split that doesn't sum to 100 is normalized. The baseline selector is the single model you'd use if you weren't routing (defaults to your premium tier).
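The arithmetic behind these inputs can be sketched in a few lines. This is a minimal sketch, not the calculator's actual code: the per-million-token prices in PRICES are hypothetical placeholders (not verified vendor pricing), and the names TierPrice, normalize_split, and monthly_costs are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPrice:
    """Hypothetical USD price per million tokens (placeholder, not vendor pricing)."""
    input_per_mtok: float
    output_per_mtok: float


def cost_per_request(price, input_tokens, output_tokens):
    """Dollar cost of one request at the given average token counts."""
    return (input_tokens * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000


def normalize_split(split):
    """Scale tier percentages so the shares sum to 1, even if the inputs don't sum to 100."""
    total = sum(split.values())
    if total <= 0:
        raise ValueError("tier split must have a positive sum")
    return {tier: share / total for tier, share in split.items()}


def monthly_costs(requests, input_tokens, output_tokens, split, prices,
                  baseline_tier="premium"):
    """Return (baseline_cost, routed_cost) in dollars per month, before classifier overhead."""
    shares = normalize_split(split)
    routed = sum(
        requests * shares[tier] * cost_per_request(prices[tier], input_tokens, output_tokens)
        for tier in shares
    )
    baseline = requests * cost_per_request(prices[baseline_tier], input_tokens, output_tokens)
    return baseline, routed


# Assumed placeholder prices: (input, output) USD per million tokens.
PRICES = {
    "cheap": TierPrice(1.0, 5.0),
    "mid": TierPrice(3.0, 15.0),
    "premium": TierPrice(5.0, 25.0),
}

baseline, routed = monthly_costs(
    requests=1_000_000, input_tokens=1_000, output_tokens=500,
    split={"cheap": 70, "mid": 25, "premium": 5}, prices=PRICES,
)
savings_pct = 100 * (baseline - routed) / baseline
print(f"baseline ${baseline:,.0f}  routed ${routed:,.0f}  savings {savings_pct:.0f}%")
```

With these placeholder numbers a 70/25/5 split lands inside the 40-70% range quoted above. Swap in your own prices and token averages; the result is only as good as those inputs.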

Pick a classifier strategy

No classifier = rules-based (regex, length, intent-in-URL) at zero overhead. LLM-as-judge = small model classifies each query at ~$0.0001/query (essentially free at scale). Embedding = embed query, match to intent clusters — cheaper still but ~85-92% accuracy vs ~90-95% for LLM-as-judge. Mis-routed queries trigger retries and can erode savings; factor in a 5-10% "escalation overhead" if accuracy is tight.
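To see how classifier and escalation overhead eat into gross savings, here is a rough sketch. It takes the ~$0.0001/query LLM-as-judge figure from the text and assumes, for illustration only, that escalation overhead is applied as a percentage uplift on routed model spend. The function name net_routed_cost and the sample dollar figures are hypothetical. No embedding cost is hard-coded because the text gives none; pass your own.

```python
# Per-query classifier cost in USD. "llm_judge" uses the ~$0.0001 figure quoted above;
# an embedding classifier's cost is not stated, so supply your own number for it.
CLASSIFIER_COST_PER_QUERY = {
    "none": 0.0,          # rules-based routing, zero overhead
    "llm_judge": 0.0001,
}


def net_routed_cost(routed_cost, requests, classifier_cost_per_query=0.0,
                    escalation_overhead=0.0):
    """Routed monthly cost with classifier and escalation overhead added.

    escalation_overhead is a fraction (0.05 to 0.10 per the text) applied to routed
    model spend. Treating it as a simple uplift is an assumption of this sketch.
    """
    if not 0.0 <= escalation_overhead <= 1.0:
        raise ValueError("escalation_overhead must be a fraction between 0 and 1")
    classifier_cost = requests * classifier_cost_per_query
    return routed_cost * (1.0 + escalation_overhead) + classifier_cost


# Illustrative figures only: $17,500 baseline, $5,950 routed, 1M requests per month.
baseline_cost = 17_500.0
gross_routed = 5_950.0
net_routed = net_routed_cost(
    gross_routed,
    requests=1_000_000,
    classifier_cost_per_query=CLASSIFIER_COST_PER_QUERY["llm_judge"],
    escalation_overhead=0.10,
)
gross_savings = baseline_cost - gross_routed
net_savings = baseline_cost - net_routed
print(f"gross savings ${gross_savings:,.0f}  net savings ${net_savings:,.0f}")
```

In this example the classifier adds $100 a month while a 10% escalation overhead adds $595, which is why routing accuracy usually matters more than classifier price.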

Interpret all panels

The top 3 cards show baseline cost, routed cost, and monthly savings. Per-tier blocks show each tier's monthly cost and per-request cost; use them to spot tiers where you're paying more than necessary. The cost-share bar shows how routed dollars are distributed, classifier included. The comparison table stacks your mix against every preset and the sanity-check extremes ("all cheap", "all premium"), which bound the savings you can reach.
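The two extremes bracket any mix: "all premium" is the zero-savings floor when premium is your baseline, and "all cheap" is the ceiling. A small sketch of that comparison follows, using hypothetical per-request costs and only the two extremes named above (preset ratios are not listed here, so none are invented).

```python
# Hypothetical per-request costs in USD for each tier (placeholders, not real pricing).
PER_REQUEST = {"cheap": 0.0035, "mid": 0.0105, "premium": 0.0175}

MIXES = {
    "your mix": {"cheap": 70, "mid": 25, "premium": 5},
    "all cheap": {"cheap": 100, "mid": 0, "premium": 0},
    "all premium": {"cheap": 0, "mid": 0, "premium": 100},
}


def blended_cost_per_request(split, per_request_cost):
    """Weighted average cost per request for a tier split given in percentages."""
    total = sum(split.values())
    return sum(share / total * per_request_cost[tier] for tier, share in split.items())


requests = 1_000_000
baseline = requests * PER_REQUEST["premium"]  # baseline defaults to the premium tier

rows = []
for name, split in MIXES.items():
    monthly = requests * blended_cost_per_request(split, PER_REQUEST)
    rows.append((name, monthly, baseline - monthly))

# Largest savings first, mirroring how a comparison table would highlight the best row.
rows.sort(key=lambda row: row[2], reverse=True)
for name, monthly, savings in rows:
    print(f"{name:<12} ${monthly:>10,.2f}  saves ${savings:>10,.2f}")
```

If your mix sits close to the "all cheap" row, further ratio tuning has little left to give and quality validation becomes the better use of time.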

Next actions

Validate the savings before shipping. Run an offline eval: does your cheap-tier model handle your cheap-tier queries at acceptable quality? If yes, deploy with monitoring. Track the cheap-tier escalation rate (how often cheap-tier output is rejected and retried on premium); above 10%, your classifier needs tuning. Stack routing with prompt caching and the batch API: the discounts compound multiplicatively rather than adding, so each one applies to the cost left after the previous one.
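"Multiply, not add" means each lever discounts the cost that remains after the previous one. A quick worked example, with illustrative savings fractions that are assumptions rather than measured results:

```python
from math import prod


def stacked_savings(*savings_fractions):
    """Combined savings when each lever applies to the cost left by the previous one."""
    for fraction in savings_fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("each savings fraction must be between 0 and 1")
    remaining = prod(1.0 - fraction for fraction in savings_fractions)
    return 1.0 - remaining


# Illustrative inputs: routing saves 60%, a second lever takes 30% off what is left.
combined = stacked_savings(0.60, 0.30)
naive_sum = 0.60 + 0.30
print(f"combined {combined:.0%} (not {naive_sum:.0%})")
```

Routing leaves 40% of the baseline, and 30% off that leaves 28%, so the combined saving is about 72% rather than the 90% a naive sum would suggest.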

📊 Calculator at a glance


🎛 CALCULATOR

🧭 Your workload shape

Start with a preset, then tune the mix.

Monthly requests

Input tokens / request

Output tokens / request

⚠️ Your tier percentages don't sum to 100% - results will be normalized.

🟢 Cheap tier - simple queries

70%

🟡 Mid tier - standard queries

25%

🟣 Premium tier - complex queries

Compare against (baseline): What would you be using if you weren't routing? Baseline defaults to your premium-tier model.

Routing method -

📈 RESULTS

Baseline (single model)

With routing

Monthly savings

Cheap tier cost · Mid tier cost · Premium tier cost · Classifier overhead

💡 Recommendations

📋 Compare routing strategies side-by-side

All strategies at your workload. Green = biggest savings, gold = your current mix.

Strategy	Mix	Monthly	vs baseline	Savings


📋 What now?

Validate the cheap tier before you ship — run an offline eval on your simple-query slice; the savings only hold if the cheap model answers those at acceptable quality.
Add an escalation path — route low-confidence answers up to premium so misroutes don't turn into retries and complaints. Watch the cheap-tier escalation rate; above ~10% means the classifier needs tuning.
Then stack the other levers — prompt caching and batch processing multiply with routing rather than just adding, often another 20-40% off the routed number.
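The escalation path and the 10% alarm can be sketched together. This is a minimal sketch under loud assumptions: the models are stand-in callables that return an answer plus a confidence score (how you obtain confidence is up to you), and EscalatingRouter, min_confidence, and the stub functions are hypothetical names, not part of any real library.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Answer = Tuple[str, float]  # (answer text, confidence in [0, 1]), an assumed interface


@dataclass
class EscalatingRouter:
    cheap_model: Callable[[str], Answer]
    premium_model: Callable[[str], Answer]
    min_confidence: float = 0.7  # hypothetical threshold; tune it against your offline eval
    cheap_calls: int = 0
    escalations: int = 0

    def answer(self, query: str) -> str:
        """Try the cheap tier first; retry on premium when confidence is too low."""
        self.cheap_calls += 1
        text, confidence = self.cheap_model(query)
        if confidence >= self.min_confidence:
            return text
        self.escalations += 1
        premium_text, _ = self.premium_model(query)
        return premium_text

    @property
    def escalation_rate(self) -> float:
        """Share of cheap-tier calls that were retried on premium."""
        return self.escalations / self.cheap_calls if self.cheap_calls else 0.0

    def needs_tuning(self, max_rate: float = 0.10) -> bool:
        """True when the escalation rate exceeds the ~10% guideline from the text."""
        return self.escalation_rate > max_rate


# Stub models for demonstration only.
def cheap_stub(query: str) -> Answer:
    return ("cheap answer", 0.3 if "architecture" in query else 0.9)


def premium_stub(query: str) -> Answer:
    return ("premium answer", 0.95)


router = EscalatingRouter(cheap_stub, premium_stub)
queries = ["reset my password"] * 8 + ["review this architecture"] * 2
answers = [router.answer(q) for q in queries]
print(f"escalation rate {router.escalation_rate:.0%}, needs tuning: {router.needs_tuning()}")
```

Every escalated query pays for both a cheap and a premium call, so a high escalation rate is a cost problem as well as a quality signal.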

📅 Book a routing-architecture review to apply this to your workload →


Go deeper

The calculator's an estimate. Want the real number?