Cheapest Model - Best Value for Your Workload
Meet Marcus Lee, a senior engineer told to 'use the cheap model' for a new feature. His response: "Cheapest model is meaningless without context. Cheapest for WHAT?"
🔥 Switched to Haiku to save money. Quality dropped. Switched back. CFO unhappy.
'Cheapest model' is the wrong question - 'cheapest model that hits my quality bar' is the right one. Gemini 3 Flash at $0.50/1M output is cheap. So is DeepSeek V3 at $0.27. Both 'fail' on certain tasks where Claude Haiku or GPT-5 Mini succeed. Cheapest only matters if quality clears the threshold.
Marcus's mistake: he switched to the absolute cheapest tier without testing it on his workload. For customer support classification, Haiku worked and saved 60%. But in the agentic workflow with tool calls, Haiku struggled with the schema and returned malformed JSON. The quality cost outweighed the price savings.
Three buckets of 'cheap.' (1) Ultra-cheap (DeepSeek, Gemini Flash, Haiku) - fine for classification, simple Q&A, narrow extraction. (2) Mid-cheap (GPT-5 Mini, Sonnet 3.5) - solid for general agentic work, RAG, structured outputs. (3) Almost-frontier (Sonnet 4.6, GPT-5.5) - needed for complex reasoning, math, code generation. Pick the cheapest tier that passes your eval, not the cheapest model overall.
Cheapest LLM for your workload - by tier, by task type, by quality threshold. Updated daily as vendors shift pricing. Beyond the per-token table.
Match complexity score to model tier. Score 1-3: ultra-cheap fine. 4-6: mid-cheap. 7-9: balanced/premium. 10: frontier only.
Quality threshold acts as a floor. Required 7+ quality eliminates ultra-cheap tier regardless of complexity. Required 9+ may eliminate mid-cheap too.
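The two rules above (complexity picks a tier, the quality threshold sets a floor) can be sketched as a small lookup. This is a minimal illustration, not the calculator's actual logic: the function names are hypothetical, and treating "9+ may eliminate mid-cheap" as "does eliminate" is an assumption made to keep the example deterministic.

```python
# Tier names follow the text; the index order runs cheapest to most expensive.
TIERS = ["ultra-cheap", "mid-cheap", "balanced/premium", "frontier"]


def tier_for_complexity(score: int) -> int:
    """Map a 1-10 complexity score to a tier index (1-3, 4-6, 7-9, 10)."""
    if not 1 <= score <= 10:
        raise ValueError("complexity score must be between 1 and 10")
    if score <= 3:
        return 0
    if score <= 6:
        return 1
    if score <= 9:
        return 2
    return 3


def tier_floor_for_quality(required_quality: int) -> int:
    """Lowest tier index allowed by the quality requirement.

    Assumption: 9+ is treated as always ruling out mid-cheap,
    although the text only says it "may".
    """
    if required_quality >= 9:
        return 2
    if required_quality >= 7:
        return 1
    return 0


def recommend_tier(complexity: int, required_quality: int) -> str:
    """The quality floor can only push the recommendation up, never down."""
    index = max(tier_for_complexity(complexity),
                tier_floor_for_quality(required_quality))
    return TIERS[index]
```

For example, `recommend_tier(2, 7)` lands on the mid-cheap tier: the task is simple, but the quality floor rules out ultra-cheap.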
Volume amplifies savings - and risks. At 50K/day, picking 30% cheaper-per-task = $X/mo saved. Picking 5% lower-quality = customer complaints. Test before scaling.
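To put a number on the volume effect for your own workload, the arithmetic is simple enough to sketch. The token counts and per-1M prices in the usage note are hypothetical placeholders, not vendor quotes, and the 30-day month is an assumption.

```python
def monthly_cost(requests_per_day: int,
                 input_tokens: int,
                 output_tokens: int,
                 input_price_per_m: float,
                 output_price_per_m: float,
                 days: int = 30) -> float:
    """Monthly spend in dollars, given per-request tokens and prices per 1M tokens."""
    token_dollars = (input_tokens * input_price_per_m
                     + output_tokens * output_price_per_m)
    return token_dollars * requests_per_day * days / 1_000_000


def monthly_savings(current_monthly: float, percent_cheaper: float) -> float:
    """Dollars saved per month if each task becomes percent_cheaper % cheaper."""
    return current_monthly * percent_cheaper / 100
```

With hypothetical inputs of 50,000 requests per day, 1,000 input and 200 output tokens per request, and prices of $1.00 and $5.00 per 1M tokens, `monthly_cost(50_000, 1_000, 200, 1.00, 5.00)` comes to about $3,000 per month, and `monthly_savings` at 30% to about $900. The quality side of the tradeoff has no such formula, which is why testing comes before scaling.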
Same calculator, three different team sizes, showing how the numbers shift.
Sentiment classification on user reviews. Low complexity, low quality bar. Ultra-cheap tier wins by 80% over mid-cheap. ~$300/mo at 100K/day.
Healthy range: DeepSeek V3 / Gemini Flash, ~$300/mo
Marcus's mid-tier workload. Quality 7 = ultra-cheap not enough. Mid-cheap tier hits the sweet spot. Haiku 4.5 ~$1.5K/mo, GPT-5 Mini similar. 4× cheaper than Sonnet at acceptable quality.
Healthy range: Haiku 4.5 or GPT-5 Mini, ~$1.5K/mo
Complex multi-step reasoning, mission-critical accuracy. Frontier tier mandatory. Sonnet 4.6 typically wins on cost-quality ratio at this complexity level. ~$8K/mo.
Healthy range: Sonnet 4.6 or GPT-5.5, ~$8K/mo
Cost isn't the only dimension. Each constraint below changes the recommendation.
Cost ranks change weekly. Anchor on tier, not vendor. Re-check pricing quarterly because rankings shift as vendors compete.
Hallucination rate scales inversely with model capability - usually. Test on your domain because some cheap models punch above weight on specific tasks.
Compliance certifications often skip cheap tiers. Check before assuming the cheapest model has the same BAA/SOC 2 as the flagship.
Some cheap-tier APIs default to data-used-for-training. Read terms before piping user content.
Bonus: cheap tier usually wins on latency too. Mid-cheap beats premium for sub-300ms requirements.
The cheapest model 6 months ago isn't the cheapest now. Build vendor abstraction; don't hardcode the bargain.
Switching to a cheaper model without an eval is gambling. Build the eval first, switch second.
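One way to combine both pieces of advice (no hardcoded bargain, eval before switch) is to treat models as interchangeable candidates and pick the cheapest one that clears the quality bar. This is a sketch under assumptions: `Candidate`, `cheapest_passing`, and the cost field are hypothetical names, and `run_eval` stands in for whatever eval harness you already have.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str                  # model identifier, kept in config rather than code
    cost_per_1k_tasks: float   # hypothetical cost metric measured on your workload


def cheapest_passing(candidates: Sequence[Candidate],
                     run_eval: Callable[[Candidate], float],
                     quality_bar: float) -> Optional[Candidate]:
    """Return the cheapest candidate whose eval score meets the bar, else None."""
    for candidate in sorted(candidates, key=lambda c: c.cost_per_1k_tasks):
        if run_eval(candidate) >= quality_bar:
            return candidate
    return None


# Usage with made-up candidates and scores (placeholders, not benchmark results).
candidates = [
    Candidate("model-premium", 12.0),
    Candidate("model-cheap", 0.5),
    Candidate("model-mid", 2.0),
]
fake_scores = {"model-cheap": 0.71, "model-mid": 0.88, "model-premium": 0.93}
choice = cheapest_passing(candidates, lambda c: fake_scores[c.name], quality_bar=0.85)
```

Here `choice` is the mid-priced candidate: the cheapest one fails the bar, so it is skipped regardless of price. Re-running the same selection after a pricing change only requires updating the candidate list.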
Tradeoff analysis is where most AI projects go sideways.
Pre-loaded scenarios for the most common applications, with realistic numbers.
High-volume tool selection (which API to call?). Mid-cheap with prompt caching cuts cost dramatically. Tool defs cache → 80% input discount.
Healthy range: Haiku + caching, ~$2.5K/mo
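The effect of caching on input cost depends on how much of each prompt is served from cache. The sketch below assumes cached tokens are billed at 10% of the base input rate (the convention this page uses for most models) and that the 80% figure above is a blended discount across cached and uncached tokens; both are assumptions, and the function names are hypothetical.

```python
def blended_input_price(base_price_per_m: float,
                        cached_fraction: float,
                        cached_multiplier: float = 0.10) -> float:
    """Effective input price per 1M tokens when part of the prompt is cached.

    cached_fraction: share of input tokens served from cache (0.0 to 1.0).
    cached_multiplier: cached rate as a fraction of the base rate (assumed 10%).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    uncached_share = 1.0 - cached_fraction
    return base_price_per_m * (uncached_share + cached_fraction * cached_multiplier)


def effective_discount(cached_fraction: float,
                       cached_multiplier: float = 0.10) -> float:
    """Overall input discount as a fraction, e.g. 0.8 for 80% off."""
    return cached_fraction * (1.0 - cached_multiplier)
```

Under these assumptions, an 80% blended discount corresponds to roughly 89% of input tokens being cache hits, which is plausible when large tool definitions dominate each prompt.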
Autocomplete-style code suggestions. High volume, modest complexity. Mid-cheap fine. Fine-tuned cheap (Mistral/Llama) often wins at this scale.
Healthy range: GPT-5 Mini or fine-tuned Haiku
Marketing copy, blog posts, ad variants. Output quality matters for brand voice. Balanced tier. Sonnet 4.6 at this complexity beats both ultra-cheap (quality) and Opus (cost).
Healthy range: Sonnet 4.6 sweet spot
Long-form research synthesis. Cheap tiers fail. Premium tier mandatory. Volume is small, so absolute cost is fine.
Healthy range: Opus 4.7 / GPT-5.5 Pro mandatory
Honest limitations — every model is wrong; some are useful. Where this one falls short:
For these, use: Multi-Model Router for routing strategy. Cost Calculator for full bill.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
The fields below are derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor | Model | Field | Why it's inferred |
|---|---|---|---|
| Anthropic | Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate - Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic | Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic | Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input - Anthropic documents uniform 50% Batch discount. |
| Anthropic | Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output - Anthropic documents uniform 50% Batch discount. |
| Anthropic | Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate - Anthropic 90% cache-hit discount convention. |
| OpenAI | GPT-5.4 Mini | cachedInput | Derived at 10% of input - OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI | GPT-5.4 Nano | cachedInput | Derived at 10% of input - OpenAI 90% cache-hit convention. |
| OpenAI | GPT-5.4 Nano | batchInput | Derived at 50% of input - OpenAI Batch API uniform 50% discount. |
| OpenAI | GPT-5.4 Nano | batchOutput | Derived at 50% of output - OpenAI Batch API uniform 50% discount. |
| OpenAI | GPT-5.4 Pro | cachedInput | Derived at 10% of input - OpenAI 90% cache-hit convention. |
| OpenAI | GPT-5.4 Pro | batchInput | Derived at 50% of input - OpenAI Batch API uniform 50% discount. |
| OpenAI | GPT-5.4 Pro | batchOutput | Derived at 50% of output - OpenAI Batch API uniform 50% discount. |
| OpenAI | GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI | GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI | GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5.5 Pro | cachedInput | Derived at 10% of input - OpenAI does not publish a cached rate for *-pro models; using the family convention. |
| OpenAI | GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5.2 Pro | cachedInput | Derived at 10% of input - pro-tier convention. |
| OpenAI | GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI | GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google | Gemini 3 Flash | cachedInput | Derived at 10% of input - Google caching discount convention ~90%. |
| Google | Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input - Google caching convention. |
| Google | Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google | Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| Google | Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google | Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google | Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input - Google caching convention. |
| Google | Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google | Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| Google | Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates. |
| Google | Gemini 2.0 Flash | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google | Gemini 2.0 Flash | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| Google | Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input - Google caching convention. |
| Google | Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google | Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| xAI | Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base. |
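The derivations listed above all follow the same pattern: multiply a published base rate by a conventional factor. A minimal sketch, assuming the field names `cachedInput`, `batchInput`, and `batchOutput` from the table; the function name and the example rates are hypothetical and are not this site's actual code or any vendor's prices.

```python
def derive_rates(input_rate: float,
                 output_rate: float,
                 cached_multiplier: float = 0.10,
                 batch_multiplier: float = 0.50) -> dict:
    """Derive unpublished rates (dollars per 1M tokens) from published base rates.

    cached_multiplier defaults to 10% of input; a few entries in the table
    use 25% instead, so the caller can override it.
    """
    return {
        "cachedInput": input_rate * cached_multiplier,
        "batchInput": input_rate * batch_multiplier,
        "batchOutput": output_rate * batch_multiplier,
    }


# Hypothetical base rates, for illustration only.
typical = derive_rates(input_rate=3.00, output_rate=15.00)
quarter_cached = derive_rates(input_rate=3.00, output_rate=15.00,
                              cached_multiplier=0.25)
```

Keeping the multipliers as parameters makes the exceptions explicit, such as the entries derived at 25% rather than 10%.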
Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail.
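A changelog with old and new values can be produced by comparing two consecutive snapshots. The sketch below is illustrative only: the snapshot shape (a dict keyed by `(model, field)`) and the function name are assumptions, since the real schema of aicost_pricing_snapshots and aicost_price_changelog is not described here.

```python
from typing import Dict, List, Optional, Tuple

# Assumed snapshot shape: {(model, field): price per 1M tokens}.
Snapshot = Dict[Tuple[str, str], float]


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[dict]:
    """List every price that differs between two snapshots.

    A key missing from one side is reported with None for that side,
    so added and removed entries also show up in the log.
    """
    changes = []
    for key in sorted(old.keys() | new.keys()):
        old_value: Optional[float] = old.get(key)
        new_value: Optional[float] = new.get(key)
        if old_value != new_value:
            changes.append({"key": key, "old": old_value, "new": new_value})
    return changes
```

Each returned entry carries both values, which is what makes the trail auditable: a reader can see what the price was before the change, not only what it is now.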