Cheapest Model - Best Value for Your Workload

Meet Marcus Lee, a senior engineer told to 'use the cheap model' for a new feature. "Cheapest model is meaningless without context. Cheapest for WHAT?"

🔥 Switched to Haiku to save money. Quality dropped. Switched back. CFO unhappy.

The story

'Cheapest model' is the wrong question; 'cheapest model that hits my quality bar' is the right one. Gemini 3 Flash at $0.50 per 1M output tokens is cheap, and so is DeepSeek V3 at $0.27. Both fail on certain tasks where Claude Haiku or GPT-5 Mini succeed. Price only matters once quality clears the threshold.

Marcus's mistake: he switched to the absolute cheapest tier without testing it on his workload. For customer support classification, Haiku worked and saved 60%. But in the agentic workflow with tool calls, Haiku struggled with the schema and returned malformed JSON. The quality cost outweighed the price savings.

Three buckets of 'cheap.' (1) Ultra-cheap (DeepSeek, Gemini Flash, Haiku) - fine for classification, simple Q&A, narrow extraction. (2) Mid-cheap (GPT-5 Mini, Sonnet 3.5) - solid for general agentic work, RAG, structured outputs. (3) Almost-frontier (Sonnet 4.6, GPT-5.5) - needed for complex reasoning, math, code generation. Pick the cheapest tier that passes your eval, not the cheapest model overall.

🎛 Inputs you control

Each input shapes the cost. Click an input on the calculator to set it — explanations below match the live calculator field by field.

▸ What are you building — Determines minimum-quality bar AND the typical token shape (input/output sizes) the recommendation is costed against.

How to choose: Pick the closest match to YOUR primary use case. If you have multiple use cases, run the calc once per use case — they often select different optimal models.

▸ Tier-1 provider only — When checked, limits to OpenAI / Anthropic / Google. Excludes DeepSeek, xAI, Mistral, and others that may not have your required compliance posture.

How to choose: Check this only if you have an actual constraint (data residency rule, BAA requirement, procurement allowlist). Otherwise leave unchecked — Tier-1 models are typically 2-10× more expensive than the absolute cheapest.

▸ Vision required — Filters to models that can process images.

How to choose: Only check if vision is required for your workload. Vision-capable variants of mid-tier models often cost more than the text-only variant.

▸ Agent-capable required — Excludes Nano-tier and Flash-Lite models that lack the reasoning depth for multi-turn tool use.

How to choose: Check this for workloads with iterative tool calls, planning, or multi-step reasoning. Skip for one-shot tasks (classification, simple extraction, single-turn chat).

📊 Outputs computed for you

What you'll see after the calculator runs. Each card explains how to read the number.

▸ Cheapest-first model ranking — Models meeting ALL your constraints, ordered cheapest to most expensive at the use case's typical token shape.

How to read: Top entry is the cheapest valid option. Don't default to it blindly — read the "why this works" reasons. If the cheapest is a brand-new or unfamiliar provider, the 2nd-cheapest from a familiar vendor is often the pragmatic choice.
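The filter-then-rank behaviour described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the calculator's actual code: the `Model` fields, the function names, the three catalog entries, and every price in them are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier1: bool           # True for OpenAI / Anthropic / Google
    vision: bool          # can process images
    agent_capable: bool   # suitable for multi-turn tool use
    input_per_m: float    # USD per 1M input tokens (hypothetical)
    output_per_m: float   # USD per 1M output tokens (hypothetical)


def per_call_cost(model, input_tokens, output_tokens):
    """USD for one call at the given token shape."""
    return (input_tokens * model.input_per_m
            + output_tokens * model.output_per_m) / 1_000_000


def rank_cheapest_first(models, input_tokens, output_tokens, *,
                        tier1_only=False, need_vision=False, need_agent=False):
    """Drop models that fail any checked constraint, then sort by per-call cost."""
    survivors = [
        m for m in models
        if (m.tier1 or not tier1_only)
        and (m.vision or not need_vision)
        and (m.agent_capable or not need_agent)
    ]
    return sorted(survivors,
                  key=lambda m: per_call_cost(m, input_tokens, output_tokens))


# Hypothetical catalog: names and prices are placeholders, not vendor data.
CATALOG = [
    Model("model-c", True, True, True, 1.00, 4.00),
    Model("model-a", False, False, False, 0.10, 0.30),
    Model("model-b", True, True, False, 0.20, 0.60),
]

ranked = rank_cheapest_first(CATALOG, input_tokens=800, output_tokens=50)
print([m.name for m in ranked])
```

Checking a constraint can only shrink the list, so the top entry under `tier1_only=True` is never cheaper than the unconstrained top entry.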

▸ Per-model rationales — Workload-specific reasons each model survives your filters: caching applicability, batch eligibility, long-context handling, free-tier availability, compliance posture.

How to read: A "Prompt caching 90% off" note means effective price is dramatically lower than sticker for repeat-context workloads. A "Batch API (50% off)" note means async-eligible traffic gets half-price.

▸ Per-call cost at typical shape — Dollar cost per single API call using the use case's default input/output token shape. With cache/batch savings applied where relevant.

How to read: Multiply by your expected daily request count for daily total. If your actual token shape differs from the preset, recompute in Cost Calculator with your real numbers.

About this calculator: Cheapest Model - Best Value for Your Workload

Find the cheapest LLM for your workload by tier, by task type, and by quality threshold. Pricing is updated daily as vendors shift their rates, and the recommendation goes beyond the per-token price table.

Inputs you control

Input	Impact on result	Range	Typical
Required quality (1-10)	How critical is quality? 1 = noise OK. 5 = decent. 7 = production. 9 = mission-critical. 10 = no failure tolerance.	1 – 10	6
Tasks per day	Volume - at 50K+/day, the per-task gap × volume becomes meaningful.	10 – 10M	50000
Task complexity (1-10)	1 = classification. 3 = simple Q&A. 5 = RAG over docs. 7 = multi-step reasoning. 10 = research-grade analysis.	1 – 10	5

Outputs computed for you · model: `token`

Output	How inputs affect it
Monthly cost ($)	computed from inputs
Annual cost ($)	monthlyUsd × 12

* Output uses the generic compute model. For precise numbers, use the full calculator.

Ready to run the numbers?

Open the full calculator — pick a model, enter your tokens, see per-call, daily, monthly, and annual cost.


Reading your result

Match complexity score to model tier. Score 1-3: ultra-cheap fine. 4-6: mid-cheap. 7-9: balanced/premium. 10: frontier only.

Quality threshold acts as a floor. Required 7+ quality eliminates ultra-cheap tier regardless of complexity. Required 9+ may eliminate mid-cheap too.

Volume amplifies savings - and risks. At 50K/day, picking 30% cheaper-per-task = $X/mo saved. Picking 5% lower-quality = customer complaints. Test before scaling.
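The two rules above (complexity picks a tier, quality sets a floor) can be combined as in the sketch below. The function names are hypothetical, and one assumption goes beyond the text: the guide says a 9+ quality requirement "may" eliminate mid-cheap, and this sketch conservatively treats it as always eliminating it.

```python
# Tier names follow the guide; the index gives their order from cheapest up.
TIERS = ["ultra-cheap", "mid-cheap", "balanced/premium", "frontier"]


def tier_for_complexity(complexity: int) -> int:
    """Map a 1-10 complexity score to a tier index (1-3, 4-6, 7-9, 10)."""
    if not 1 <= complexity <= 10:
        raise ValueError("complexity must be between 1 and 10")
    if complexity <= 3:
        return 0
    if complexity <= 6:
        return 1
    if complexity <= 9:
        return 2
    return 3


def quality_floor(quality: int) -> int:
    """Lowest tier index allowed by the required quality score."""
    if not 1 <= quality <= 10:
        raise ValueError("quality must be between 1 and 10")
    if quality >= 9:
        return 2  # assumption: 9+ always rules out mid-cheap
    if quality >= 7:
        return 1  # 7+ rules out ultra-cheap
    return 0


def recommend_tier(complexity: int, quality: int) -> str:
    """The quality floor can only push the recommendation upward."""
    return TIERS[max(tier_for_complexity(complexity), quality_floor(quality))]


print(recommend_tier(complexity=2, quality=5))
print(recommend_tier(complexity=2, quality=7))
```

Taking the maximum of the two indices is what makes quality behave as a floor: a simple task with a strict quality bar still moves up a tier.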

What "good" looks like:

Ultra-cheap winners now: Gemini 3 Flash, DeepSeek V3, Haiku 4.5 - for tasks scoring complexity ≤4 and quality ≤6
Mid-cheap winners: GPT-5 Mini, Claude Haiku - for complexity 4-6, quality 6-8
Best value at premium quality: Sonnet 4.6 (often beats Opus on cost-quality ratio)
Avoid: Frontier-only for tasks scoring complexity ≤6 - wastes 5-10× the cost

Cheapest LLMs right now (per 1M tokens)

Verified 20 hours ago

1. Command R7b 12.2024: $0.150 in · $0.037 out
2. voxtral-mini: $0.040 in · $0.040 out
3. ministral-3-3b: $0.100 in · $0.100 out

Three real scenarios

Same calculator, three different team sizes. Click a tab to see how the numbers shift.

$5,302 / month ≈ $63,618 / year

Sentiment classification on user reviews. Low complexity, low quality bar. Ultra-cheap tier wins by 80% over mid-cheap. ~$300/mo at 100K/day.

Healthy range: DeepSeek V3 / Gemini Flash, ~$300/mo

See inputs used

qualityThreshold: 5
tasksPerDay: 100,000
complexityScore: 2
inputTokensPerTask: 800
outputTokensPerTask: 50
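To sanity-check a monthly figure against inputs like the ones listed above, the arithmetic can be done by hand. The sketch below is an illustration only: the per-1M prices ($0.10 input, $0.40 output) are hypothetical placeholders chosen for the example, not quotes for any vendor, and the 30-day month is an assumption. The annual figure uses the monthly × 12 rule from the outputs table.

```python
def per_call_usd(input_tokens, output_tokens, input_per_m, output_per_m):
    """USD for a single call; prices are per 1M tokens."""
    return (input_tokens * input_per_m
            + output_tokens * output_per_m) / 1_000_000


def project(tasks_per_day, input_tokens, output_tokens,
            input_per_m, output_per_m, days_per_month=30):
    """Scale one call up to daily, monthly, and annual totals."""
    call = per_call_usd(input_tokens, output_tokens, input_per_m, output_per_m)
    daily = call * tasks_per_day
    monthly = daily * days_per_month   # assumption: 30-day month
    return {"per_call": call, "daily": daily,
            "monthly": monthly, "annual": monthly * 12}


# Scenario inputs from the list above; prices are hypothetical placeholders.
estimate = project(tasks_per_day=100_000, input_tokens=800, output_tokens=50,
                   input_per_m=0.10, output_per_m=0.40)
print(f"monthly: ${estimate['monthly']:.2f}")   # monthly: $300.00
print(f"annual:  ${estimate['annual']:.2f}")    # annual:  $3600.00
```

With an 800-token input and a 50-token output, the input side dominates the bill here, so the input price matters more than the output price when comparing models for this shape.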

Trade-offs

Cost isn't the only dimension. Click any constraint — see how recommendations change.

Best fit for "cost":

Ultra-cheap (DeepSeek, Gemini Flash) $0.10-0.50/1M output
Mid-cheap (Haiku 4.5, GPT-5 Mini) $1-3/1M output
Balanced (Sonnet 4.6) $15/1M - best value at quality

Cost ranks change weekly. Anchor on tier, not vendor. Re-check pricing quarterly because rankings shift as vendors compete.

Use cases

Pre-loaded scenarios for the most common applications. Click a tab to see realistic numbers — then the "Try this scenario" button to load it into the calculator above.

$35,692 / month ≈ $428,305 / year

High-volume tool selection (which API to call?). Mid-cheap with prompt caching cuts cost dramatically: cached tool definitions get an 80% input discount.

Healthy range: Haiku + caching, ~$2.5K/mo

See inputs used

qualityThreshold: 7
tasksPerDay: 200,000
complexityScore: 5
inputTokensPerTask: 3,000
outputTokensPerTask: 100
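How much an 80% cache discount saves depends on what share of each prompt is actually cached. The sketch below blends cached and fresh input tokens into one input cost. All numbers are assumptions for illustration: the $1.00 per 1M base price is a placeholder, the 2,500-token cached prefix (tool definitions) is hypothetical, and any cache-write surcharge a vendor may apply is ignored.

```python
def input_cost_with_cache(input_tokens, cached_tokens, input_per_m, cache_discount):
    """USD input cost for one call when part of the prompt is a cache hit.

    cache_discount is the fraction taken off cached tokens (0.80 = 80% off).
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if not 0 <= cache_discount <= 1:
        raise ValueError("cache_discount must be between 0 and 1")
    fresh_tokens = input_tokens - cached_tokens
    cached_rate = input_per_m * (1 - cache_discount)
    return (fresh_tokens * input_per_m
            + cached_tokens * cached_rate) / 1_000_000


# Hypothetical: 3,000 input tokens, of which 2,500 are cached tool definitions.
uncached = input_cost_with_cache(3_000, 0, input_per_m=1.00, cache_discount=0.80)
cached = input_cost_with_cache(3_000, 2_500, input_per_m=1.00, cache_discount=0.80)
saving = 1 - cached / uncached
print(f"input cost saving: {saving:.0%}")   # input cost saving: 67%
```

The discount applies only to the cached share, so an 80% rate on 2,500 of 3,000 tokens works out to roughly a two-thirds cut in input cost, not a full 80%.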

What this calculator can't tell you

Honest limitations — every model is wrong; some are useful. Where this one falls short:

Quality scores are heuristic - measure on your actual workload.
Vendor pricing changes faster than this calc updates - verify in vendor pricing pages.
Doesn't model rate-limit constraints (some cheap tiers have low TPM/RPM ceilings).
Doesn't include latency tradeoffs for real-time workloads.

For these, use: Multi-Model Router for routing strategy. Cost Calculator for full bill.

Where to go next

Route per-query for stacked savings →

Use cheap for easy, premium for hard.

Project full bill with chosen model →

Validate the savings.

Fine-tune a cheap model for narrow tasks →

A cheap model plus fine-tuning often beats a premium model plus prompting.

Methodology

Source: /ai-cost-economics
Extraction: Per-vendor pricing pulled daily. Quality benchmarks from LMSYS Arena, MMLU, HumanEval.
Editorial gate: 8-layer defense — see aicost.ai/ai-cost-economics
Last verified: 6/4/2026, 8:00:00 PM

Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.

3 years of pricing history

Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.


📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

All prices are USD per 1 million tokens, current as of 2026-06-05.
Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
Batch API discounts are 50% off standard rates across providers that offer Batch mode.
Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
Long-context pricing tiers apply when input exceeds model threshold.
Embedding prices are input-only (no output tokens generated).

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Vendor	Last verified	Source	Snapshot history
Anthropic	2026-06-05	https://www.anthropic.com/pricing	Daily snapshot since Sep 2023 · 578 days captured
Anthropic Docs	2026-06-05	https://platform.claude.com/docs/en/about-claude/pricing	Daily snapshot since Sep 2023 · 578 days captured
OpenAI	2026-06-05	https://openai.com/api/pricing/	Daily snapshot since Sep 2023 · 579 days captured
Google AI	2026-06-05	https://ai.google.dev/gemini-api/docs/pricing	Daily snapshot since Dec 2023 · 554 days captured
Google Vertex	2026-06-05	https://cloud.google.com/vertex-ai/generative-ai/pricing	Daily snapshot since Dec 2023 · 554 days captured
DeepSeek	2026-06-05	https://api-docs.deepseek.com/quick_start/pricing	Daily snapshot since May 2024 · 493 days captured
xAI	2026-06-05	https://x.ai/api	Daily snapshot since Nov 2024 · 411 days captured
Mistral	2026-06-05	https://mistral.ai/pricing	Daily snapshot since Dec 2023 · 552 days captured
Cohere	2026-06-05	https://cohere.com/pricing	Daily snapshot since Sep 2023 · 578 days captured
Voyage AI	2026-06-05	https://docs.voyageai.com/docs/pricing	-

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.
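
The conventions stated before the table (cached input at 10% of base, Batch API at 50% of base) reduce to simple multiplications. The sketch below shows one way such starred values could be derived. The function name, the example base prices, and the `cached_ratio` override are hypothetical; only the field names `cachedInput`, `batchInput`, and `batchOutput` come from the table.

```python
def infer_rates(input_per_m, output_per_m, cached_ratio=0.10, batch_ratio=0.50):
    """Derive inferred per-1M rates from published base rates.

    cached_ratio: cached input as a fraction of base input (0.10 = 90% off).
    batch_ratio:  batch price as a fraction of the standard price (0.50 = 50% off).
    """
    return {
        "cachedInput": round(input_per_m * cached_ratio, 6),
        "batchInput": round(input_per_m * batch_ratio, 6),
        "batchOutput": round(output_per_m * batch_ratio, 6),
    }


# Hypothetical base prices: $2.00 input, $8.00 output per 1M tokens.
default_rates = infer_rates(2.00, 8.00)
print(default_rates)

# Rows that use a 25% cached rate instead of 10% pass a different ratio.
legacy_rates = infer_rates(2.00, 8.00, cached_ratio=0.25)
print(legacy_rates["cachedInput"])
```

Keeping the ratios as parameters makes the exceptions in the table (the rows derived at 25% rather than 10%) explicit instead of hiding them in special cases.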

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →
