Cheapest Model Finder

3 questions → the cheapest model for your job

Skip comparing 25 models in a spreadsheet. Tell us what you're building and we'll return the 3 cheapest that fit.

Pricing verified: 2026-06-05 · Takes 30 seconds

What this calculator does

Use-case-driven model picker: tell it what you're building, get a ranked list of cheapest models meeting your quality floor — with cache + batch savings factored in.

Why use it
  • Lowest sticker price isn't always cheapest in production — cache hit rate, batch eligibility, modality matter
  • Use-case presets encode the actual workload (token shape, capability needs) so cheap-looking models that fail your task get auto-excluded
  • Per-model "why this works" reasons make the recommendation defensible to engineering and finance
  • Tier-1 / vision / agent constraints lock out models that fail compliance or capability requirements
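The filter-then-rank idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the calculator's actual code: the `Model` fields, the `rank_cheapest` helper, and every model name and price in `CATALOG` are hypothetical placeholders. Only the Tier-1 provider set comes from this page.

```python
from dataclasses import dataclass

# Providers the page names under the "Tier-1 provider only" option.
TIER1 = {"OpenAI", "Anthropic", "Google"}


@dataclass(frozen=True)
class Model:
    name: str
    vendor: str
    cost_per_call: float  # USD per call; hypothetical figure
    vision: bool
    agent: bool


def rank_cheapest(models, tier1_only=False, vision=False, agent=False, top_n=3):
    """Drop models that fail any checked constraint, then sort cheapest first."""
    eligible = [
        m for m in models
        if (not tier1_only or m.vendor in TIER1)
        and (not vision or m.vision)
        and (not agent or m.agent)
    ]
    return sorted(eligible, key=lambda m: m.cost_per_call)[:top_n]


# Hypothetical catalog: names and prices are placeholders, not real quotes.
CATALOG = [
    Model("budget-text", "DeepSeek", 0.0004, vision=False, agent=False),
    Model("small-vision", "Google", 0.0009, vision=True, agent=False),
    Model("mid-agent", "OpenAI", 0.0030, vision=True, agent=True),
    Model("premium-agent", "Anthropic", 0.0120, vision=True, agent=True),
]

picks = rank_cheapest(CATALOG, tier1_only=True, agent=True)
print([m.name for m in picks])
```

With both constraints checked, the two cheapest entries are excluded and the list starts at `mid-agent`, which is the point of the exercise: the lowest sticker price only wins if the model clears the bar.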
Who uses this:
  • Vibe Coder (High): Quickest answer to "which is cheapest for my use case", with no manual price comparison
  • Small Business (High): Defensible cheap-model choice with reasons that hold up to "why this and not that"
  • Enterprise (Medium): Useful first pass; final selection often needs vendor-specific eval and compliance review

These are the inputs, outputs, and how you can use this calculator for your AI workloads.

📥 Inputs you provide
  • What are you building: Sets capability bar + token shape
  • Tier-1 provider only: Allowlist for regulated workloads
  • Vision required: Image-capable models only
  • Agent-capable required: Tool-use + reasoning required
📤 Outputs you get
  • Cheapest-first model ranking: Top = cheapest meeting your bar
  • Per-model rationales: Reasons specific to your use case
  • Per-call cost at typical shape: Dollar cost at use-case token shape
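As a rough illustration of the per-call cost output above: with prices quoted in USD per 1 million tokens, the cost of one call is a weighted sum of its input and output tokens. The token shape, prices, and request volume below are hypothetical, and the 30-day month is an assumption made here for the projection.

```python
def per_call_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one call, given prices in USD per 1 million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


def monthly_cost(cost_per_call, requests_per_day, days_per_month=30):
    """Project a steady-state monthly bill (30-day month is an assumption)."""
    return cost_per_call * requests_per_day * days_per_month


# Hypothetical token shape and prices, for illustration only.
call = per_call_cost(
    input_tokens=2_000,
    output_tokens=500,
    input_price_per_m=1.00,
    output_price_per_m=4.00,
)
month = monthly_cost(call, requests_per_day=10_000)
print(f"per call: ${call:.4f}, per month: ${month:,.2f}")
```

The same function run with your real token counts, rather than a use-case default, is what separates an estimate from a budget number.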
🎯 Use your results to
💰
Cheapest valid model in 30 seconds

Stop manual vendor-page comparison

🛡️
Quality floor stays defended

Use-case selection enforces minimum capabilities

🧾
Defensible rationale per pick

Each recommendation comes with workload-specific reasons

Catch caching + batch savings

Effective prices factor in cache and batch discounts where supported

👇 Now try the calculator below with your own AI workloads

📊 Calculator at a glance
🎛 CALCULATOR
1 What are you building?

Determines the minimum-quality bar AND the typical token shape (input/output sizes) the recommendation is costed against.

How to choose: Pick the closest match to YOUR primary use case. If you have multiple use cases, run the calculator once per use case; they often select different optimal models.

Read the full guide →

This sets the minimum capability bar.

💬
Chatbot / Q&A
Customer support, FAQ, simple answers
🏷️
Classification / extraction
Tagging, sentiment, structured output
📚
RAG / search
Long-context docs, citations
🤖
Agent / tool-use
Multi-step, function calling, reasoning
💻
Code generation
Completion, review, refactor
✍️
Content / writing
Long-form, creative, marketing
2 How much volume?

Daily requests at steady state.

🧪
Hobby / testing
< 100 req/day
🌱
Small startup
100 - 1K req/day
📈
Growing product
1K - 10K req/day
🏢
Production scale
10K - 100K req/day
🚀
High volume
100K+ req/day
3 Any constraints?

Tier-1 provider only: When checked, limits results to OpenAI / Anthropic / Google. Excludes DeepSeek, xAI, Mistral, and others that may not have your required compliance posture.

How to choose: Check this only if you have an actual constraint (data residency rule, BAA requirement, procurement allowlist). Otherwise leave it unchecked; Tier-1 models are typically 2-10× more expensive than the absolute cheapest.

Read the full guide →

Compliance, region, provider trust. Pick all that apply.

No constraints
Any provider OK, cheapest wins
🏛
Tier-1 provider only
Anthropic / OpenAI / Google only
🇪🇺
EU data sovereignty
Mistral or EU-hosted endpoints
👁
Must support vision
Image understanding required
📜
Long context (500K+ tokens)
Large docs, codebases
🎬
Multimodal (audio/video)
Beyond text + vision
📈 RESULTS
📋 Example Workload - change any field to see your actual numbers
🏆 Your top 3 cheapest matches


See all models → Get expert advice →
📋 What now?
Want a second opinion on the switch? 💼 Talk to a CloudIntelligence advisor →
Now that you have your ranked list…

What this means + what to do next

💡 What to consider beyond this ranked list for full TCO
  • Quality variance at YOUR task — sticker price tells you nothing about how the model performs on your prompts
  • Migration cost — switching to a new vendor often means weeks of prompt re-engineering and re-eval
  • Volume commitments: some vendors' enterprise pricing beats list price by 30-50% at scale
  • Latency requirements: the cheapest model may not meet your p95 latency SLA
Rule of thumb: Pick the cheapest model that meets your QUALITY bar, not just the capability bar. Always run an eval on 50-100 representative tasks before committing.
Quantify the hidden costs:
  • Get exact $/month at your real token shape (not the use-case default) → Cost Calculator
  • If your workload has repeat context, caching savings often dwarf model-swap savings → Prompt Cache ROI
  • Cheapest-from-new-vendor often comes with lock-in cost; quantify it → Vendor Concentration Risk
$ How this fits your overall ROI

Picking the cheapest valid model is half the work. The other half:

  • Does the cheapest model meet our quality bar on 100 real production examples?
  • What's the eval and migration cost if we switch from current vendor?
  • Will volume discounts from current vendor beat list-price savings from new one?
Bridge to ROI:
  • Rather than one cheap model, route easy queries cheap and hard queries premium → Multi Model Router
  • Cache savings often beat model-swap savings on repeat-context workloads → Prompt Cache ROI
  • Async-eligible portion of traffic gets 50% off, often bigger than the cheap-model gap → Batch Vs Realtime
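To see why the batch discount can outweigh a model swap, here is a small sketch comparing the two options. It assumes the 50% batch discount stated in the methodology notes; the monthly spend, the async share, and the 20% cheaper alternative are hypothetical numbers chosen for illustration.

```python
BATCH_DISCOUNT = 0.50  # 50% off for batch traffic, per the methodology notes


def blended_monthly_cost(realtime_monthly_cost, async_share, batch_discount=BATCH_DISCOUNT):
    """Monthly cost when `async_share` of traffic moves to the batch rate."""
    if not 0.0 <= async_share <= 1.0:
        raise ValueError("async_share must be between 0 and 1")
    return realtime_monthly_cost * (1.0 - batch_discount * async_share)


# Hypothetical workload: $1,000/month at realtime rates, 60% async-eligible.
current = 1_000.00
with_batch = blended_monthly_cost(current, async_share=0.6)

# Hypothetical alternative: a model that is 20% cheaper, all traffic realtime.
cheaper_model = current * 0.80

print(f"keep model + batch: ${with_batch:.2f}")
print(f"swap model, no batch: ${cheaper_model:.2f}")
```

Under these assumptions batching comes to about $700 against $800 for the swap, with no migration or re-eval work. The break-even shifts with the async share, so it is worth running with your own traffic split.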
Doing something different?

If picking the cheapest isn't your real question:

  • You want to browse + filter, not get a single recommendation → AI Model Finder
  • Your traffic is mixed, so easy and hard queries should use different models → Multi Model Router
  • You've already picked a model and just want the dollar number → Cost Calculator

Go deeper

Our playbooks on cutting this number.

🧮
AI Cost Calculator
Exact monthly cost for picked model
🔍
AI Model Finder
Full comparison matrix
📈
Scale Projection
What happens at 10x usage?
💾
Prompt Caching
Save 50-90% on input tokens

The calculator's an estimate. Want the real number?

A 5-day Quickscan ($1,500) reviews your actual usage across every pillar — financial, reliability, governance, privacy, MLOps, observability — and returns a concrete savings plan.

Book a Quickscan →
📖 Data sources & methodology

161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-05.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when input exceeds the model's threshold.
  • Embedding prices are input-only (no output tokens generated).
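The caching note above implies that a model's effective input rate depends on how often the cache is hit. A minimal sketch of that blend, assuming a cached token costs 10% of a normal one (the 90%-off end of the range quoted above), looks like this. The two models and their prices are hypothetical, and cache-write or storage fees, where a provider charges them, are not modeled.

```python
def effective_input_price(base_per_m, cache_hit_rate, cached_ratio=0.10):
    """Blend full-price and cached input into one USD-per-1M-token rate.

    cached_ratio is what a cached token costs relative to a normal one
    (0.10 means 90% off); check the provider's published figure.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    uncached = (1.0 - cache_hit_rate) * base_per_m
    cached = cache_hit_rate * base_per_m * cached_ratio
    return uncached + cached


# Hypothetical comparison: model A has the lower sticker price but no cache
# hits on this workload; model B costs twice as much but hits cache 80% of the time.
model_a = effective_input_price(base_per_m=1.00, cache_hit_rate=0.0)
model_b = effective_input_price(base_per_m=2.00, cache_hit_rate=0.8)
print(f"A: ${model_a:.2f}/1M  B: ${model_b:.2f}/1M")
```

In this made-up case the model with double the sticker price ends up cheaper on input, which is why the ranking uses effective rather than list prices.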

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

  • Anthropic · 2026-06-05 · https://www.anthropic.com/pricing · Daily snapshot since Sep 2023 · 578 days captured
  • Anthropic Docs · 2026-06-05 · https://platform.claude.com/docs/en/about-claude/pricing · Daily snapshot since Sep 2023 · 578 days captured
  • OpenAI · 2026-06-05 · https://openai.com/api/pricing/ · Daily snapshot since Sep 2023 · 579 days captured
  • Google AI · 2026-06-05 · https://ai.google.dev/gemini-api/docs/pricing · Daily snapshot since Dec 2023 · 554 days captured
  • Google Vertex · 2026-06-05 · https://cloud.google.com/vertex-ai/generative-ai/pricing · Daily snapshot since Dec 2023 · 554 days captured
  • DeepSeek · 2026-06-05 · https://api-docs.deepseek.com/quick_start/pricing · Daily snapshot since May 2024 · 493 days captured
  • xAI · 2026-06-05 · https://x.ai/api · Daily snapshot since Nov 2024 · 411 days captured
  • Mistral · 2026-06-05 · https://mistral.ai/pricing · Daily snapshot since Dec 2023 · 552 days captured
  • Cohere · 2026-06-05 · https://cohere.com/pricing · Daily snapshot since Sep 2023 · 578 days captured

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model · Field · Why it’s inferred
Anthropic / Claude Sonnet 4.6 · cachedInput · Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier.
Anthropic / Claude Sonnet 4.5 · cachedInput · Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic / Claude Sonnet 4.5 · batchInput · Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Sonnet 4.5 · batchOutput · Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Haiku 4.5 · cachedInput · Derived at 10% of input rate; Anthropic 90% cache-hit discount convention.
OpenAI / GPT-5.4 Mini · cachedInput · Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI / GPT-5.4 Nano · cachedInput · Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Nano · batchInput · Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Nano · batchOutput · Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro · cachedInput · Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Pro · batchInput · Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro · batchOutput · Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.2 · cachedInput · Derived at 10% of input; no residency uplift.
OpenAI / GPT-5.2 · batchInput · Derived at 50% of input.
OpenAI / GPT-5.2 · batchOutput · Derived at 50% of output.
OpenAI / GPT-5 · cachedInput · Derived at 10% of input.
OpenAI / GPT-5 · batchInput · Derived at 50% of input.
OpenAI / GPT-5 · batchOutput · Derived at 50% of output.
OpenAI / GPT-5.5 Pro · cachedInput · Derived at 10% of input (OpenAI does not publish a cached rate for *-pro models; using the family convention).
OpenAI / GPT-5.5 Pro · batchInput · Derived at 50% of input.
OpenAI / GPT-5.5 Pro · batchOutput · Derived at 50% of output.
OpenAI / GPT-5.2 Pro · cachedInput · Derived at 10% of input; pro-tier convention.
OpenAI / GPT-5.2 Pro · batchInput · Derived at 50% of input.
OpenAI / GPT-5.2 Pro · batchOutput · Derived at 50% of output.
OpenAI / GPT-5.1 · batchInput · Derived at 50% of input.
OpenAI / GPT-5.1 · batchOutput · Derived at 50% of output.
OpenAI / GPT-5 Pro · batchInput · Derived at 50% of input.
OpenAI / GPT-5 Pro · batchOutput · Derived at 50% of output.
OpenAI / GPT-5 Nano · cachedInput · Derived at 10% of input.
OpenAI / GPT-5 Nano · batchInput · Derived at 50% of input.
OpenAI / GPT-5 Nano · batchOutput · Derived at 50% of output.
Google / Gemini 3 Flash · cachedInput · Derived at 10% of input; Google caching discount convention ~90%.
Google / Gemini 3.1 Flash-Lite · cachedInput · Derived at 10% of input; Google caching convention.
Google / Gemini 3.1 Flash-Lite · batchInput · Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 3.1 Flash-Lite · batchOutput · Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Pro · cachedInput · Derived at 10% of input.
Google / Gemini 2.5 Flash · cachedInput · Derived at 10% of input.
Google / Gemini 2.5 Flash-Lite · cachedInput · Derived at 10% of input; Google caching convention.
Google / Gemini 2.5 Flash-Lite · batchInput · Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Flash-Lite · batchOutput · Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash · cachedInput · Derived at 25% of input per Google 2.0 family caching rates.
Google / Gemini 2.0 Flash · batchInput · Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash · batchOutput · Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite · cachedInput · Derived at 10% of input; Google caching convention.
Google / Gemini 2.0 Flash-Lite · batchInput · Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite · batchOutput · Derived at 50% of output; Google Batch API uniform 50% discount.
xAI / Grok 4 (legacy) · cachedInput · Extrapolated at 25% of base.
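The derivation rules behind the rows above can be sketched as a small fill-in step. This is an illustrative guess at the shape of the logic, not the site's actual pipeline: the `fill_inferred` helper, the `input`/`output` keys, and the example prices are hypothetical, while the `cachedInput`, `batchInput`, and `batchOutput` field names and the default 10% and 50% ratios come from the conventions stated above. Rows that use a different ratio (such as the 25% entries) would pass it explicitly.

```python
def fill_inferred(row, cached_ratio=0.10, batch_ratio=0.50):
    """Fill missing discount fields from base rates.

    Returns (filled_row, inferred_fields). Published values are left untouched;
    only fields that are missing or None are derived.
    """
    derivations = {
        "cachedInput": ("input", cached_ratio),
        "batchInput": ("input", batch_ratio),
        "batchOutput": ("output", batch_ratio),
    }
    filled = dict(row)
    inferred = []
    for field, (base_field, ratio) in derivations.items():
        if filled.get(field) is None:
            filled[field] = round(filled[base_field] * ratio, 6)
            inferred.append(field)
    return filled, inferred


# Hypothetical row: the vendor publishes batchInput but not the other two fields.
row = {"input": 2.00, "output": 8.00, "cachedInput": None, "batchInput": 1.00}
filled, inferred = fill_inferred(row)

for field in ("cachedInput", "batchInput", "batchOutput"):
    mark = "*" if field in inferred else ""  # * marks an inferred value
    print(f"{field}: {filled[field]}{mark}")
```

Returning the list of inferred fields alongside the filled row is what makes the asterisk marking possible: a derived value never gets mistaken for a vendor-published one.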

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail. Read the full methodology →