Cheapest Model Finder

3 questions → the cheapest model for your job

Skip comparing 25 models in a spreadsheet. Tell us what you're building and we'll return the three cheapest that fit.

Pricing verified: 2026-06-05
Takes 30 seconds

What this calculator does

Use-case-driven model picker: tell it what you're building, get a ranked list of cheapest models meeting your quality floor — with cache + batch savings factored in.

Why use it

Lowest sticker price isn't always cheapest in production: cache hit rate, batch eligibility, and modality all matter
Use-case presets encode the actual workload (token shape, capability needs) so cheap-looking models that fail your task get auto-excluded
Per-model "why this works" reasons make the recommendation defensible to engineering and finance
Tier-1 / vision / agent constraints lock out models that fail compliance or capability requirements


Who uses this:

Vibe Coder: High
Small Business: High
Enterprise: Medium

Below are the inputs you provide, the outputs you get, and how to use the results for your AI workloads.

📥 Inputs you provide

What are you building: Sets capability bar + token shape
Tier-1 provider only: Allowlist for regulated workloads
Vision required: Image-capable models only
Agent-capable required: Tool-use + reasoning required

📤 Outputs you get

Cheapest-first model ranking: Top = cheapest meeting your bar
Per-model rationales: Reasons specific to your use case
Per-call cost at typical shape: Dollar cost at use-case token shape

🎯 Use your results to

💰

Cheapest valid model in 30 seconds

Stop manual vendor-page comparison

🛡️

Quality floor stays defended

Use-case selection enforces minimum capabilities

🧾

Defensible rationale per pick

Each recommendation comes with workload-specific reasons

⚡

Catch caching + batch savings

Effective prices factor in cache and batch discounts where supported

👇 Now try the calculator below with your own AI workloads

Pick the right use case

Each preset sets a typical token shape (input/output tokens per request) and a capability bar:

Chatbot: short turns, 500/300 tokens, no special capabilities
Classification: 300/50 tokens, very cheap models are fine
RAG: 3.5K/800 tokens, needs long-context capability
Agent: 2K/1.5K tokens, needs tool use and reasoning
Coding: 2K/1.5K tokens, needs a coding-tuned model
Content: 1.5K/2K tokens, longer outputs

Pick accurately: the wrong use case gives the wrong cheapest answer.
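The token shapes above translate directly into a per-call cost. A minimal sketch, assuming the two numbers are input/output tokens and that prices are quoted in dollars per million tokens; the `USE_CASE_SHAPES` and `per_call_cost` names and the example prices are illustrative, not the calculator's actual code or any vendor's real rates:

```python
# Typical (input_tokens, output_tokens) per request, taken from the presets above.
USE_CASE_SHAPES = {
    "chatbot": (500, 300),
    "classification": (300, 50),
    "rag": (3500, 800),
    "agent": (2000, 1500),
    "coding": (2000, 1500),
    "content": (1500, 2000),
}


def per_call_cost(use_case, input_price_per_m, output_price_per_m):
    """Dollar cost of one request at the use case's typical token shape."""
    input_tokens, output_tokens = USE_CASE_SHAPES[use_case]
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical prices: $1.00 per million input tokens, $4.00 per million output tokens.
for use_case in USE_CASE_SHAPES:
    print(f"{use_case:15s} ${per_call_cost(use_case, 1.00, 4.00):.6f}")
```

Because output tokens usually cost more than input tokens, an output-heavy shape such as Content can rank models differently than an input-heavy shape such as RAG, even at the same list prices.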

Add constraints that are NON-negotiable

Tier-1 only: OpenAI, Anthropic, or Google (excludes DeepSeek, xAI, and others; intended for regulated workloads)
Vision: the model must handle images
Agent-capable: excludes Nano-tier and Flash-Lite models, which lack tool-use reasoning

Add a constraint only when it is actually required. Over-constraining hides cheap, valid options.
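Constraints act as hard filters applied before the price ranking. A rough sketch of that idea, assuming each model carries a vendor, capability flags, and a per-call cost; the `Model` class, the `cheapest_matches` helper, and every catalog entry are hypothetical:

```python
from dataclasses import dataclass

TIER_1_VENDORS = {"OpenAI", "Anthropic", "Google"}  # as listed in the text above


@dataclass(frozen=True)
class Model:
    name: str
    vendor: str
    vision: bool
    agent_capable: bool
    cost_per_call: float  # dollars at the chosen use-case token shape


def cheapest_matches(models, tier1_only=False, need_vision=False,
                     need_agent=False, top_n=3):
    """Drop models that fail a hard constraint, then rank survivors by cost."""
    eligible = [
        m for m in models
        if (not tier1_only or m.vendor in TIER_1_VENDORS)
        and (not need_vision or m.vision)
        and (not need_agent or m.agent_capable)
    ]
    return sorted(eligible, key=lambda m: m.cost_per_call)[:top_n]


# Hypothetical catalog: names, capabilities, and costs are made up.
CATALOG = [
    Model("budget-a", "DeepSeek", False, True, 0.0004),
    Model("nano-b", "OpenAI", True, False, 0.0005),
    Model("small-c", "Google", True, True, 0.0009),
    Model("mid-d", "Anthropic", True, True, 0.0030),
]

print([m.name for m in cheapest_matches(CATALOG)])
print([m.name for m in cheapest_matches(CATALOG, tier1_only=True, need_agent=True)])
```

In this made-up catalog, adding two constraints removes the two cheapest entries, which is why each constraint should reflect a real requirement.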

Read "why this works" reasons

Top result is cheapest. Each model has reasons explaining the pick: "Cheapest Tier-1 model in 2026", "Free tier with reduced quota available", "Prompt caching 90% off — huge for repeat-context RAG", "Batch API (50% off) available — your workload is often batch-eligible". These are NOT marketing — they're workload-specific rationales.
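The caching and batch reasons can be turned into an effective price. A minimal sketch, assuming a 90% discount on cached input tokens and a 50% batch discount as quoted above; the function names and the example price are illustrative, and whether the two discounts stack is vendor-specific, so they are kept separate here:

```python
def effective_input_price(input_price, cache_hit_rate=0.0, cache_discount=0.90):
    """Blend full-price and cached input tokens into one effective rate."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached_price = input_price * (1.0 - cache_discount)
    return (1.0 - cache_hit_rate) * input_price + cache_hit_rate * cached_price


def batch_price(price, batch_discount=0.50):
    """Price of a token sent through a batch API at the given discount."""
    return price * (1.0 - batch_discount)


# Hypothetical $3.00 per million input tokens with a 60% cache hit rate.
print(f"${effective_input_price(3.00, cache_hit_rate=0.60):.2f} per million input tokens")
print(f"${batch_price(3.00):.2f} per million input tokens via batch")
```

With these assumed numbers, a 60% hit rate cuts the effective input price by more than half, which is why a model with a higher sticker price can still win on repeat-context workloads.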

Validate then ship

Before committing: (1) run the cheapest candidate on an eval set and check that it actually meets your quality bar; (2) verify that cached pricing and batch eligibility apply to your traffic pattern; (3) check vendor compliance: data residency, BAA, retention. If the cheapest fails the eval, move to the next-cheapest model on the list. Most teams find the 2nd-cheapest is the real winner.
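Working through the list can be sketched as a loop that stops at the first candidate to clear the bar. This assumes an eval that returns a single score per model; `first_passing`, the model names, and the scores are placeholders for your own eval harness:

```python
def first_passing(ranked_models, run_eval, quality_bar):
    """Return (model, score) for the cheapest candidate that meets the bar, else None."""
    for model in ranked_models:  # ranked cheapest first
        score = run_eval(model)
        if score >= quality_bar:
            return model, score
    return None


# Hypothetical eval results standing in for a real eval run.
FAKE_SCORES = {"cheapest": 0.81, "second": 0.93, "third": 0.95}

result = first_passing(["cheapest", "second", "third"], FAKE_SCORES.get, quality_bar=0.90)
print(result)
```

Stopping at the first pass keeps eval spend low, since more expensive candidates are never run once a cheaper one qualifies.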

📊 Calculator at a glance


🎛 CALCULATOR

1 What are you building?

This sets the minimum capability bar.

💬

Chatbot / Q&A

Customer support, FAQ, simple answers

🏷️

Classification / extraction

Tagging, sentiment, structured output

📚

RAG / search

Long-context docs, citations

🤖

Agent / tool-use

Multi-step, function calling, reasoning

💻

Code generation

Completion, review, refactor

✍️

Content / writing

Long-form, creative, marketing

2 How much volume?

Daily requests at steady state.

🧪

Hobby / testing

< 100 req/day

🌱

Small startup

100 - 1K req/day

📈

Growing product

1K - 10K req/day

🏢

Production scale

10K - 100K req/day

🚀

High volume

100K+ req/day

3 Any constraints?

Compliance, region, provider trust. Pick all that apply.

✅

No constraints

Any provider OK, cheapest wins

🏛

Tier-1 provider only

Anthropic / OpenAI / Google only

🇪🇺

EU data sovereignty

Mistral or EU-hosted endpoints

👁

Must support vision

Image understanding required

📜

Long context (500K+ tokens)

Large docs, codebases

🎬

Multimodal (audio/video)

Beyond text + vision

📈 RESULTS

🏆 Your top 3 cheapest matches


📋 What now?

Switch to the cheapest eligible tier — same capability bar, lower bill
Open the winner in the calculator — confirm the exact monthly cost for your volume
Keep a quality floor — your constraints already excluded anything too weak
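To sanity-check a monthly figure by hand, multiply the per-call cost by daily volume. A rough sketch, assuming a 30-day month and steady traffic; `monthly_cost` and the example per-call cost are illustrative:

```python
def monthly_cost(cost_per_call, requests_per_day, days_per_month=30):
    """Monthly dollars for steady traffic; ignores free tiers and volume discounts."""
    return cost_per_call * requests_per_day * days_per_month


# Lower bounds of the volume tiers from step 2, with a hypothetical $0.0017 per call.
for label, requests_per_day in [
    ("Small startup", 100),
    ("Growing product", 1_000),
    ("Production scale", 10_000),
    ("High volume", 100_000),
]:
    print(f"{label:17s} ${monthly_cost(0.0017, requests_per_day):,.2f}/month")
```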

Want a second opinion on the switch? 💼 Talk to a CloudIntelligence advisor →

Now that you have your ranked list…

What this means + what to do next

💡 What to consider beyond this ranked list for full TCO

Quality variance at YOUR task — sticker price tells you nothing about how the model performs on your prompts
Migration cost — switching to a new vendor often means weeks of prompt re-engineering and re-eval
Volume commitments — some vendors' enterprise pricing beats list price by 30-50% at scale
Latency requirements — cheapest model may not meet your p95 latency SLA

Rule of thumb: pick the cheapest model that meets your QUALITY bar, not just the capability bar. Always run an eval on 50-100 representative tasks before committing.
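The latency point can be checked on the same eval run by recording per-request latency. A small sketch using the nearest-rank definition of p95 (other percentile definitions give slightly different values); the helper names and sample latencies are made up:

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    if not ordered:
        raise ValueError("need at least one latency sample")
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based rank
    return ordered[rank - 1]


def meets_sla(latencies_ms, sla_ms):
    """True when the p95 latency is at or under the SLA threshold."""
    return p95(latencies_ms) <= sla_ms


# Hypothetical samples: 20 requests, two of them slow.
samples = [100] * 18 + [900] * 2
print(p95(samples), meets_sla(samples, sla_ms=500))
```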

Quantify the hidden costs:

Get exact $/month at your real token shape (not the use-case default) → Cost Calculator
If your workload has repeat context, caching savings often dwarf model-swap savings → Prompt Cache ROI
Cheapest-from-new-vendor often comes with lock-in cost; quantify it → Vendor Concentration Risk

$ How this fits your overall ROI

Picking the cheapest valid model is half the work. The other half:

Does the cheapest model meet our quality bar on 100 real production examples?
What's the eval and migration cost if we switch from current vendor?
Will volume discounts from current vendor beat list-price savings from new one?

Bridge to ROI:

Rather than one cheap model, route easy queries cheap and hard queries premium → Multi Model Router
Cache savings often beat model-swap savings on repeat-context workloads → Prompt Cache ROI
Async-eligible portion of traffic gets 50% off, often bigger than the cheap-model gap → Batch Vs Realtime
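Both the routing and batch points come down to a weighted average of per-call costs. A minimal sketch, assuming you can estimate what share of traffic is easy or async-eligible; the function names and all numbers are illustrative:

```python
def routed_cost(cheap_cost, premium_cost, easy_share):
    """Average per-call cost when `easy_share` of queries go to the cheap model."""
    return easy_share * cheap_cost + (1.0 - easy_share) * premium_cost


def batch_blended_cost(cost_per_call, async_share, batch_discount=0.50):
    """Average per-call cost when `async_share` of traffic runs through a batch API."""
    return cost_per_call * (1.0 - async_share * batch_discount)


# Hypothetical: $0.001 cheap model, $0.010 premium model, 80% of queries are easy.
print(f"routed:  ${routed_cost(0.001, 0.010, easy_share=0.80):.4f} per call")
# Hypothetical: premium model only, 60% of traffic can wait for batch.
print(f"batched: ${batch_blended_cost(0.010, async_share=0.60):.4f} per call")
```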

Doing something different?

If picking the cheapest isn't your real question:

You want to browse + filter, not get a single recommendation → AI Model Finder
Your traffic is mixed: easy and hard queries should use different models → Multi Model Router
You've already picked a model and just want the dollar number → Cost Calculator

Vendor / Model | Field | Why it’s inferred
Anthropic / Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier.
Anthropic / Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic / Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention.
OpenAI / GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI / GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift.
OpenAI / GPT-5.2 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 | batchInput | Derived at 50% of input.
OpenAI / GPT-5 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.5 Pro | cachedInput | Derived at 10% of input: OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI / GPT-5.5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention.
OpenAI / GPT-5.2 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.1 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.1 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Nano | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 Nano | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Nano | batchOutput | Derived at 50% of output.
Google / Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention ~90%.
Google / Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Pro | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates.
Google / Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
xAI / Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base.
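Most of the inferred values above follow two ratios: cached input at 10% of the input rate and batch rates at 50% of the standard rates. A short sketch of that derivation, using the field names from the table; the `infer_fields` helper and the example prices are hypothetical, and the ratios should be overridden where the table says otherwise (for example 25% for Gemini 2.0 Flash and Grok 4):

```python
def infer_fields(input_price, output_price, cached_ratio=0.10, batch_ratio=0.50):
    """Derive unpublished rates from published input/output prices."""
    return {
        "cachedInput": input_price * cached_ratio,
        "batchInput": input_price * batch_ratio,
        "batchOutput": output_price * batch_ratio,
    }


# Hypothetical published prices: $2.00 input and $8.00 output per million tokens.
print(infer_fields(2.00, 8.00))
# Same prices, with the 25% cached-input ratio used for some rows.
print(infer_fields(2.00, 8.00, cached_ratio=0.25))
```

Inferred rates like these are estimates; confirm them against the vendor's published pricing before relying on the result.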

Go deeper

The calculator's an estimate. Want the real number?

Methodology

Primary sources

Inferred values (marked with * in calculator tables)