AI Model Finder

Find the cheapest AI model for your workload

Compare every major LLM side-by-side. Sorted by price. Filter by context, modality, or compliance.

Pricing verified: 2026-06-05 161 models, 8 providers 🔄 Refreshed weekly
What this calculator does

Filter the full catalog of 150+ AI models by your role, workload, capability needs, and price ceiling — output is a ranked shortlist you can ship to other calculators.

Why use it
  • Stop reading vendor blog posts — every model is in one table, freshness-stamped
  • Filter by what you actually need (vision, long-context, tier-1 only) instead of skimming marketing pages
  • Price slider + min-context slider narrow 150+ models to the 5-10 worth comparing
  • Compare checkbox lets you hold 2-4 candidates side-by-side before exporting to Cost Calculator
Who uses this:
Vibe Coder High Quick way to discover the new model that just launched without reading 10 launch posts Small Business High Filter to Tier-1 + budget cap quickly when you need to swap a model and don't have time to research Enterprise Medium Useful first-pass for procurement teams; final selection usually needs deeper eval

These are the inputs, outputs, and how you can use this calculator for your AI workloads.

📥 Inputs you provide
  • Your roleTunes filter defaults to your priorities
  • Your workload scenarioPre-tunes capability filters
  • Capability chipsAND-filters by feature
  • Max input price per 1M tokensCuts everything above your ceiling
  • Minimum context windowCuts small-window models
  • Provider filtersVendor allowlist
📤 Outputs you get
  • Ranked model listModels matching all filters
  • Side-by-side compareCheck 2-4 to compare in detail
  • Pricing freshnessDays since last verification
  • Capability tagsWhat this model can do
🎯 Use your results to
🎯
Shortlist in 60 seconds

Cut 150+ models to 5-10 candidates without reading any vendor blog post

🔍
Find what you didn't know existed

New providers launch monthly — this surfaces them instead of you finding out on Twitter

🛒
Make procurement defensible

Compare table is screenshot-ready for finance and security review

📦
Hand off to other calcs

Export shortlist directly to Cost Calculator or Multi-Model Router for the actual decision

👇 Now try the calculator below with your own AI workloads

📊 Calculator at a glance
Ai Model Finder full size
📋 Example Workload - adjust filters to find models for your specific workload
🎛 CALCULATOR
👤 Your rolePre-tunes filter weights based on what your role typically cares about (developer = context + reasoning; PM = balance; finance = price + provider).How to choose: Pick the role closest to the decision you're making. If your work spans multiple roles, choose the one most cost-sensitive — it forces the tightest filter.Read the full guide →
📊 Your workloadTells the finder what capabilities matter for your workload (RAG needs long-context; agents need reasoning + tool use; classification needs cheap + fast).How to choose: Pick the closest match. If you're building multiple things, run the finder once per workload — they often pick different optimal models.Read the full guide →
🔀 Sort by
Providers:Per-provider toggle to include/exclude a vendor entirely. Useful when contract or compliance constraints limit which vendors you can use.How to choose: Start with everyone checked. Uncheck only when you have a concrete reason (no GDPR-compliant region, no enterprise BAA, existing volume commitment elsewhere).Read the full guide →
Modality:Toggle chips to keep only models that support a feature. Vision filters multimodal vision-capable models; Tier-1 keeps OpenAI/Anthropic/Google only; Flagship hides budget variants.How to choose: Enable only chips for capabilities you actually need NOW. Adding "Vision" because you might want it eventually pollutes results with multimodal premium pricing.Read the full guide →
Tags:
Max input $/M:Hard cap on input token price. Models pricier than this are hidden completely.How to choose: Set to ~1.5× the price tier you can defend to finance. For a $5/M budget, start at $7.50 — gives buffer to see slightly-pricier "stretch" options.Read the full guide → any
Min context:Hides models with context windows below your threshold.How to choose: Set to 1.2× your worst-case prompt size. RAG with 5K retrieval → 50K-100K minimum. Long-document analysis → 200K+. Don't overshoot — bigger contexts often cost more per 1M tokens.Read the full guide → any
Compare: Check boxes in the table to add up to 3 models for side-by-side comparison.
📈 RESULTS
Model Provider Input $/M Output $/M Cached $/M Batch Context Modalities Tags
📋 What now?
Need help choosing or cutting your AI bill? 💼 Talk to a CloudIntelligence advisor →
Now that you have your shortlist…

What this means + what to do next

💡 What to consider beyond this shortlist for full TCO
  • Real workload cost — sticker price tells you nothing about your token shape
  • Quality at YOUR task — same model is great on one workload, weak on another
  • Cache-aware effective pricing — caching changes the effective $/M dramatically for repeat-context workloads
  • Vendor lock-in cost — switching prompts between vendors often takes weeks of eval
Rule of thumb: Treat this as discovery only. Real selection requires running 2-3 shortlisted models against an eval set on YOUR task, then costing them at YOUR token shape.
Quantify the hidden costs:
$ How this fits your overall ROI

This is a discovery tool. ROI conversations happen downstream:

  • Is the cheapest model in the shortlist quality-acceptable for my task? (Run eval.)
  • Is the most-capable model in my shortlist 2× better, or 10× better? (The price gap usually reflects 2×, the value gap often reflects 1.3×.)
  • How locked in am I to my current vendor — what would migration cost?
Bridge to ROI:
  • Mixed workloads benefit from routing — different models for different query types Multi Model Router
  • For some open-weight models, self-hosting becomes cheaper above a usage threshold Self Host Breakeven
  • Translate $/request into $/customer or $/feature margin Margin Calculator
Doing something different?

If you already have specific candidates in mind, skip discovery:

  • You know which model you want; you just need the dollar number Cost Calculator
  • You want the cheapest model meeting a quality floor for a specific workload Cheapest Model
  • You've decided you want multiple models for different traffic patterns Multi Model Router

Go deeper

Our playbooks on cutting this number.

🧮
AI Cost Calculator
Calculate exact monthly cost for your workload
📉
Token Volatility
Hedge your AI unit costs
💾
Prompt Caching
The 50-90% discount most teams miss
🤖
Agentic Migration
Agent costs grow non-linearly

The calculator's an estimate. Want the real number?

A 5-day Quickscan ($1,500) reviews your actual usage across every pillar — financial, reliability, governance, privacy, MLOps, observability — and returns a concrete savings plan.

Book a Quickscan →
📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-05.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when input exceeds model threshold.
  • Embedding prices are input-only (no output tokens generated).

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Anthropic
2026-06-05
https://www.anthropic.com/pricing
Daily snapshot since Sep 2023 · 578 days captured
Anthropic Docs
2026-06-05
https://platform.claude.com/docs/en/about-claude/pricing
Daily snapshot since Sep 2023 · 578 days captured
OpenAI
2026-06-05
https://openai.com/api/pricing/
Daily snapshot since Sep 2023 · 579 days captured
Google AI
2026-06-05
https://ai.google.dev/gemini-api/docs/pricing
Daily snapshot since Dec 2023 · 554 days captured
Google Vertex
2026-06-05
https://cloud.google.com/vertex-ai/generative-ai/pricing
Daily snapshot since Dec 2023 · 554 days captured
DeepSeek
2026-06-05
https://api-docs.deepseek.com/quick_start/pricing
Daily snapshot since May 2024 · 493 days captured
xAI
2026-06-05
https://x.ai/api
Daily snapshot since Nov 2024 · 411 days captured
Mistral
2026-06-05
https://mistral.ai/pricing
Daily snapshot since Dec 2023 · 552 days captured
Cohere
2026-06-05
https://cohere.com/pricing
Daily snapshot since Sep 2023 · 578 days captured

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model Field Why it’s inferred
Anthropic — Claude Sonnet 4.6 cachedInput Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5 cachedInput Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5 batchInput Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5 batchOutput Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5 cachedInput Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini cachedInput Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano cachedInput Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano batchInput Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano batchOutput Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro cachedInput Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro batchInput Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro batchOutput Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2 cachedInput Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2 batchInput Derived at 50% of input.
OpenAI — GPT-5.2 batchOutput Derived at 50% of output.
OpenAI — GPT-5 cachedInput Derived at 10% of input.
OpenAI — GPT-5 batchInput Derived at 50% of input.
OpenAI — GPT-5 batchOutput Derived at 50% of output.
OpenAI — GPT-5.5 Pro cachedInput Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5.5 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5.2 Pro cachedInput Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5.2 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5.1 batchInput Derived at 50% of input.
OpenAI — GPT-5.1 batchOutput Derived at 50% of output.
OpenAI — GPT-5 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5 Nano cachedInput Derived at 10% of input.
OpenAI — GPT-5 Nano batchInput Derived at 50% of input.
OpenAI — GPT-5 Nano batchOutput Derived at 50% of output.
Google — Gemini 3 Flash cachedInput Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro cachedInput Derived at 10% of input.
Google — Gemini 2.5 Flash cachedInput Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash cachedInput Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy) cachedInput Extrapolated at 25% of base.

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →