AI API Cost Calculator

What does your AI feature actually cost?

Pick a model. Set your workload. See daily, monthly, and annual cost - with the real optimizations most teams miss.

Pricing verified: 2026-06-05 161 models across 8 providers Caching + batch API applied
What this calculator does

See exactly what an LLM workload will cost across 70+ models. Pick a model, enter your tokens per request and daily volume, get per-request / daily / monthly / annual cost. Caching and Batch API savings calculated automatically.

Why use it
  • Stop guessing — turn "AI is expensive" into a precise monthly number you can defend to finance
  • Compare 70+ models side-by-side at YOUR token shape, not vendor marketing examples
  • Spot the 30-90% savings opportunities (prompt caching, Batch API, model swap) before you ship
  • Re-cost instantly when a vendor changes rates — your numbers stay current
Who uses this:
Vibe Coder High Before committing a model in your weekend project, see what it costs at the volume you expect Small Business High Single source of truth for the AI line-item in your monthly budget — defensible numbers for finance Enterprise High Compare procurement options at your real token shape; export numbers for RFP justification

These are the inputs, outputs, and how you can use this calculator for your AI workloads.

📥 Inputs you provide
  • ModelPick from 70+ AI models
  • Input tokens per requestSize of your prompt
  • Output tokens per requestExpected response size
  • Requests per dayYour daily call volume
  • Prompt cache hit rateHow often your prompt prefix repeats
  • Days per monthWorking days for billing math
📤 Outputs you get
  • Cost per requestDollars per single API call
  • Monthly costDollars per month at your volume
  • Annual costLinear annual projection
  • Input vs output cost splitWhere the money goes
  • Optimization suggestionsHow to cut the bill
🎯 Use your results to
🎯
Pick the right model

Run the same workload through 5 candidates; pick the cheapest that meets your quality bar

📈
Forecast your AI bill

Defensible monthly + annual numbers for your finance team

💾
Quantify savings

Estimated dollars from caching, Batch API, and model swap — before you implement

🔌
Integrate with your AI agents

MCP available for agentic workflow integration — surface live cost intelligence to your agents

👇 Now try the calculator below with your own AI workloads

📊 Calculator at a glance
Cost Calculator full size
🎛 CALCULATOR
Your workload

Estimate conservatively - we'll show you what caching + batch mode save below.

Load a typical workload, then tweak the numbers.
The prompt + system message + context sent to the model. ~4 chars ≈ 1 token.
The model's reply. Usually the bigger line item (5x input rate).
If you reuse the same system prompt, apps typically see 30-50% hit rate. 90% off on cached tokens.
Compare all models →
📈 RESULTS
💰 Your estimated cost
📋 Example Workload - change any field to see your actual cost
Loading…
Monthly cost
-
-
Per request-
Per day-
Input tokens/day-
Output tokens/day-
Input cost share-
Output cost share-
Annual-
Monthly tokens-
📋 What now?
  • Compare models — switch the model dropdown to see the same workload across 70+ options
  • Lock in savings — toggle caching and Batch mode to surface the 30-90% reductions before you ship
  • Set your budget — use the monthly + annual numbers as defensible inputs for finance
Need help cutting your AI bill? 💼 Talk to a CloudIntelligence advisor →
Now that you have your number…

What this means + what to do next

💡 What to consider beyond this number for full TCO
  • Observability + logging (prompts, outputs, latency, errors) — typically adds 5-10% to inference cost at production scale
  • Eval pipelines + benchmark sets — $500-$5K/mo even without continuous evaluation; budget more if quality drift matters
  • Human-in-the-loop review for edge cases — $4K-$12K/mo per FTE reviewer for production AI features
  • Retry / fallback overhead — typically 3-15% on top of base inference depending on error rate and retry logic
  • Vendor lock-in cost — invisible until migration day, often $50K+ in re-prompting + re-eval + downtime risk
Rule of thumb: Multiply this number by 1.5–2.5× for production-ready TCO. Lower end (1.5×) = internal tools with low error tolerance and no compliance overhead. Higher end (2.5×) = customer-facing AI features with eval pipelines, compliance logging, and human review.
Quantify the hidden costs:
  • If your workload is multi-turn (chat, agents, tool-using), costs compound per turn — this baseline misses that Agent Loop Cost
  • Quantifies lock-in cost on the day you need to switch vendors Vendor Concentration Risk
  • If you're adding retrieval, the embedding + vector DB + rerank costs aren't in this baseline Rag Pipeline
$ How this fits your overall ROI

This calculator gives you the cost number. Here's how to turn that into an ROI story:

  • What revenue or cost-saved does this AI feature drive monthly?
  • How long until cumulative AI cost exceeds the value the feature generates?
  • How sensitive is your business to vendor price changes? (Last 12 months saw -50% to +25% swings across major vendors.)
Bridge to ROI:
  • Convert per-request cost into per-customer or per-feature margin Margin Calculator
  • Project 12 months out with growth + price-change assumptions Annual Cost Forecaster
  • See cost at 10× and 100× current usage — the discontinuities matter Scale Projection
Doing something different?

Doing something different? These calculators may fit better:

  • For multi-turn agent loops with tool calls Agent Loop Cost
  • For full RAG over a knowledge base with embeddings + retrieval Rag Pipeline
  • For image / multimodal workloads where pricing differs Vision Cost

Go deeper

Our playbooks on cutting this number.

💾
Prompt Caching
The 50-90% discount most teams miss
📉
Token Volatility
Hedge your AI unit costs
🧮
AI Unit Economics
Is your AI feature profitable?
🔁
Agent Loop Guardrails
Stop $20K overnight bills

The calculator's an estimate. Want the real number?

A 5-day Quickscan ($1,500) reviews your actual usage across every pillar — financial, reliability, governance, privacy, MLOps, observability — and returns a concrete savings plan.

Book a Quickscan →
📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-05.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when input exceeds model threshold.
  • Embedding prices are input-only (no output tokens generated).

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Anthropic
2026-06-05
https://www.anthropic.com/pricing
Daily snapshot since Sep 2023 · 578 days captured
Anthropic Docs
2026-06-05
https://platform.claude.com/docs/en/about-claude/pricing
Daily snapshot since Sep 2023 · 578 days captured
OpenAI
2026-06-05
https://openai.com/api/pricing/
Daily snapshot since Sep 2023 · 579 days captured
Google AI
2026-06-05
https://ai.google.dev/gemini-api/docs/pricing
Daily snapshot since Dec 2023 · 554 days captured
Google Vertex
2026-06-05
https://cloud.google.com/vertex-ai/generative-ai/pricing
Daily snapshot since Dec 2023 · 554 days captured
DeepSeek
2026-06-05
https://api-docs.deepseek.com/quick_start/pricing
Daily snapshot since May 2024 · 493 days captured
xAI
2026-06-05
https://x.ai/api
Daily snapshot since Nov 2024 · 411 days captured
Mistral
2026-06-05
https://mistral.ai/pricing
Daily snapshot since Dec 2023 · 552 days captured
Cohere
2026-06-05
https://cohere.com/pricing
Daily snapshot since Sep 2023 · 578 days captured

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model Field Why it’s inferred
Anthropic — Claude Sonnet 4.6 cachedInput Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5 cachedInput Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5 batchInput Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5 batchOutput Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5 cachedInput Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini cachedInput Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano cachedInput Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano batchInput Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano batchOutput Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro cachedInput Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro batchInput Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro batchOutput Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2 cachedInput Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2 batchInput Derived at 50% of input.
OpenAI — GPT-5.2 batchOutput Derived at 50% of output.
OpenAI — GPT-5 cachedInput Derived at 10% of input.
OpenAI — GPT-5 batchInput Derived at 50% of input.
OpenAI — GPT-5 batchOutput Derived at 50% of output.
OpenAI — GPT-5.5 Pro cachedInput Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5.5 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5.2 Pro cachedInput Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5.2 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5.1 batchInput Derived at 50% of input.
OpenAI — GPT-5.1 batchOutput Derived at 50% of output.
OpenAI — GPT-5 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5 Nano cachedInput Derived at 10% of input.
OpenAI — GPT-5 Nano batchInput Derived at 50% of input.
OpenAI — GPT-5 Nano batchOutput Derived at 50% of output.
Google — Gemini 3 Flash cachedInput Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro cachedInput Derived at 10% of input.
Google — Gemini 2.5 Flash cachedInput Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash cachedInput Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy) cachedInput Extrapolated at 25% of base.

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →