Find Your Calculator - Which Tool Fits Your Question
Meet Anyone: a first-time visitor trying to figure out which tool to use. "There are 26 calculators. Which one matches what I'm actually asking?"
🔥 Wasted 15 minutes in the wrong calc. Bounced.
Most cost analysis fails at the first step: picking the wrong frame. 'How much will AI cost' is too broad to answer. 'How much will Claude cost for 10K customer support tickets/day' is answerable in 30 seconds with the right calc.
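To show why the narrower question is answerable, here is a minimal sketch of the arithmetic behind it. The token counts per ticket and the per-million-token prices are hypothetical placeholders, not published rates for any model; `monthly_cost` is an illustrative helper, not one of the site's calculators.

```python
def monthly_cost(tickets_per_day, in_tokens, out_tokens,
                 in_price_per_m, out_price_per_m, days=30):
    """Estimate monthly spend in USD from per-ticket token usage."""
    # Cost of one ticket: tokens times price per million tokens.
    per_ticket = (in_tokens * in_price_per_m
                  + out_tokens * out_price_per_m) / 1_000_000
    return tickets_per_day * per_ticket * days


# Assumed inputs: 10K tickets/day, 1,500 input and 400 output tokens per
# ticket, $3.00 and $15.00 per million tokens (placeholder prices).
estimate = monthly_cost(10_000, 1_500, 400, 3.00, 15.00)
print(f"${estimate:,.2f}/month")
```

Under these assumed inputs the estimate comes to about $3,150 per month. Swap in your own ticket volume, token counts, and current vendor prices.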
Think of these calcs in three layers. (1) Foundation - what does AI cost, period (cost-calculator, scale-projection, annual-cost-forecaster). (2) Decisions - A vs B framing (RAG vs FT, batch vs realtime, self-host break-even). (3) Specialized - domain math (vision, audio, RAG pipeline, fine-tuning, vector DB).
Match your question to one of three patterns. 'How much will [thing] cost?' → Foundation calc. 'Which is better, A or B?' → Decision calc. 'What's the cost specifically for [domain]?' → Specialized calc. Most queries fit cleanly into one bucket.
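The three patterns above can be sketched as a tiny keyword matcher. This is an illustration only: the keyword lists and the `classify` function are assumptions made for the example, not how the site routes questions.

```python
import re

# Checked in order: an A-vs-B question wins over a domain keyword.
PATTERNS = [
    ("Decision", re.compile(r"\b(which is better|vs|versus)\b", re.I)),
    ("Specialized", re.compile(r"\b(vision|audio|rag|fine-tuning|vector db)\b", re.I)),
]


def classify(question):
    """Return the calculator layer whose pattern matches the question."""
    for layer, pattern in PATTERNS:
        if pattern.search(question):
            return layer
    # Plain "how much will it cost" questions fall through to Foundation.
    return "Foundation"


print(classify("Which is better, RAG or fine-tuning?"))
print(classify("What's the cost specifically for audio transcription?"))
print(classify("How much will Claude cost for 10K customer support tickets/day?"))
```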
26 calculators, one decision tree. Pick the right one based on your role, your spend, your AI maturity. Don't waste 20 minutes in the wrong calc.
Open the full calculator: pick a model, enter your tokens, and see per-call, daily, monthly, and annual cost.
Maturity 1-2 (early): Start with /tools/cost-calculator. Baseline what AI costs at your usage. Don't optimize yet.
Maturity 3 (optimizing): /tools/multi-model-router for routing, /tools/prompt-cache-roi for caching, /tools/token-reduction-analyzer for compression.
Maturity 4 (portfolio): /tools/vendor-concentration-risk for risk, /tools/scale-projection for growth, /tools/annual-cost-forecaster for budgeting.
Maturity 5 (strategic): /tools/self-host-breakeven, /tools/rag-vs-fine-tuning, /tools/agentic-ai-stack for architecture decisions.
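The maturity levels above amount to a lookup table. A minimal sketch, assuming maturity is scored as an integer from 1 to 5; the `tools_for` helper and the data layout are illustrative, while the tool paths are the ones listed above.

```python
# (lowest level, highest level) -> calculators recommended at that stage.
TOOLS_BY_MATURITY = {
    (1, 2): ["/tools/cost-calculator"],
    (3, 3): ["/tools/multi-model-router", "/tools/prompt-cache-roi",
             "/tools/token-reduction-analyzer"],
    (4, 4): ["/tools/vendor-concentration-risk", "/tools/scale-projection",
             "/tools/annual-cost-forecaster"],
    (5, 5): ["/tools/self-host-breakeven", "/tools/rag-vs-fine-tuning",
             "/tools/agentic-ai-stack"],
}


def tools_for(level):
    """Return the recommended calculators for a maturity level (1-5)."""
    for (low, high), tools in TOOLS_BY_MATURITY.items():
        if low <= level <= high:
            return tools
    raise ValueError(f"maturity level must be 1-5, got {level}")


print(tools_for(3))
```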
Pre-launch chatbot. Want to estimate. Use /tools/cost-calculator (full bill projection) and /tools/token-estimator (per-conversation math). 5 minutes total.
Healthy range: Start: cost-calculator + token-estimator
Mid-stage RAG. Want to optimize. /tools/rag-pipeline (full architecture), /tools/chunking-optimizer (chunk strategy), /tools/prompt-cache-roi (caching). 3 calcs, 15 minutes, ~$3K/mo savings.
Healthy range: rag-pipeline + chunking-optimizer + prompt-cache-roi
$60K/mo bill. Risk + growth + optimization. /tools/vendor-concentration-risk (multi-vendor strategy), /tools/scale-projection (10× growth), /tools/multi-model-router (cut bill 40%).
Healthy range: vendor-concentration-risk + scale-projection + multi-model-router
Building voice product. /tools/voice-agent-stack (full architecture cost), /tools/audio-cost (STT/TTS/voice-agent line items). 10 minutes, accurate budget.
Healthy range: audio-cost + voice-agent-stack
Cost isn't the only dimension. Other constraints change the recommendation.
Don't skip levels. Foundation → Decision → Specialized. Trying to use a specialized calc without baseline math is like optimizing the wrong loop.
Calcs estimate cost, not quality. For quality decisions, run an eval set. Calcs help you afford the right model; evals tell you if it's the right model.
Calcs don't audit compliance. They model cost given a chosen vendor. Vendor compliance is a separate workstream.
Privacy is governance + contracts. Calcs help size, not decide.
Latency-aware cost calcs are on the roadmap. For now, measure latency separately.
Lock-in is a portfolio-level concern. The risk calc gives you a number; the strategy is yours.
Calcs don't account for your team's ML/SRE capacity. Self-host calc says 'cheaper at scale' - only true if you can operate it. Honest self-assessment matters.
Tradeoff analysis is where most AI projects go sideways.
Pre-loaded scenarios for the most common applications.
Strategic question, not tactical. Annual forecaster for board deck. Concentration risk for vendor strategy. Skip the per-task calcs.
Healthy range: annual-cost-forecaster + vendor-concentration-risk
Stack three optimization levers. Each cuts independently. Total typically 35-50%.
Healthy range: Stack: token-reduction + prompt-cache + multi-model-router
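One way to read "each cuts independently" is that every lever removes a fraction of whatever bill is left after the previous one, so the savings multiply rather than add. A minimal sketch under that assumption; the per-lever percentages are hypothetical, chosen only to land inside the 35-50% range quoted above.

```python
from math import prod


def stacked_savings(cuts):
    """Combined fractional saving when each cut applies to the remaining bill."""
    return 1 - prod(1 - c for c in cuts)


# Hypothetical per-lever savings, not measured values.
levers = {
    "token-reduction": 0.15,
    "prompt-cache": 0.20,
    "multi-model-router": 0.15,
}

total = stacked_savings(levers.values())
print(f"Combined saving: {total:.1%}")
```

The combined figure is lower than the simple sum of the three cuts (50% here), which is why stacked levers rarely add up linearly.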
Pre-build. Estimate cost, pick model, defend budget. 3 calcs, ~10 min, defensible spec.
Healthy range: cost-calculator + token-estimator + cheapest-model
B2C tier decision. Subscription picker compares ChatGPT/Claude/Cursor for personal use. (Coming Phase 6.)
Healthy range: subscription-picker + free-tier-checker (when shipped)
Finance perspective. Forecast + FX exposure + growth scenarios. Defensible budget submission.
Healthy range: annual-cost-forecaster + currency-converter + scale-projection
Honest limitations — every model is wrong; some are useful. Where this one falls short:
For these, browse the full guide library for any specific calc not surfaced here.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs).
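The fallback rule for the last-verified date can be sketched as follows. The schemas of aicost_pricing_snapshots and aicost_crawler_runs are not described here, so this example assumes the successful snapshot and crawler-run dates have already been loaded into plain lists; `last_verified` is a hypothetical helper.

```python
from datetime import date
from typing import Optional, Sequence


def last_verified(snapshot_dates: Sequence[date],
                  crawler_run_dates: Sequence[date]) -> Optional[date]:
    """Prefer the newest successful snapshot; fall back to the newest crawler run."""
    if snapshot_dates:
        return max(snapshot_dates)
    if crawler_run_dates:
        return max(crawler_run_dates)
    return None  # nothing has been verified yet


# Illustrative dates only.
print(last_verified([date(2025, 1, 3), date(2025, 1, 5)], [date(2025, 1, 6)]))
print(last_verified([], [date(2025, 1, 2), date(2025, 1, 6)]))
```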
10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
The fields below are derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic / Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic / Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic / Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount. |
| Anthropic / Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount. |
| Anthropic / Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention. |
| OpenAI / GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI / GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI / GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI / GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI / GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI / GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI / GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI / GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI / GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI / GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5.5 Pro | cachedInput | Derived at 10% of input; OpenAI does not publish a cached rate for *-pro models, so the family convention is used. |
| OpenAI / GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention. |
| OpenAI / GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI / GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google / Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention ~90%. |
| Google / Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google / Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google / Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google / Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google / Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google / Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google / Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google / Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google / Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates. |
| Google / Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google / Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google / Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google / Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google / Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| xAI / Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base. |
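The derivations in the table reduce to multiplying a base rate by a fixed fraction. A minimal sketch, assuming rates are quoted in USD per million tokens; `infer_rates` and the base rates passed to it are hypothetical, while the field names and the 10%, 25%, and 50% fractions come from the table.

```python
def infer_rates(input_rate, output_rate,
                cached_fraction=0.10, batch_fraction=0.50):
    """Derive cached and batch rates from the published base rates."""
    return {
        "cachedInput": round(input_rate * cached_fraction, 6),
        "batchInput": round(input_rate * batch_fraction, 6),
        "batchOutput": round(output_rate * batch_fraction, 6),
    }


# Placeholder base rates, not any vendor's actual pricing.
rates = infer_rates(3.00, 15.00)
print(rates)

# Rows that use the 25% caching convention override the default fraction.
legacy = infer_rates(1.00, 4.00, cached_fraction=0.25)
print(legacy["cachedInput"])
```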
Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail.
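The audit trail described above can be illustrated by diffing two daily snapshots. The real layout of aicost_pricing_snapshots and aicost_price_changelog is not given here, so this sketch assumes each snapshot is a flat dict from a field key to a price; `diff_snapshots`, the key format, and the sample values are all hypothetical.

```python
def diff_snapshots(old, new):
    """Return one changelog entry per field whose price changed."""
    changes = []
    # Only compare fields present in both snapshots.
    for key in sorted(old.keys() & new.keys()):
        if old[key] != new[key]:
            changes.append({"field": key, "old": old[key], "new": new[key]})
    return changes


# Made-up snapshot values for illustration.
yesterday = {"model-a.input": 3.00, "model-a.output": 15.00}
today = {"model-a.input": 2.50, "model-a.output": 15.00}

changelog = diff_snapshots(yesterday, today)
print(changelog)
```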