Methodology

How aicost.ai sources every number in every calculator. Vendor-published prices, independent benchmarks, research papers, and clearly-labeled typical target values with full citations — defended by 8 independent daily audit layers.

Pricing snapshot: 2026-06-05 History depth: 581 days
50
calculators
31
vendors verified
58
cited claims
11
audit layers

Our sourcing categories

Every value in every calculator falls into one of five categories. We label which is which on every page.

  • Vendor-published — directly from the vendor's docs. No asterisk.
  • Published benchmark — independent, reproducible (Chatbot Arena, ANN-Benchmarks, vLLM, etc.).
  • Research paper — peer-reviewed or widely-accepted research.
  • Typical target * — no canonical source. We state the working range and explain why. Override with your own number in the calculator.
  • Computed — arithmetic on other verified numbers. Formulas shown.

Any individual claim may also carry a * modifier if its source has not yet been re-verified against the current vendor page — those claims should be treated as approximate until the next verification cycle resolves them.

Vendor verification freshness

Each vendor's pricing page is re-checked by our automated pipeline on a cadence ranging from daily to weekly. Below: the most-recently verified vendors. Per-calculator detail on the methodology pages.

anthropic
today
chatgpt-plus
today
chroma
today
claude-pro
today
cohere
today
deepseek
today
elevenlabs
today
google
today
mistral
today
openai
today
pinecone
today
qdrant
today

Per-calculator methodology

Each calculator has its own methodology page with every source, every citation, every starred value's range and notes, and the verification dates of the vendors it uses.

Consumer Bill Diagnose
Diagnose an unusually high AI bill. Identify which model, usage pattern, or plan tier drove the spike.
Multi-Model Router
Savings from routing requests across models by complexity.
Prompt Cache ROI
Model the payback of prompt caching given your hit rate.
Batch vs Realtime
When batch processing saves 50% over realtime API.
Token Reduction Analyzer
Savings from prompt compression, distillation, and trimming.
RAG vs Fine-Tuning
When fine-tuning beats RAG economically.
Self-Host Break-even
API vs GPU rental economics with real throughput data.
Context Window Cost
Cost implications of long-context usage across models.
Vendor Concentration Risk
HHI-based analysis of single-vendor dependency on AI spend.
Cost Calculator
Model pricing across the top LLM vendors.
Scale Projection
Project AI cost at future volumes.
Annual Cost Forecaster
Full-year cost forecast with growth and model-price-decline assumptions.
Token Estimator
Estimate token counts for any text using published tokenizer rules.
Budget Planner
Plan monthly AI spend across workloads and vendors.
Margin Calculator
AI cost as a percentage of revenue per user or per call.
AI Model Finder
Browse the complete pricing directory across vendors and capabilities.
Cheapest Model Finder
Filter LLM pricing by capability and budget.
Currency Converter
Convert AI pricing into any major currency.
ROI Quick Check
Hours saved × labor cost vs monthly AI spend. Returns net monthly ROI and payback period.
Cost Plan
Multi-month cost plan with growth assumptions and budget tracking.
API vs Pro/SDK Break-even
When does buying a Pro/SDK subscription break even vs pure pay-per-token API usage?
Subscription Picker for Builders
Multi-step wizard guiding builders through subscription tier selection across vendors.
Agent Loop Cost
Cost of agentic workloads with multiple LLM iterations per task.
Agentic Workflow Cost
Consumer-side estimate for monthly burn on coding agents and autonomous workflows.
Agentic AI Stack
Multi-component agentic stack cost: planner + executor + optional verifier with cache and batch options.
Multimodal RAG Stack
Cost of multimodal RAG: text + image + audio retrieval and generation pipeline.
Voice Agent Stack
Voice agent cost: speech-to-text + LLM + text-to-speech with realistic conversation flows.
RAG Pipeline Cost
End-to-end RAG stack: embedding + vector DB + retrieval + generation.
Chunking Strategy Optimizer
Chunk size and overlap versus retrieval quality and token cost.
Hybrid Search Cost
Vector + BM25 stacked retrieval economics.
Region Cost Map
US vs EU vs APAC AI vendor regional pricing deltas.
Embedding Cost
Compare embedding model pricing across vendors.
Vector DB Cost
Pricing comparison across Pinecone, Qdrant, Weaviate, pgvector, and more.
Fine-Tuning Cost
Fine-tuning training and inference costs.
Vision Cost
Image + vision model pricing.
Audio Cost
Speech-to-text and text-to-speech pricing.
Pricing History
Historical AI pricing trends across vendors and models since 2024.
Subscription Picker
Which AI subscription is right for you? Compares 69 tiers across 12 vendors.
Family Plan Comparator
Cheapest way for your family to have AI: individual plans vs Google One Family vs Microsoft 365 Family + Copilot.
Creator Bundle
What does your creator AI stack cost? Tier-matches image/video/voice/writing volume to the right plans.
Developer Stack Calculator
Which AI dev tools overlap? Find redundant subscriptions and optimize your stack.
Free-Tier-Enough Checker
Do you actually need to pay for AI? Honest answer based on your usage patterns.
Annual vs Monthly
Should you commit to annual billing? Break-even analysis shows when annual actually saves money.
Overage Forecaster
Forecast monthly overage costs on hybrid (seat + usage) AI plans. Account for the "false sense of security" in 2026 hybrid pricing.
Buy vs Build
When does building your own AI stack beat buying an off-the-shelf subscription?
Quarterly Spend Forecaster
Q-over-Q AI spending projections with seasonality and growth.
TCO Quick Estimate
Quick total-cost-of-ownership estimate for an AI initiative, using one of 422 workflow templates.
TCO Complete
Full TCO analysis: model + infrastructure + tooling + labor over multi-year horizon.
Agentic AI Playbook
Opinionated playbook for scoping, building, and costing agentic AI initiatives.
Multimodal RAG Playbook
Opinionated playbook for multimodal RAG stack selection, sizing, and cost.

How we keep this honest — 8 audit layers

Every number is defended by 8 independent automated checks that run every day at 03:30 EDT. If any layer flags an issue, we treat it as stop-the-line work — no new calculator changes ship until the issue is resolved.

  1. Architecture (12 structural invariants)
  2. Smoke test (every calc page renders)
  3. Golden values (math correctness vs reference)
  4. Source resilience (independent reference data sources reachable)
  5. Math gotchas (static code analysis)
  6. Hybrid reconciliation (cross-source agreement)
  7. Drift detection (day-over-day price changes)
  8. Vendor cache (per-vendor freshness wiring)
  9. Cross-vendor reachability (live vendor pricing page probes)
  10. Rendered HTML drift (calc page DOM contracts, 45 pages daily)
  11. Pricing freshness (cron heartbeat + per-vendor age tracking)
📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-05.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when input exceeds model threshold.
  • Embedding prices are input-only (no output tokens generated).

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Anthropic
2026-06-05
https://www.anthropic.com/pricing
Daily snapshot since Sep 2023 · 578 days captured
Anthropic Docs
2026-06-05
https://platform.claude.com/docs/en/about-claude/pricing
Daily snapshot since Sep 2023 · 578 days captured
OpenAI
2026-06-05
https://openai.com/api/pricing/
Daily snapshot since Sep 2023 · 579 days captured
Google AI
2026-06-05
https://ai.google.dev/gemini-api/docs/pricing
Daily snapshot since Dec 2023 · 554 days captured
Google Vertex
2026-06-05
https://cloud.google.com/vertex-ai/generative-ai/pricing
Daily snapshot since Dec 2023 · 554 days captured
DeepSeek
2026-06-05
https://api-docs.deepseek.com/quick_start/pricing
Daily snapshot since May 2024 · 493 days captured
xAI
2026-06-05
https://x.ai/api
Daily snapshot since Nov 2024 · 411 days captured
Mistral
2026-06-05
https://mistral.ai/pricing
Daily snapshot since Dec 2023 · 552 days captured
Cohere
2026-06-05
https://cohere.com/pricing
Daily snapshot since Sep 2023 · 578 days captured

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model Field Why it’s inferred
Anthropic — Claude Sonnet 4.6 cachedInput Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5 cachedInput Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5 batchInput Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5 batchOutput Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5 cachedInput Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini cachedInput Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano cachedInput Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano batchInput Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano batchOutput Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro cachedInput Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro batchInput Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro batchOutput Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2 cachedInput Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2 batchInput Derived at 50% of input.
OpenAI — GPT-5.2 batchOutput Derived at 50% of output.
OpenAI — GPT-5 cachedInput Derived at 10% of input.
OpenAI — GPT-5 batchInput Derived at 50% of input.
OpenAI — GPT-5 batchOutput Derived at 50% of output.
OpenAI — GPT-5.5 Pro cachedInput Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5.5 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5.2 Pro cachedInput Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5.2 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5.1 batchInput Derived at 50% of input.
OpenAI — GPT-5.1 batchOutput Derived at 50% of output.
OpenAI — GPT-5 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5 Nano cachedInput Derived at 10% of input.
OpenAI — GPT-5 Nano batchInput Derived at 50% of input.
OpenAI — GPT-5 Nano batchOutput Derived at 50% of output.
Google — Gemini 3 Flash cachedInput Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro cachedInput Derived at 10% of input.
Google — Gemini 2.5 Flash cachedInput Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash cachedInput Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy) cachedInput Extrapolated at 25% of base.

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →