TCO Quick - 5-Question Wizard for AI Total Cost of Ownership
Meet Yara Hassan. VP Operations preparing a board update. "Board asks 'what's the total cost of our AI initiative?' I need TCO, not just inference cost."
🔥 Engineering says $30K/mo. Reality is closer to $80K loaded. Need a defensible number.
True AI TCO is typically 2-4× the inference bill. Inference is what the engineering team sees on invoices. TCO adds vendor management, ML/SRE headcount, eval pipelines, observability tools, security review, contract negotiation, retraining, and a vendor concentration risk premium.
Yara's case: $30K/mo Anthropic + $5K vector DB + $2K observability + $40K loaded ML/SRE time + $3K vendor management + $5K eval/retraining = $85K/mo TCO. The board number isn't $30K - it's $85K. That's the truth-telling exercise.
5 questions to compute your TCO. (1) Monthly inference + tools + storage. (2) Headcount allocated to AI (FTE × loaded cost). (3) Vendor management + procurement time. (4) Eval + retraining + drift monitoring. (5) Vendor concentration risk premium (for shock-resilience).
Total cost of AI ownership in 5 questions. Inference + ops + tooling + headcount + risk. CFO-ready estimate in 2 minutes.
Below: live sliders. Move them to see numbers change in real time. * Output uses the generic compute model — for precise numbers use the full calculator below.
Each input shapes your cost. Move the slider — see the impact.
Open the full calculator — pick a model, enter your tokens, see per-call, daily, monthly, and annual cost.
TCO = inference + headcount + management + eval + risk premium.
Yara's stack: $30K inference + $40.5K headcount (1.5 FTE × $27K) + $3K management + $5K eval + $3.4K risk premium (about 11% of inference) = ~$82K/mo TCO. Round up to $85K for board reporting.
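The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the `TcoInputs` class, its field names, and `monthly_tco` are assumptions introduced here, and the figures are the ones quoted for Yara's stack.

```python
from dataclasses import dataclass


@dataclass
class TcoInputs:
    """Monthly inputs in USD. Field names are illustrative, not the calculator's."""
    inference: float            # inference bill
    fte: float                  # headcount allocated to AI, in FTEs
    loaded_cost_per_fte: float  # loaded monthly cost of one FTE
    management: float           # vendor management + procurement
    eval_retraining: float      # eval + retraining + drift monitoring
    risk_premium: float         # vendor concentration risk premium


def monthly_tco(x: TcoInputs) -> float:
    """TCO = inference + headcount + management + eval + risk premium."""
    headcount = x.fte * x.loaded_cost_per_fte
    return x.inference + headcount + x.management + x.eval_retraining + x.risk_premium


# Figures quoted in the text for Yara's stack.
yara = TcoInputs(
    inference=30_000,
    fte=1.5,
    loaded_cost_per_fte=27_000,
    management=3_000,
    eval_retraining=5_000,
    risk_premium=3_400,
)
total = monthly_tco(yara)
print(f"TCO: ${total:,.0f}/mo, {total / yara.inference:.1f}x inference")
# TCO: $81,900/mo, 2.7x inference
```

Keeping headcount as FTE × loaded cost, rather than a single dollar figure, makes it easier to see which assumption moves the total most.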
Most teams underestimate TCO by 2-4×. Engineering reports inference; CFO needs the full number. Use this calc to bridge the gap.
The risk premium is real. If your top vendor stumbles (price hike, outage), the cost to absorb or migrate is 5-10% of your inference spend on average. Treat it as insurance you've self-funded.
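As a rough sketch, the 5-10% rule of thumb can be turned into a budget range. The function name and defaults below are assumptions for illustration. The example uses the $5K inference bill from the early-stage startup scenario, where the low end matches the $0.25K risk line.

```python
def risk_premium_range(inference: float, low: float = 0.05, high: float = 0.10) -> tuple[float, float]:
    """Return the (low, high) monthly risk premium for a given inference spend."""
    if not 0 <= low <= high:
        raise ValueError("expected 0 <= low <= high")
    return inference * low, inference * high


low, high = risk_premium_range(5_000)
print(f"Set aside ${low:,.0f} to ${high:,.0f} per month")
# Set aside $250 to $500 per month
```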
Same calculator, three different team sizes. Click a tab to see how the numbers shift.
Early-stage startup. Light overhead. $5K inference + $6K headcount + $0.5K mgmt + $0.5K eval + $0.25K risk = $12.25K. Hidden cost is your engineering time.
Healthy range: TCO ~$12K (~2.5× inference)
Yara's case. $30K inference becomes $82K loaded TCO. The board number is $82K, not $30K.
Healthy range: TCO $82K (~2.7× inference)
Mature scaled platform. Higher absolute headcount but better economy of scale. $250K inference → $470K TCO. Multiplier drops below 2× at this scale (efficiency).
Healthy range: TCO ~$470K (~1.9× inference)
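The multiplier quoted for each team size is simply TCO divided by inference. A minimal sketch, using the figures quoted above for Yara's case and the scaled platform (`tco_multiplier` is a name made up for this example):

```python
def tco_multiplier(inference: float, tco: float) -> float:
    """How many times larger the loaded TCO is than the inference bill."""
    if inference <= 0:
        raise ValueError("inference must be positive")
    return tco / inference


examples = {
    "Yara's case": (30_000, 82_000),
    "Mature scaled platform": (250_000, 470_000),
}
for name, (inference, tco) in examples.items():
    print(f"{name}: {tco_multiplier(inference, tco):.1f}x inference")
# Yara's case: 2.7x inference
# Mature scaled platform: 1.9x inference
```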
Cost isn't the only dimension. Click any constraint — see how recommendations change.
Inference is the most visible cost but, at small and mid scale, often not the largest line. Headcount and risk premium together usually exceed inference until economies of scale kick in. Report honestly.
TCO models money flow. Quality requires independent measurement.
Compliance is a real cost line. SOC 2, HIPAA, GDPR each contribute audit + monitoring overhead.
Privacy is mostly a process cost (governance) plus light infra (vector DB encryption, audit logs).
Latency engineering is part of the headcount line.
If you can't or won't invest in multi-vendor capability, the risk premium is your insurance against being stuck. Make it explicit.
Eval, retraining, observability - all in the calc. Avoid double-counting by clearly defining what's in each line.
Tradeoff analysis is where most AI projects go sideways.
Pre-loaded scenarios for the most common applications. Click a tab to see realistic numbers — then the "Try this scenario" button to load it into the calculator above.
Healthcare or financial services. Heavy compliance + audit + governance. TCO multiplier is 3.8× because of regulatory overhead. Compliance is a tax.
Healthy range: TCO $151K (~3.8× inference)
Post-launch product. Inference dominates at this scale. Headcount overhead amortizes well. ~2× multiplier.
Healthy range: TCO ~$154K (~1.9× inference)
Agency / consultancy. Light platform overhead. Headcount is shared across clients. ~3× multiplier.
Healthy range: TCO ~$23K (~2.9× inference)
Honest limitations — every model is wrong; some are useful. Where this one falls short:
For these, use: Cost Calculator for inference detail. Concentration Risk for risk premium math.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
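To see how a 24-month drop translates into a 12-month budget error, one can annualize it. The sketch below assumes prices fall at a constant geometric rate, which is a simplification introduced here and not a claim from the pricing data; `annualized_decline` is a hypothetical helper.

```python
def annualized_decline(total_drop: float, months: float = 24.0) -> float:
    """Constant yearly price decline implied by a total drop over `months`."""
    if not 0 <= total_drop < 1:
        raise ValueError("total_drop must be in [0, 1)")
    remaining = 1.0 - total_drop
    return 1.0 - remaining ** (12.0 / months)


for drop in (0.40, 0.90):
    print(f"{drop:.0%} over 24 months -> {annualized_decline(drop):.1%} per year")
# 40% over 24 months -> 22.5% per year
# 90% over 24 months -> 68.4% per year
```

Under that assumption, a budget set 12 months ago would be off by roughly 23% to 68%, which is consistent with the "30%+" figure.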
The last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs).
10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
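The fallback rule for the last-verified date can be sketched as follows. The record shape (dicts with `day` and `ok` keys) is hypothetical; only the two table names come from the text, and the real schema may differ.

```python
from datetime import date
from typing import Optional


def last_verified(snapshots: list[dict], crawler_runs: list[dict]) -> Optional[date]:
    """Prefer the newest successful snapshot; otherwise the newest successful crawler run.

    `snapshots` stands in for rows of aicost_pricing_snapshots and
    `crawler_runs` for rows of aicost_crawler_runs (hypothetical row shape).
    """
    ok_snapshots = [row["day"] for row in snapshots if row["ok"]]
    if ok_snapshots:
        return max(ok_snapshots)
    ok_runs = [row["day"] for row in crawler_runs if row["ok"]]
    return max(ok_runs) if ok_runs else None


# Example with made-up dates: no snapshot yet, so the crawler run is used.
print(last_verified([], [{"day": date(2025, 1, 1), "ok": True}]))
```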
The fields below are derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic — Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic — Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic — Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount. |
| Anthropic — Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount. |
| Anthropic — Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate — Anthropic 90% cache-hit discount convention. |
| OpenAI — GPT-5.4 Mini | cachedInput | Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI — GPT-5.4 Nano | cachedInput | Derived at 10% of input — OpenAI 90% cache-hit convention. |
| OpenAI — GPT-5.4 Nano | batchInput | Derived at 50% of input — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Nano | batchOutput | Derived at 50% of output — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Pro | cachedInput | Derived at 10% of input — OpenAI 90% cache-hit convention. |
| OpenAI — GPT-5.4 Pro | batchInput | Derived at 50% of input — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Pro | batchOutput | Derived at 50% of output — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI — GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI — GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5.5 Pro | cachedInput | Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention. |
| OpenAI — GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5.2 Pro | cachedInput | Derived at 10% of input — pro-tier convention. |
| OpenAI — GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI — GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google — Gemini 3 Flash | cachedInput | Derived at 10% of input — Google caching discount convention ~90%. |
| Google — Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input — Google caching convention. |
| Google — Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google — Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google — Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input — Google caching convention. |
| Google — Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates. |
| Google — Gemini 2.0 Flash | batchInput | Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash | batchOutput | Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input — Google caching convention. |
| Google — Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output — Google Batch API uniform 50% discount. |
| xAI — Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base. |
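The derivation rules in the table can be sketched as a small helper that fills in only the fields a vendor does not publish. The function name, the dict shape, and the example rates are all hypothetical placeholders, not real vendor prices. The field names `cachedInput`, `batchInput`, and `batchOutput` come from the table.

```python
def fill_inferred_rates(
    published: dict[str, float],
    cached_factor: float = 0.10,  # typical convention: cached input = 10% of base
    batch_factor: float = 0.50,   # typical convention: Batch API = 50% of base
) -> tuple[dict[str, float], list[str]]:
    """Return (rates, inferred_fields); published values are never overwritten."""
    rates = dict(published)
    inferred: list[str] = []
    derivations = {
        "cachedInput": ("input", cached_factor),
        "batchInput": ("input", batch_factor),
        "batchOutput": ("output", batch_factor),
    }
    for field, (base_field, factor) in derivations.items():
        if field not in rates:
            rates[field] = rates[base_field] * factor
            inferred.append(field)
    return rates, inferred


# Placeholder rates in USD per million tokens (made up for illustration).
rates, inferred = fill_inferred_rates({"input": 2.00, "output": 10.00})
print(rates, inferred)
```

Rows that use a different convention, such as the 25% cached rate listed for Gemini 2.0 Flash, would pass `cached_factor=0.25`.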
Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail.
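A changelog of this kind can be produced by diffing two consecutive snapshots. The sketch below is an assumption about how such a diff might look. The snapshot shape (model name mapped to a dict of rates), the entry keys, and the model names are hypothetical, and the actual aicost_price_changelog schema is not described in the text.

```python
from typing import Optional, TypedDict


class ChangeEntry(TypedDict):
    model: str
    field: str
    old: Optional[float]  # None when the model or field is new
    new: float


def diff_snapshots(
    old: dict[str, dict[str, float]],
    new: dict[str, dict[str, float]],
) -> list[ChangeEntry]:
    """List every rate in `new` that differs from `old`, with both values.

    Models or fields that disappear from `new` are not reported in this sketch.
    """
    entries: list[ChangeEntry] = []
    for model, new_rates in new.items():
        old_rates = old.get(model, {})
        for field, new_value in new_rates.items():
            old_value = old_rates.get(field)
            if old_value != new_value:
                entries.append({"model": model, "field": field, "old": old_value, "new": new_value})
    return entries


yesterday = {"model-a": {"input": 2.0, "output": 10.0}}
today = {"model-a": {"input": 1.5, "output": 10.0}, "model-b": {"input": 0.5}}
for entry in diff_snapshots(yesterday, today):
    print(entry)
```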