Guides → Playground & Guide → TCO Complete - 7-Step Procurement-Grade Wizard for AI Workloads
Meet Marcus Chen. Director of FP&A preparing a multi-year AI procurement case. "I need a TCO model the procurement committee will accept - workload-specific, with sensitivity analysis and a defensible NPV."
🔥 Engineering hands me a $30K/mo number. Procurement asks for a 36-month TCO with NPV, payback, and risk-adjusted scenarios. Wide gap.
Quick gives a board-ready number. Complete gives a contract-ready model. TCO Quick estimates total cost in 90 seconds with 5 inputs - perfect for board updates. TCO Complete is the procurement-grade version: 7 steps, calc handoffs, sensitivity analysis, NPV/IRR/Payback, persona-tuned executive synthesis. Use it when the contract is real.
Marcus's case: $30K/mo inference grows to $85K loaded TCO (Quick). But procurement wants 36-month NPV, best/expected/worst trajectories, and a defensible production-readiness uplift across security + compliance + observability. Quick can't get there. Complete can. The wizard composes the existing pure calcs (inference economics, agentic workflow, RAG pipeline, etc.) into a single procurement document.
7 steps, each feeding the next. (1) Context - workload + vertical + cloud + persona. (2) Inference economics - model + scale + caching + batching. (3) Capability stack - composable: retrieval, voice, agentic, fine-tuning, multimodal, evaluation. (4) Scale + finance trajectory - growth model + budget cap + NPV/IRR/Payback. (5) 6-pillar uplift - security + compliance + observability + PII + HITL + cost controls. (6) ROI sensitivity - tornado chart across volume, growth, pricing, HITL, PII, compliance. (7) Synthesis - persona-tuned executive plan (CFO/CTO/Founder/PM).
Build a defensible AI TCO model in 15 minutes. Workload × vertical × cloud aware, 6-pillar uplift, NPV/IRR/Payback, persona-tuned executive plan.
Below: live sliders. Move them to see numbers change in real time.
Each input shapes your cost. Move the slider — see the impact.
Open the full calculator — pick a model, enter your tokens, see per-call, daily, monthly, and annual cost.
🚀 Open the full calculator →The 7-step output is a procurement document, not a number. You get a multi-page executive plan, NPV/IRR/Payback, sensitivity tornado, and 6-pillar uplift breakdown.
Persona-tuned synthesis. CFO sees finance front-and-center (NPV, payback, scenarios). CTO sees tech debt and capability mix. Founder sees runway impact and risk premium. PM sees roadmap dependencies.
Calc handoffs are first-class. Step 2 calls the inference economics calc. Step 3 composes 6 capability calcs (retrieval, voice, agentic, fine-tuning, multimodal, evaluation). Step 4 invokes the trajectory + NPV engines. No double-counting. Each step's output becomes the next step's input.
ToolsInfo drill-downs are linked. Each pillar in the 6-pillar uplift links to ToolsInfo for vendor selection. You pick the vendor; we model the cost.
Same calculator, three different team sizes. Click a tab to see how the numbers shift.
Mid-stage SaaS adding AI to existing product. $250K capex + $30K/mo inference at start, growing 5%/mo. $50K/mo benefit (cost displacement + revenue lift). 12% discount rate. NPV at 36mo should clear $500K with payback ~15mo.
Healthy range: NPV positive at 36mo, payback <18mo
Pre-revenue startup. Higher discount rate (18%) reflects WACC. Shorter horizon (24mo) reflects pivot risk. Lower investment but tighter benefit assumptions. Use this to stress-test.
Healthy range: Payback within horizon
Mature enterprise. 60mo horizon. Lower growth rate (3%/mo) because mature scale. Lower discount rate (8% WACC). Heavy capex, large benefit stream. NPV should clear $5M.
Healthy range: NPV >$5M, IRR >25%
Cost isn't the only dimension. Click any constraint — see how recommendations change.
The wizard's job is to bridge engineering's monthly inference number with procurement's multi-year defensible NPV. Both are correct at their scope; the gap is what TCO Complete fills.
Hallucination is a quality + risk concern. The wizard sizes the budget for evaluation infrastructure (Step 3 Evaluation capability) and for human-in-the-loop review (Step 5 HITL pillar). Both are cost lines.
Step 5's compliance pillar auto-enables based on industry context. SOC 2 ~10%, HIPAA + SOC 2 ~25%, PCI + HIPAA + SOC 2 ~40%.
Privacy is a TCO line item, not an afterthought. The PII pillar models it explicitly.
Latency tooling shows up as observability pillar in Step 5 and possibly edge-deployment in Step 3.
Step 6's pricing sensitivity sweep quantifies the cost of vendor price hikes. Multi-vendor capability buys insurance against this.
Eval pipeline, drift monitoring, prompt versioning, A/B testing - all show up as line items. The wizard prevents double-counting by structuring each capability separately.
Tradeoff analysis is where most AI projects go sideways. Talk to a CFO-grade AI cost analyst →
Pre-loaded scenarios for the most common applications. Click a tab to see realistic numbers — then the "Try this scenario" button to load it into the calculator above.
Healthcare AI for documentation automation. HIPAA + audit + SOC 2 push 6-pillar uplift to ~40% of baseline. Step 5 makes this visible. Procurement committee needs the line-item breakdown.
Healthy range: NPV positive with 6-pillar uplift fully on
Fintech application processing transactions + PII. PII redaction tooling, key management, audit logging are significant TCO lines. ROI sensitivity tornado will show PII volume as a top-3 driver.
Healthy range: PII pillar drives 15-25% of total uplift
Agency white-labeling AI to clients. Multi-tenant. Higher growth rate (client acquisition), shorter horizon, moderate capex. Cost-controls pillar dominates (per-client metering).
Healthy range: Payback <12mo
Honest limitations — every model is wrong; some are useful. Where this one falls short:
For these, use: Cost Calculator for inference detail. Concentration Risk for explicit risk modeling. TCO Quick if you need a 90-second board number.
5 questions, 90 seconds. Use when contract isn't yet on the table.
Self-host vs API economics →Inform Step 3 inference economics decision.
Outcome-priced vendor vs build →Inform Step 1 workload framing.
12-month detailed forecast →Step 4 alternative for shorter horizons.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
View 3-year history for →
Last-verified date is the most recent successful daily snapshot
(aicost_pricing_snapshots) or, when no snapshot exists yet,
the latest successful crawler run (aicost_crawler_runs).
10 of 10
vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.)
are not listed.
Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic — Claude Sonnet 4.6 | cachedInput |
Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic — Claude Sonnet 4.5 | cachedInput |
Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic — Claude Sonnet 4.5 | batchInput |
Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount. |
| Anthropic — Claude Sonnet 4.5 | batchOutput |
Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount. |
| Anthropic — Claude Haiku 4.5 | cachedInput |
Derived at 10% of input rate — Anthropic 90% cache-hit discount convention. |
| OpenAI — GPT-5.4 Mini | cachedInput |
Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI — GPT-5.4 Nano | cachedInput |
Derived at 10% of input — OpenAI 90% cache-hit convention. |
| OpenAI — GPT-5.4 Nano | batchInput |
Derived at 50% of input — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Nano | batchOutput |
Derived at 50% of output — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Pro | cachedInput |
Derived at 10% of input — OpenAI 90% cache-hit convention. |
| OpenAI — GPT-5.4 Pro | batchInput |
Derived at 50% of input — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Pro | batchOutput |
Derived at 50% of output — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.2 | cachedInput |
Derived at 10% of input; no residency uplift. |
| OpenAI — GPT-5.2 | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5.2 | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5 | cachedInput |
Derived at 10% of input. |
| OpenAI — GPT-5 | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5 | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5.5 Pro | cachedInput |
Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention. |
| OpenAI — GPT-5.5 Pro | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5.5 Pro | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5.2 Pro | cachedInput |
Derived at 10% of input — pro-tier convention. |
| OpenAI — GPT-5.2 Pro | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5.2 Pro | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5.1 | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5.1 | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5 Pro | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5 Pro | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5 Nano | cachedInput |
Derived at 10% of input. |
| OpenAI — GPT-5 Nano | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5 Nano | batchOutput |
Derived at 50% of output. |
| Google — Gemini 3 Flash | cachedInput |
Derived at 10% of input — Google caching discount convention ~90%. |
| Google — Gemini 3.1 Flash-Lite | cachedInput |
Derived at 10% of input — Google caching convention. |
| Google — Gemini 3.1 Flash-Lite | batchInput |
Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 3.1 Flash-Lite | batchOutput |
Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.5 Pro | cachedInput |
Derived at 10% of input. |
| Google — Gemini 2.5 Flash | cachedInput |
Derived at 10% of input. |
| Google — Gemini 2.5 Flash-Lite | cachedInput |
Derived at 10% of input — Google caching convention. |
| Google — Gemini 2.5 Flash-Lite | batchInput |
Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.5 Flash-Lite | batchOutput |
Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash | cachedInput |
Derived at 25% of input per Google 2.0 family caching rates. |
| Google — Gemini 2.0 Flash | batchInput |
Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash | batchOutput |
Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash-Lite | cachedInput |
Derived at 10% of input — Google caching convention. |
| Google — Gemini 2.0 Flash-Lite | batchInput |
Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash-Lite | batchOutput |
Derived at 50% of output — Google Batch API uniform 50% discount. |
| xAI — Grok 4 (legacy) | cachedInput |
Extrapolated at 25% of base. |
Pricing is cross-verified against the
LiteLLM community registry
when available. Daily snapshots are kept in aicost_pricing_snapshots;
every change is logged to aicost_price_changelog with old & new
values for full audit trail. Read the full methodology →