Annual AI Cost Forecaster: 12-Month Projection with Breach Alerts
Meet Robert Tanaka, FinOps lead at a 200-person SaaS company: "I have a $120K annual AI budget. When do we breach it: month 7 or month 11?"
🔥 Last year's cloud overrun got board-level attention, and AI spend is growing at 5× the rate of cloud.
FinOps for AI is harder than FinOps for cloud. Cloud scales predictably: usage drives cost linearly. AI combines usage growth with price volatility (40-60% per year on flagship models, in both directions) and capability churn (a new model every six months resets your assumptions).
Robert's $120K annual budget is the line that matters. At 30% MoM growth the question isn't whether his team will breach it (they will) but when, and with what optimization plan. The 12-month forecast surfaces both the breach point and the optimization runway.
This calculator projects cost month by month, factors in pricing trends (vendors typically cut prices 15-30% per year), models three growth curves (linear, S-curve, hockey stick), and shows the breach month under each scenario.
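A minimal Python sketch of that projection loop follows. It is not the calculator's actual code: the three curve shapes (linear add-on growth, a logistic S-curve capped at 5× starting usage, and plain compounding standing in for the hockey stick), the smooth monthly price decline, and the names `growth_multiplier`, `project`, and `breach_month` are all illustrative assumptions. "Breach" is taken to mean cumulative spend exceeding the annual budget.

```python
import math
from typing import List, Optional


def growth_multiplier(curve: str, month: int, rate: float, cap: float = 5.0) -> float:
    """Usage relative to month 0 (0-based month). Curve shapes are assumptions."""
    if curve == "linear":
        # Adds a fixed `rate` x starting usage every month.
        return 1.0 + rate * month
    if curve == "hockey":
        # Compounds at `rate` per month (plain exponential growth).
        return (1.0 + rate) ** month
    if curve == "s_curve":
        # Logistic growth normalised to 1.0 at month 0, saturating at `cap`.
        return cap / (1.0 + (cap - 1.0) * math.exp(-rate * month))
    raise ValueError(f"unknown growth curve: {curve!r}")


def project(base_monthly: float, curve: str, rate: float,
            annual_price_drop: float = 0.0, months: int = 12) -> List[float]:
    """Monthly cost = starting cost x usage growth x smooth vendor price decline."""
    # Per-month price factor chosen so prices are `annual_price_drop` lower after 12 months.
    price_factor = (1.0 - annual_price_drop) ** (1.0 / 12.0)
    return [base_monthly * growth_multiplier(curve, m, rate) * price_factor ** m
            for m in range(months)]


def breach_month(costs: List[float], annual_budget: float) -> Optional[int]:
    """1-based month in which cumulative spend first exceeds the budget, else None."""
    cumulative = 0.0
    for month, cost in enumerate(costs, start=1):
        cumulative += cost
        if cumulative > annual_budget:
            return month
    return None


if __name__ == "__main__":
    # Robert's inputs: $6K/month today, 15% MoM growth, $120K annual budget.
    for curve in ("linear", "s_curve", "hockey"):
        costs = project(6_000, curve, rate=0.15, annual_price_drop=0.20)
        print(f"{curve:>8}: breach month {breach_month(costs, 120_000)}")
```

Keeping the growth curve, the price trend, and the breach test as separate functions makes it easy to swap in a different curve shape without touching the budget logic.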
Project your AI bill month by month for 12 months and surface budget breaches before they happen. The projection models growth, seasonality, and vendor pricing trends.
The breach month is the headline. If your budget breaches in month 7, you have a hard problem seven months out; if it breaches in month 11, you have time to optimize.
Watch the spread between the linear and price-adjusted forecasts. If vendor prices drop 20% per year, as the historical trend suggests, the price-adjusted curve gives you 2-4 more months before breach. Don't bet on it: it's a cushion, not a strategy.
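To check how much runway a price trend buys for your own inputs, the Python sketch below computes the breach month with and without a 20%/year price decline. The compounding-growth model, the smooth price-decline curve, and the function names are assumptions, not the calculator's internals.

```python
from typing import Optional


def breach_month(base_monthly: float, mom_growth: float, annual_budget: float,
                 annual_price_drop: float = 0.0, horizon: int = 12) -> Optional[int]:
    """First 1-based month in which cumulative spend exceeds the budget (None if never).

    Usage compounds at `mom_growth`. Unit prices fall smoothly so they are
    `annual_price_drop` lower after 12 months; 0.0 gives the unadjusted forecast.
    """
    price_factor = (1.0 - annual_price_drop) ** (1.0 / 12.0)
    cumulative = 0.0
    for m in range(horizon):
        cumulative += base_monthly * ((1.0 + mom_growth) * price_factor) ** m
        if cumulative > annual_budget:
            return m + 1
    return None


def price_cushion(base_monthly: float, mom_growth: float, annual_budget: float,
                  annual_price_drop: float = 0.20) -> Optional[int]:
    """Extra months of runway from the price trend; None if either forecast never breaches."""
    flat = breach_month(base_monthly, mom_growth, annual_budget)
    adjusted = breach_month(base_monthly, mom_growth, annual_budget, annual_price_drop)
    if flat is None or adjusted is None:
        return None
    return adjusted - flat


if __name__ == "__main__":
    for growth in (0.05, 0.15, 0.30):
        flat = breach_month(9_000, growth, 120_000)
        adjusted = breach_month(9_000, growth, 120_000, annual_price_drop=0.20)
        print(f"{growth:.0%} MoM: unadjusted breach {flat}, price-adjusted breach {adjusted}")
```

In this model the cushion varies with both the growth rate and the starting spend, so it is worth running for your own numbers rather than assuming a fixed number of extra months.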
Read the optimization runway. The number of months before breach is your runway to ship optimizations (caching, routing, batching, vendor renegotiation). Each lever shifts the breach by 2-4 months; pull two levers and you're safe for the year.
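One way to model levers, sketched in Python: each lever starts saving a fixed fraction of the monthly bill from the month it ships, and savings from several levers compound multiplicatively. That framing, the `breach_with_levers` name, and the savings fractions in the example are illustrative assumptions, not benchmarks.

```python
from typing import Optional, Sequence, Tuple

# (ship_month, savings_fraction): from ship_month onward (1-based), the monthly
# bill is multiplied by (1 - savings_fraction). Hypothetical structure.
Lever = Tuple[int, float]


def breach_with_levers(base_monthly: float, mom_growth: float, annual_budget: float,
                       levers: Sequence[Lever] = (), horizon: int = 12) -> Optional[int]:
    """Breach month (1-based) after applying optimization levers; None if no breach."""
    cumulative = 0.0
    for month in range(1, horizon + 1):
        cost = base_monthly * (1.0 + mom_growth) ** (month - 1)
        for ship_month, saving in levers:
            if month >= ship_month:
                # Multiplicative stacking avoids counting the same spend twice.
                cost *= 1.0 - saving
        cumulative += cost
        if cumulative > annual_budget:
            return month
    return None


if __name__ == "__main__":
    # Placeholder savings: caching cuts 25% from month 3, routing 20% from month 5.
    baseline = breach_with_levers(6_000, 0.30, 120_000)
    optimized = breach_with_levers(6_000, 0.30, 120_000, [(3, 0.25), (5, 0.20)])
    print(f"baseline breach: month {baseline}; with two levers: month {optimized}")
```

In this model a lever's effect depends on both its size and how early it ships, since it only reduces the months after it lands.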
Same calculator, three different team sizes:
Mature SaaS with 5% MoM growth and vendor prices flat or falling: spend fits inside the annual budget comfortably, with margin.
Healthy range: Breach unlikely within 12 months
$6K/mo at 15% MoM growth: the annual run-rate passes $200K by year-end, and the budget breaches around month 8-9. Robert needs to ship caching and routing in Q1-Q2 to make the annual number.
Healthy range: Breach month 8-10; optimize now
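A quick check of the run-rate figure, assuming month 1 costs $6K, usage compounds at 15% per month, and vendor prices stay flat:

$$
\text{month-12 spend} = \$6{,}000 \times 1.15^{11} \approx \$27{,}900,
\qquad
\text{annual run-rate} \approx 12 \times \$27{,}900 \approx \$335\text{K}
$$

That comfortably clears the $200K+ mark quoted for this scenario.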
Consumer launch with 35% MoM growth. Even with vendor price drops, the budget breaches in about 5 months. Either raise the budget, route aggressively to cheap models, or accept a slower feature rollout.
Healthy range: Breach month 5-6; major optimization required
Cost isn't the only dimension. Here is how the recommendations change under each constraint.
Annual commitments cut costs but lock you into a vendor. They are a reasonable bet at $100K+ in AI spend if you're confident in your usage curve, and risky if growth might pivot.
Cutting model quality to fit the budget is the worst FinOps outcome: hallucinations create downstream support costs that exceed the savings. Optimize through routing and caching first; downgrade the model tier last.
Discovering in month 8 of the forecast that you need HIPAA compliance is a budget shock. If regulated industries are even possible customers, build the compliance tier into the baseline.
The enterprise tier is rarely what breaks the budget. Don't compromise on privacy.
Latency optimization is a UX investment, not a budget one. Track it separately in your forecast.
Single-vendor forecasts are fragile: one 50% price hike and your model is wrong. A multi-vendor abstraction layer (LiteLLM-style) costs about 2 weeks of engineering time and makes your forecast robust to vendor surprises.
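To make the fragility concrete, here is a small Python sketch of how one vendor's price hike flows through to the monthly bill depending on how spend is split. Vendor names and spend figures are hypothetical, and the sketch deliberately holds traffic fixed; it does not model the rerouting an abstraction layer would let you do after the hike.

```python
from typing import Dict


def bill_after_price_change(spend_by_vendor: Dict[str, float],
                            price_change: Dict[str, float]) -> float:
    """Monthly bill after per-vendor price changes, holding traffic fixed.

    `price_change` maps vendor -> fractional change (0.5 = a 50% hike, -0.2 = a 20% cut).
    Vendors missing from `price_change` keep their current prices.
    """
    return sum(spend * (1.0 + price_change.get(vendor, 0.0))
               for vendor, spend in spend_by_vendor.items())


if __name__ == "__main__":
    hike = {"vendor_a": 0.50}  # hypothetical 50% hike on one vendor
    single = {"vendor_a": 10_000.0}
    split = {"vendor_a": 5_000.0, "vendor_b": 5_000.0}
    print(bill_after_price_change(single, hike))  # 15000.0: whole bill exposed
    print(bill_after_price_change(split, hike))   # 12500.0: only half the bill exposed
```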
The single most effective FinOps move at $50K+ AI spend: hire one AI engineer focused on optimization. Their fully-loaded cost (~$15K/mo) typically returns 3-5× through caching/routing/eval-driven prompt reduction.
Tradeoff analysis is where most AI projects go sideways.
Pre-loaded scenarios for the most common applications, with realistic numbers:
Going to finance with a budget request: run the forecast, show the breach month and optimization plan, and request a budget aligned to forecast growth plus a 20% buffer. Data-driven asks land better than guesses.
Healthy range: Forecast supports defensible budget ask
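A minimal sketch of assembling that ask in Python, assuming you already have a 12-month forecast as a list of monthly costs. The `budget_request` name and its return shape are hypothetical; the 20% buffer is the one suggested above.

```python
from typing import Dict, List, Optional


def budget_request(forecast: List[float], current_budget: float,
                   buffer: float = 0.20) -> Dict[str, Optional[float]]:
    """Summarize a forecast for finance: total, breach month, and buffered ask."""
    cumulative = 0.0
    breach: Optional[int] = None
    for month, cost in enumerate(forecast, start=1):
        cumulative += cost
        if breach is None and cumulative > current_budget:
            breach = month  # first month cumulative spend exceeds today's budget
    return {
        "forecast_total": cumulative,
        "breach_month": breach,
        "requested_budget": cumulative * (1.0 + buffer),  # forecast total plus buffer
    }


if __name__ == "__main__":
    # Illustrative forecast: $8K/month compounding at 5% MoM.
    forecast = [8_000 * 1.05 ** m for m in range(12)]
    print(budget_request(forecast, current_budget=120_000))
```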
At $12K/mo current spend and 8% MoM growth, you'll reach a $300K+ annual run-rate in AI spend. That's volume-discount territory: every major vendor will negotiate 15-30% off list at this scale. Use the forecast as ammunition.
Healthy range: An annual run-rate of $200K+ is volume-discount territory
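A quick check of those figures, assuming the $12K is month-1 spend, usage compounds at 8% per month, and list prices stay flat:

$$
\$12{,}000 \times 1.08^{11} \approx \$27{,}980/\text{mo}
\;\Rightarrow\;
12 \times \$27{,}980 \approx \$336\text{K annual run-rate},
\qquad
15\%\text{ to }30\%\text{ off} \approx \$50\text{K to }\$101\text{K per year}
$$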
A new feature would push monthly spend up 25% MoM, and the forecast shows a breach in 4 months. Either delay the launch, scope it to a cheap-tier model only, or get budget pre-approval.
Healthy range: Forecast clarifies feasibility
Honest limitations: every model is wrong, but some are useful. Where this one falls short:
For these cases, use Scale Projection for non-linear scenarios, Budget Planner for allocation across use cases, and the Full TCO Wizard for sensitivity analysis.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years at Fortune 100 companies (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs).
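The fallback rule can be sketched in Python as below. The `Run` record shape and the `last_verified` name are hypothetical: the aicost_pricing_snapshots and aicost_crawler_runs tables are only named here, not described. The sketch also assumes a snapshot counts only once it has succeeded.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Run:
    """Hypothetical row shape shared by snapshot and crawler-run records."""
    run_date: date
    succeeded: bool


def last_verified(snapshots: Iterable[Run], crawler_runs: Iterable[Run]) -> Optional[date]:
    """Latest successful snapshot date, else latest successful crawler run, else None."""
    snapshot_dates = [r.run_date for r in snapshots if r.succeeded]
    if snapshot_dates:
        return max(snapshot_dates)
    # No successful snapshot yet: fall back to the crawler.
    crawler_dates = [r.run_date for r in crawler_runs if r.succeeded]
    return max(crawler_dates) if crawler_dates else None
```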
10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor | Model | Field | Why it’s inferred |
|---|---|---|---|
| Anthropic | Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes a 90% cache-hit discount on this tier. |
| Anthropic | Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic | Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents a uniform 50% Batch discount. |
| Anthropic | Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents a uniform 50% Batch discount. |
| Anthropic | Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention. |
| OpenAI | GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents an automatic 90% discount on cache hits across the GPT-5.x tier. |
| OpenAI | GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI | GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI | GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI | GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI | GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI | GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI | GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI | GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI | GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5.5 Pro | cachedInput | Derived at 10% of input; OpenAI does not publish a cached rate for *-pro models, so the family convention is used. |
| OpenAI | GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention. |
| OpenAI | GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI | GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google | Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention (~90%). |
| Google | Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google | Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google | Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google | Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google | Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google | Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google | Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google | Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google | Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates. |
| Google | Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google | Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google | Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google | Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google | Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| xAI | Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base. |
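The conventions in this table reduce to fixed ratios of the published base rates. Below is a minimal Python sketch, assuming prices are expressed per million tokens and that any vendor-published value should win over an inferred one. The function names and the example base prices are hypothetical. Per-model exceptions, such as the 25% cache ratio used for Gemini 2.0 Flash, are passed in explicitly.

```python
from typing import Dict

CACHE_RATIO = 0.10  # cached input = 10% of base input (90% off)
BATCH_RATIO = 0.50  # Batch API = 50% of base input and output


def infer_rates(input_price: float, output_price: float,
                cache_ratio: float = CACHE_RATIO,
                batch_ratio: float = BATCH_RATIO) -> Dict[str, float]:
    """Derive cachedInput / batchInput / batchOutput from base rates."""
    return {
        "cachedInput": input_price * cache_ratio,
        "batchInput": input_price * batch_ratio,
        "batchOutput": output_price * batch_ratio,
    }


def fill_missing(published: Dict[str, float], input_price: float, output_price: float,
                 cache_ratio: float = CACHE_RATIO,
                 batch_ratio: float = BATCH_RATIO) -> Dict[str, float]:
    """Keep vendor-published values; infer only the fields the vendor left out."""
    inferred = infer_rates(input_price, output_price, cache_ratio, batch_ratio)
    return {field: published.get(field, value) for field, value in inferred.items()}


if __name__ == "__main__":
    # Hypothetical base prices ($ per 1M tokens), not any real model's list price.
    print(fill_missing({"cachedInput": 0.25}, input_price=2.0, output_price=8.0))
    # {'cachedInput': 0.25, 'batchInput': 1.0, 'batchOutput': 4.0}
```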
Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots, and every change is logged to aicost_price_changelog with old and new values for a full audit trail.
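A minimal Python sketch of the changelog step, assuming each daily snapshot can be loaded as a mapping from (model, field) to price. The `PriceChange` record and the `diff_snapshots` name are hypothetical and say nothing about the actual schema of aicost_price_changelog. Exact float comparison is used on the assumption that prices are stored as published values rather than computed ones.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # hypothetical key: (model, field), e.g. ("gpt-5", "batchInput")


@dataclass(frozen=True)
class PriceChange:
    model: str
    field: str
    old: Optional[float]  # None when the price is newly added
    new: Optional[float]  # None when the price was removed


def diff_snapshots(previous: Dict[Key, float], current: Dict[Key, float]) -> List[PriceChange]:
    """Emit one changelog entry per added, removed, or changed price."""
    changes: List[PriceChange] = []
    for key in sorted(previous.keys() | current.keys()):
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes.append(PriceChange(key[0], key[1], old, new))
    return changes
```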