Quarterly Spend Forecaster - Project Q1-Q4 AI Spend with Seasonality
Meet Hannah Kim, a Senior FinOps Manager who presents to the CFO every quarter. "Q1 was $180K. Q2 is trending higher. What does the rest of the year look like, and what's my confidence interval?"
🔥 CFO asks 'will we hit annual budget?' - I keep saying 'probably' because I don't have a real model.
Quarterly forecasting for AI is harder than cloud. Cloud has stable per-unit pricing. AI has price drops every quarter (DeepSeek, Gemini Flash) AND occasional spikes (vendor adjusts pricing schedule). Plus usage growth is non-linear after feature launches.
Hannah's Q1 was $180K. The naive projection is 4 × $180K = $720K for the year, or $540K for Q2-Q4. But Q1 included a feature launch (above trend), and Q3 typically sees a holiday slowdown (below trend). On top of that, DeepSeek launched a new tier in Q2 (a pricing tailwind) and Anthropic increased Sonnet 4.6 pricing by 10% in March (a pricing headwind). The realistic Q2-Q4 projection is closer to $550-650K, stated as a confidence interval, not the $540K point estimate.
Three forecasting components. (1) Base rate - current quarterly run-rate. (2) Seasonality - typical quarter-over-quarter pattern (B2B SaaS dips in Q3, climbs Q4). (3) Pricing assumption variance - vendor prices change ±20% per year, build that uncertainty into the band.
Model your AI spend across quarters with seasonality, growth rate, and pricing assumption variance. Brief the CFO with confidence intervals, not point estimates.
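The three components can be combined in a few lines. The following is a minimal sketch, not the calculator's actual implementation: `project_year`, `Forecast`, and the parameter names are illustrative, growth is assumed to compound quarter over quarter, the seasonal dip is assumed to apply to Q3 only, and the pricing buffer is applied symmetrically to the annual total.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    quarters: list[float]   # projected spend for Q1..Q4
    annual: float           # point estimate (sum of the quarters)
    low: float              # lower edge of the band
    high: float             # upper edge of the band


def project_year(q1: float, growth: float, q3_dip: float,
                 pricing_buffer: float) -> Forecast:
    """Project four quarters from a Q1 base rate (all names are hypothetical)."""
    # (1) Base rate + growth: compound the Q1 run-rate quarter over quarter.
    trend = [q1 * (1 + growth) ** i for i in range(4)]
    # (2) Seasonality: assume only Q3 is scaled down and Q4 returns to trend.
    seasonal = [1.0, 1.0, 1.0 - q3_dip, 1.0]
    quarters = [t * s for t, s in zip(trend, seasonal)]
    # (3) Pricing variance: symmetric band around the annual total.
    annual = sum(quarters)
    margin = annual * pricing_buffer
    return Forecast(quarters, annual, annual - margin, annual + margin)


forecast = project_year(q1=180_000, growth=0.10, q3_dip=0.08, pricing_buffer=0.10)
for i, q in enumerate(forecast.quarters, start=1):
    print(f"Q{i}: ${q:,.0f}")
print(f"Annual: ${forecast.annual:,.0f} "
      f"(band ${forecast.low:,.0f} to ${forecast.high:,.0f})")
```

With Hannah's inputs this yields a Q2 of $198K and a Q3 of about $200K. A calculator that carries the dip forward into Q4, or applies growth differently, will produce different Q4 and annual figures, so treat the output as a way to test assumptions rather than a definitive number.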
Annual point estimate is the base: the sum of the four projected quarters (Q1 + Q2 + Q3 + Q4) with seasonality applied.
The annual confidence interval is what to brief the CFO on. The ± pricing buffer % gives you the band: '$620K ± $62K' is more honest than '$620K'.
Watch the Q3 dip. Most teams forecast linearly, then look bad in Q3 when usage dips and they over-allocated.
Pricing tailwinds compound. If a multi-vendor router sends traffic to the cheapest option, price drops benefit you automatically. Single-vendor lock-in misses these tailwinds.
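To see why routing matters for a forecast, here is a rough sketch comparing a single-vendor plan with one that routes each quarter's volume to the cheapest option. The vendor names, per-million-token prices, and token volumes are invented for illustration, and the sketch assumes the models are interchangeable for the workload, which is often not true in practice.

```python
# Hypothetical blended price per million tokens, by quarter (Q1..Q4).
prices = {
    "vendor_a": [3.00, 3.30, 3.30, 3.30],   # price increase in Q2
    "vendor_b": [3.20, 2.40, 2.40, 2.00],   # cheaper tiers arrive in Q2 and Q4
}
tokens_millions = [50_000, 55_000, 52_000, 60_000]  # assumed quarterly volume


def yearly_cost(price_path: list[float], volume: list[float]) -> float:
    """Total cost when every quarter is billed at the given price path."""
    return sum(p * v for p, v in zip(price_path, volume))


# Single-vendor: locked to vendor_a all year.
single_vendor = yearly_cost(prices["vendor_a"], tokens_millions)

# Multi-vendor: each quarter uses whichever vendor is cheapest that quarter.
cheapest_path = [min(quarter) for quarter in zip(*prices.values())]
multi_vendor = yearly_cost(cheapest_path, tokens_millions)

print(f"Single vendor:      ${single_vendor:,.0f}")
print(f"Routed to cheapest: ${multi_vendor:,.0f}")
print(f"Difference:         ${single_vendor - multi_vendor:,.0f}")
```

The single-vendor path absorbs the Q2 increase for the rest of the year, while the routed path picks up each price drop as it lands.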
The same calculator applied to three different team profiles shows how the numbers shift.
Q1 $180K, 10% Q-over-Q growth, 8% Q3 dip. Q2 $198K, Q3 $200K (with dip), Q4 $237K. Those four quarters sum to roughly $815K for the year ($635K for Q2-Q4). CFO presentation: '$815K ± $82K based on current pricing assumptions.'
Healthy range: Annual: $815K ± $82K
Post-launch, 30% Q-over-Q growth, no clear seasonality, higher uncertainty (15% buffer). Q1 $80K → Q4 $176K. Annual ~$470K with wider confidence.
Healthy range: Annual: $470K ± $70K
Caching + routing shipped Q1, expecting 10% Q-over-Q decline through year. Q1 $250K → Q4 $182K. Annual ~$850K declining trajectory. CFO loves this.
Healthy range: Annual: $850K ± $68K
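The Q1 and Q4 figures in the second and third scenarios follow from plain quarter-over-quarter compounding. A quick check, assuming the rate is applied three times between Q1 and Q4 (the helper name `compound` is illustrative):

```python
def compound(q1: float, rate: float, quarters_ahead: int) -> float:
    """Spend after compounding a quarterly growth (or decline) rate."""
    return q1 * (1 + rate) ** quarters_ahead


# Post-launch scenario: $80K in Q1 growing 30% per quarter.
post_launch_q4 = compound(80_000, 0.30, 3)
# Optimization scenario: $250K in Q1 declining 10% per quarter.
optimized_q4 = compound(250_000, -0.10, 3)

print(f"Post-launch Q4: ${post_launch_q4:,.0f}")   # about $176K
print(f"Optimized Q4:   ${optimized_q4:,.0f}")     # about $182K
```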
Cost isn't the only dimension. Each constraint below changes the recommendation.
Quarterly forecasts are a starting point. Monthly re-forecast is required for AI workloads - too volatile for set-and-forget.
Don't downgrade tiers just to hit a forecast number. Hallucination cost (support tickets, lost trust) often exceeds the forecast variance.
Compliance-tier features have ringfenced budgets. Don't shave them to make annual numbers - breach risk dwarfs forecast variance.
Enterprise no-train tier doesn't usually drive significant variance. Build into baseline forecast.
Voice/real-time features are vendor-constrained. Acknowledge in forecast - you can't always optimize to cheapest.
Single-vendor forecasts are constrained by that vendor's pricing trajectory. Multi-vendor lets you arbitrage across pricing changes.
Pull MLOps + eval headcount out of feature forecasts and into a separate OPEX line. Easier to defend in CFO meetings.
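One simple way to handle the monthly re-forecast mentioned above is to lock in actuals for the months already closed and rescale the remaining months by how far actuals have drifted from plan. This is a sketch of that idea under stated assumptions, not the method behind the calculator; the function name and the sample figures are hypothetical.

```python
def reforecast(plan: list[float], actuals: list[float]) -> list[float]:
    """Replace closed months with actuals and rescale the remaining plan.

    plan    -- original monthly forecast for the full year
    actuals -- observed spend for the months closed so far
    """
    if not actuals:
        return list(plan)
    if len(actuals) > len(plan):
        raise ValueError("more actuals than planned months")
    closed = len(actuals)
    planned_to_date = sum(plan[:closed])
    # Drift ratio: how actual spend compares with the plan so far.
    ratio = sum(actuals) / planned_to_date if planned_to_date else 1.0
    return list(actuals) + [month * ratio for month in plan[closed:]]


plan = [60_000] * 12                   # flat $60K/month plan (illustrative)
actuals = [60_000, 66_000, 72_000]     # first three months ran hot
updated = reforecast(plan, actuals)
print(f"Original annual plan: ${sum(plan):,.0f}")
print(f"Re-forecast annual:   ${sum(updated):,.0f}")
```

A single drift ratio is the crudest possible adjustment. It treats a one-off launch spike the same as a permanent step change, so it is worth pairing with the usage analytics that explain why actuals moved.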
Tradeoff analysis is where most AI projects go sideways.
Pre-loaded scenarios for the most common applications, each with realistic numbers.
Going into annual budget conversation. 12% growth (slightly above standard), 8% Q3 dip. Annual band $880K ± $88K. Defensible because every assumption is documented.
Healthy range: Defensible $850K-$1M annual band
Q1 spiked from launch ($400K). Modeling 5% Q-over-Q normalization (back to baseline). Annual ~$1.5M. CFO question: 'Is the spike permanent?' - that's a usage analytics question, not finance.
Healthy range: Annual: $1.5M ± $180K
Vendor pricing increased 30% in Q2. Model with wider variance buffer (20%). Annual band $1.1M-$1.4M. Plan optimization to shrink the band by Q3.
Healthy range: Wider band: $1.1M-$1.4M
Honest limitations — every model is wrong; some are useful. Where this one falls short:
For these, use: Annual Cost Forecaster for monthly granularity. Overage Forecaster for mid-month tracking.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
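To relate a 24-month price drop to a 12-month-old budget, the drop can be annualized. A small sketch, assuming prices fall at a constant compound rate (real vendor pricing moves in steps, so this is only a rough guide; the function name is illustrative):

```python
def annualized_drop(total_drop: float, months: int) -> float:
    """Convert a total fractional price drop over `months` into a 12-month rate."""
    remaining = 1.0 - total_drop          # fraction of the original price left
    return 1.0 - remaining ** (12 / months)


for drop in (0.40, 0.90):
    yearly = annualized_drop(drop, months=24)
    print(f"{drop:.0%} over 24 months is roughly {yearly:.0%} per year")
```

Under that assumption, a budget priced at rates from 12 months ago would be off by roughly 23% to 68% on unit price alone, before any change in usage.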
Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs).
10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
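The fallback rule for the last-verified date can be expressed as follows. This is an illustrative sketch: the function and parameter names are hypothetical, the dates are arbitrary, and the actual schemas of aicost_pricing_snapshots and aicost_crawler_runs are not shown here.

```python
from datetime import date
from typing import Optional


def last_verified(latest_snapshot: Optional[date],
                  latest_crawler_run: Optional[date]) -> Optional[date]:
    """Prefer the most recent successful snapshot; fall back to the crawler run."""
    if latest_snapshot is not None:
        return latest_snapshot
    return latest_crawler_run  # may be None if neither source has data yet


print(last_verified(date(2025, 1, 15), date(2025, 1, 16)))  # snapshot wins
print(last_verified(None, date(2025, 1, 16)))               # falls back to crawler run
```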
Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic — Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic — Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic — Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount. |
| Anthropic — Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount. |
| Anthropic — Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate — Anthropic 90% cache-hit discount convention. |
| OpenAI — GPT-5.4 Mini | cachedInput | Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI — GPT-5.4 Nano | cachedInput | Derived at 10% of input — OpenAI 90% cache-hit convention. |
| OpenAI — GPT-5.4 Nano | batchInput | Derived at 50% of input — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Nano | batchOutput | Derived at 50% of output — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Pro | cachedInput | Derived at 10% of input — OpenAI 90% cache-hit convention. |
| OpenAI — GPT-5.4 Pro | batchInput | Derived at 50% of input — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Pro | batchOutput | Derived at 50% of output — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI — GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI — GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5.5 Pro | cachedInput | Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention. |
| OpenAI — GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5.2 Pro | cachedInput | Derived at 10% of input — pro-tier convention. |
| OpenAI — GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI — GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google — Gemini 3 Flash | cachedInput | Derived at 10% of input — Google caching discount convention ~90%. |
| Google — Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input — Google caching convention. |
| Google — Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google — Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google — Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input — Google caching convention. |
| Google — Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates. |
| Google — Gemini 2.0 Flash | batchInput | Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash | batchOutput | Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input — Google caching convention. |
| Google — Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output — Google Batch API uniform 50% discount. |
| xAI — Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base. |
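The derivation rules in the table reduce to two multipliers. A minimal sketch, assuming the 10% cached and 50% batch conventions described above; the function name and the base prices in the example are hypothetical, not vendor-published figures:

```python
def derive_rates(input_price: float, output_price: float,
                 cached_fraction: float = 0.10,
                 batch_fraction: float = 0.50) -> dict[str, float]:
    """Infer cachedInput, batchInput and batchOutput from base prices.

    Prices are per million tokens. The default fractions follow the
    conventions in the table; pass cached_fraction=0.25 for the entries
    listed at 25% of input (Gemini 2.0 Flash, Grok 4 legacy).
    """
    return {
        "cachedInput": round(input_price * cached_fraction, 4),
        "batchInput": round(input_price * batch_fraction, 4),
        "batchOutput": round(output_price * batch_fraction, 4),
    }


# Illustrative base prices only.
print(derive_rates(input_price=3.00, output_price=15.00))
print(derive_rates(input_price=0.10, output_price=0.40, cached_fraction=0.25))
```

Keeping the fractions as parameters makes the exceptions explicit, so an inferred value can be traced back to the convention that produced it.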
Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail.
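The audit trail amounts to diffing consecutive snapshots. A sketch of that idea follows; the record layout (model, field, old, new) is an assumption for illustration, since the real schemas of aicost_pricing_snapshots and aicost_price_changelog are not documented here, and the prices are made up.

```python
def diff_snapshots(previous: dict[str, dict[str, float]],
                   current: dict[str, dict[str, float]]) -> list[dict]:
    """Return one changelog entry per price field that changed between snapshots."""
    changes = []
    for model, fields in current.items():
        old_fields = previous.get(model, {})
        for field, new_value in fields.items():
            old_value = old_fields.get(field)   # None if the field is new
            if old_value != new_value:
                changes.append({"model": model, "field": field,
                                "old": old_value, "new": new_value})
    return changes


yesterday = {"example-model": {"input": 3.00, "output": 15.00}}
today = {"example-model": {"input": 3.30, "output": 15.00}}
for entry in diff_snapshots(yesterday, today):
    print(entry)
```

This version only reports fields present in the newer snapshot. Detecting models or fields that were removed would need a second pass over the older one.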