Guides → Playground & Guide → Quarterly Spend Forecaster - Project Q1-Q4 AI Spend with Seasonality

Quarterly Spend Forecaster - Project Q1-Q4 AI Spend with Seasonality

Meet Hannah Kim. Senior FinOps Manager presenting to CFO quarterly. "Q1 was $180K. Q2 is trending higher. What does the rest of the year look like and what's my confidence interval?"

🔥 CFO asks 'will we hit annual budget?' - I keep saying 'probably' because I don't have a real model.

The story

Quarterly forecasting for AI is harder than cloud. Cloud has stable per-unit pricing. AI has price drops every quarter (DeepSeek, Gemini Flash) AND occasional spikes (vendor adjusts pricing schedule). Plus usage growth is non-linear after feature launches.

Hannah's Q1 was $180K. Naive projection: $720K annual. But Q1 had a feature launch (above-trend) and Q3 typically sees holiday slowdown (below-trend). Plus DeepSeek launched a new tier in Q2 (pricing tailwind) and Anthropic increased Sonnet 4.6 by 10% in March (pricing headwind). Real Q2-Q4 projection is closer to $550-650K with confidence interval, not $540K point estimate.

Three forecasting components. (1) Base rate - current quarterly run-rate. (2) Seasonality - typical quarter-over-quarter pattern (B2B SaaS dips in Q3, climbs Q4). (3) Pricing assumption variance - vendor prices change ±20% per year, build that uncertainty into the band.

About this calculator: Quarterly Spend Forecaster - Project Q1-Q4 AI Spend with Seasonality

Model your AI spend across quarters with seasonality, growth rate, and pricing assumption variance. Brief CFO with confidence intervals, not point estimates.

Inputs you control

Input	Impact on result	Range	Typical
Current quarter spend ($)	Your most recent completed quarter's spend. Pull from invoices.	1K – 5M	180000
Quarter-over-quarter growth rate (%)	Expected growth (or decline) per quarter. Account for feature launches, customer growth, optimization shipping.	-50 – 100	10
Seasonal Q3 dip (%)	How much usage drops in your slow quarter (Q3 for B2B, Q1 for many B2C). 0 if no clear seasonality.	0 – 30	8
Pricing variance buffer (%)	Pad for vendor pricing changes. AI vendors change ±10-20% per year. 10% buffer is typical.	0 – 25	10

Outputs computed for you

Output	How inputs affect it
Monthly cost ($)	computed from inputs
Annual cost ($)	monthlyUsd × 12

Below: live sliders. Move them to see numbers change in real time.

What you're looking at

Each input shapes your cost. Move the slider — see the impact.

Current quarter spend ($) 180,000

Your most recent completed quarter's spend. Pull from invoices.

Estimated: —

Quarter-over-quarter growth rate (%) 10

Expected growth (or decline) per quarter. Account for feature launches, customer growth, optimization shipping.

Estimated: —

Seasonal Q3 dip (%) 8

How much usage drops in your slow quarter (Q3 for B2B, Q1 for many B2C). 0 if no clear seasonality.

Estimated: —

Pricing variance buffer (%) 10

Pad for vendor pricing changes. AI vendors change ±10-20% per year. 10% buffer is typical.

Estimated: —

Ready to run the numbers?

Open the full calculator — pick a model, enter your tokens, see per-call, daily, monthly, and annual cost.

🚀 Open the full calculator →

Reading your result

Annual point estimate is the base. Sum of 4 projected quarters with seasonality applied. Q1 + Q2 + Q3 + Q4.

Annual confidence interval is what to brief CFO. ±pricing buffer % gives you the band. $620K ± $62K is more honest than '$620K'.

Watch the Q3 dip. Most teams forecast linearly, then look bad in Q3 when usage dips and they over-allocated.

Pricing tailwinds compound. If multi-vendor router routes to cheapest, pricing drops automatically benefit you. Single-vendor lock-in misses these tailwinds.

What "good" looks like:

Standard SaaS: 8-12% Q-over-Q growth, 5-10% Q3 dip, ±10% pricing buffer
Hypergrowth: 25%+ Q-over-Q, less seasonality, ±15% buffer (more uncertainty)
Mature/optimizing: 0-5% Q-over-Q (or negative), ±5% buffer
Crisis mode: Negative growth, optimization-driven, recompute monthly

Vendor pricing changes most affecting forecasts

Verified 20 hours ago

1

GPT-5 Mini

$0.250 in · $2.00 out ·
2

Command

$1.00 in · $2.00 out ·
3

devstral-2

$0.400 in · $2.00 out ·

Three real scenarios

Same calculator, three different team sizes. Click a tab to see how the numbers shift.

Q1 $180K, 10% Q-over-Q growth, 8% Q3 dip. Q2 $198K, Q3 $200K (with dip), Q4 $237K. Annual ~$620K. CFO presentation: '$620K ± $62K based on current pricing assumptions.'

Healthy range: Annual: $620K ± $62K

See inputs used

currentQuarterUsd: 180,000
growthRatePct: 10
seasonalityPct: 8
pricingVarianceBufferPct: 10

Trade-offs

Cost isn't the only dimension. Click any constraint — see how recommendations change.

What matters most to you? Click any dimension — recommendations update.

Best fit for "cost":

Build optimization into Q3 plan Cuts Q3-Q4 spend
Re-forecast monthly Catch drift early

Quarterly forecasts are a starting point. Monthly re-forecast is required for AI workloads - too volatile for set-and-forget.

Use cases

Pre-loaded scenarios for the most common applications. Click a tab to see realistic numbers — then the "Try this scenario" button to load it into the calculator above.

Going into annual budget conversation. 12% growth (slightly above standard), 8% Q3 dip. Annual band $880K ± $88K. Defensible because every assumption is documented.

Healthy range: Defensible $850K-$1M annual band

See inputs used

currentQuarterUsd: 220,000
growthRatePct: 12
seasonalityPct: 8
pricingVarianceBufferPct: 10

What this calculator can't tell you

Honest limitations — every model is wrong; some are useful. Where this one falls short:

Linear growth assumption - non-linear feature launches need step-function modeling.
Pricing variance buffer is ±X - doesn't model asymmetric risk (more downside than upside).
Seasonality is workload-specific - don't blindly apply 'B2B Q3 dip'.
Doesn't model new-feature step-functions launching mid-year.

For these, use: Annual Cost Forecaster for monthly granularity. Overage Forecaster for mid-month tracking.

Where to go next

Drill into monthly granularity →

Once quarterly band is set, model monthly within each quarter.

Allocate quarterly budget across features →

Top-down allocation per quarter.

Stress-test against vendor surprises →

What if a vendor raises 30% mid-quarter?

Methodology

Source: /ai-cost-economics
Extraction: Quarterly forecasting validated against 24 months of aicost.ai usage data.
Editorial gate: 8-layer defense — see aicost.ai/ai-cost-economics
Last verified: 6/4/2026, 8:00:00 PM

Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.

3 years of pricing history

Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.

View 3-year history for →

📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

All prices are USD per 1 million tokens, current as of 2026-06-05.
Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
Batch API discounts are 50% off standard rates across providers that offer Batch mode.
Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
Long-context pricing tiers apply when input exceeds model threshold.
Embedding prices are input-only (no output tokens generated).

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Anthropic

2026-06-05

https://www.anthropic.com/pricing

Daily snapshot since Sep 2023 · 578 days captured

Anthropic Docs

2026-06-05

https://platform.claude.com/docs/en/about-claude/pricing

Daily snapshot since Sep 2023 · 578 days captured

OpenAI

2026-06-05

https://openai.com/api/pricing/

Daily snapshot since Sep 2023 · 579 days captured

Google AI

2026-06-05

https://ai.google.dev/gemini-api/docs/pricing

Daily snapshot since Dec 2023 · 554 days captured

Google Vertex

2026-06-05

https://cloud.google.com/vertex-ai/generative-ai/pricing

Daily snapshot since Dec 2023 · 554 days captured

DeepSeek

2026-06-05

https://api-docs.deepseek.com/quick_start/pricing

Daily snapshot since May 2024 · 493 days captured

xAI

2026-06-05

https://x.ai/api

Daily snapshot since Nov 2024 · 411 days captured

Mistral

2026-06-05

https://mistral.ai/pricing

Daily snapshot since Dec 2023 · 552 days captured

Cohere

2026-06-05

https://cohere.com/pricing

Daily snapshot since Sep 2023 · 578 days captured

Voyage AI

2026-06-05

https://docs.voyageai.com/docs/pricing

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →

Quarterly Spend Forecaster - Project Q1-Q4 AI Spend with Seasonality

The story

About this calculator: Quarterly Spend Forecaster - Project Q1-Q4 AI Spend with Seasonality

Inputs you control

Outputs computed for you

What you're looking at

Ready to run the numbers?

Reading your result

Vendor pricing changes most affecting forecasts

Three real scenarios

Trade-offs

Best fit for "cost":

Best fit for "hallucination":

Best fit for "compliance":

Best fit for "privacy":

Best fit for "latency":

Best fit for "vendor lock-in":

Best fit for "mlops overhead":

Use cases

What this calculator can't tell you

Where to go next

Methodology

3 years of pricing history

Methodology

Primary sources

Inferred values (marked with * in calculator tables)