Annual Cost Forecaster · for CFOs and finance teams

Project your AI spend for the next 12 months

Month-by-month forecast with growth assumptions and seasonality. See when you'll breach your budget.

Pricing verified: 2026-06-05 Seasonality patterns built in

📅 Schedule a meeting via AvatarVA ✉️ Email [email protected]

What this calculator does

Project your next 12 months of LLM spend based on current usage + expected growth.

Why use it

Most "AI budget" conversations use last month × 12 — which misses seasonality and growth
Model usage growth at realistic rates: 10% MoM = 213% annual growth, not 120%
Generate a board-ready chart of expected vs worst-case vs best-case

Enter key values

Current monthly spend, expected growth rate, seasonality profile.

Pick a growth scenario

Conservative (5% MoM), expected (10% MoM), aggressive (20% MoM). Or enter custom.

Read 12-month chart

Monthly forecast + cumulative annual cost. See when you cross budget tripwires.

Act on it

If annual forecast > budget, pull savings levers (routing, caching, batch) now. Don't wait.

📊 Calculator at a glance

🎛 CALCULATOR

📊 Baseline + growth assumptions

Current monthly AI spend ($) Take last month's bill as the baseline.

Monthly growth rate 8% 8%/mo compounds to 2.5x/yr. Early-stage: 10-15%. Established: 3-8%. Mature: 0-3%.

Cost trend from vendors

Historically prices decline, but newer flagship models can push average up. Pick your view.

Seasonality pattern

Annual budget ceiling ($) What you've allocated for the next 12 months of AI spend.

Forecast start month

Optimization savings Simulate optimization kick-in.

📈 RESULTS

12-month total forecast

Peak month

Budget status

Breach month

Avg monthly

12-month forecast curve

Monthly cost projection vs budget ceiling (dashed red line).

💡 Recommendations

📅 Month-by-month breakdown

Red rows indicate budget breach (cumulative spend exceeds annual ceiling).

Month	Monthly cost	vs baseline	Cumulative	Budget used

Budget planner → Find the savings → Get a FinOps retainer →

🎯 Use this result to

📈 Project AI runway accurately — Compound growth not linear. Most teams blow through budget by month 8.
🎯 Set a defensible budget ceiling — Concrete number to bring to finance. With math not vibes.
⚠️ Spot the overrun month — See when current trajectory breaks budget. Take action 3 months ahead.
🔌 Integrate with your AI agents — MCP available for agentic workflow integration. Cost-aware budget alerts.

📅 Schedule a call to apply this to your workload

Vendor / Model

Field

Why it’s inferred

Anthropic — Claude Sonnet 4.6

cachedInput

Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.

Anthropic — Claude Sonnet 4.5

cachedInput

Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.

Anthropic — Claude Sonnet 4.5

batchInput

Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.

Anthropic — Claude Sonnet 4.5

batchOutput

Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.

Anthropic — Claude Haiku 4.5

cachedInput

Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.

OpenAI — GPT-5.4 Mini

cachedInput

Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.

OpenAI — GPT-5.4 Nano

cachedInput

Derived at 10% of input — OpenAI 90% cache-hit convention.

OpenAI — GPT-5.4 Nano

batchInput

Derived at 50% of input — OpenAI Batch API uniform 50% discount.

OpenAI — GPT-5.4 Nano

batchOutput

Derived at 50% of output — OpenAI Batch API uniform 50% discount.

OpenAI — GPT-5.4 Pro

cachedInput

Derived at 10% of input — OpenAI 90% cache-hit convention.

OpenAI — GPT-5.4 Pro

batchInput

Derived at 50% of input — OpenAI Batch API uniform 50% discount.

OpenAI — GPT-5.4 Pro

batchOutput

Derived at 50% of output — OpenAI Batch API uniform 50% discount.

OpenAI — GPT-5.2

cachedInput

Derived at 10% of input; no residency uplift.

OpenAI — GPT-5.2

batchInput

Derived at 50% of input.

OpenAI — GPT-5.2

batchOutput

Derived at 50% of output.

OpenAI — GPT-5

cachedInput

Derived at 10% of input.

OpenAI — GPT-5

batchInput

Derived at 50% of input.

OpenAI — GPT-5

batchOutput

Derived at 50% of output.

OpenAI — GPT-5.5 Pro

cachedInput

Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.

OpenAI — GPT-5.5 Pro

batchInput

Derived at 50% of input.

OpenAI — GPT-5.5 Pro

batchOutput

Derived at 50% of output.

OpenAI — GPT-5.2 Pro

cachedInput

Derived at 10% of input — pro-tier convention.

OpenAI — GPT-5.2 Pro

batchInput

Derived at 50% of input.

OpenAI — GPT-5.2 Pro

batchOutput

Derived at 50% of output.

OpenAI — GPT-5.1

batchInput

Derived at 50% of input.

OpenAI — GPT-5.1

batchOutput

Derived at 50% of output.

OpenAI — GPT-5 Pro

batchInput

Derived at 50% of input.

OpenAI — GPT-5 Pro

batchOutput

Derived at 50% of output.

OpenAI — GPT-5 Nano

cachedInput

Derived at 10% of input.

OpenAI — GPT-5 Nano

batchInput

Derived at 50% of input.

OpenAI — GPT-5 Nano

batchOutput

Derived at 50% of output.

Google — Gemini 3 Flash

cachedInput

Derived at 10% of input — Google caching discount convention ~90%.

Google — Gemini 3.1 Flash-Lite

cachedInput

Derived at 10% of input — Google caching convention.

Google — Gemini 3.1 Flash-Lite

batchInput

Derived at 50% of input — Google Batch API uniform 50% discount.

Google — Gemini 3.1 Flash-Lite

batchOutput

Derived at 50% of output — Google Batch API uniform 50% discount.

Google — Gemini 2.5 Pro

cachedInput

Derived at 10% of input.

Google — Gemini 2.5 Flash

cachedInput

Derived at 10% of input.

Google — Gemini 2.5 Flash-Lite

cachedInput

Derived at 10% of input — Google caching convention.

Google — Gemini 2.5 Flash-Lite

batchInput

Derived at 50% of input — Google Batch API uniform 50% discount.

Google — Gemini 2.5 Flash-Lite

batchOutput

Derived at 50% of output — Google Batch API uniform 50% discount.

Google — Gemini 2.0 Flash

cachedInput

Derived at 25% of input per Google 2.0 family caching rates.

Google — Gemini 2.0 Flash

batchInput

Derived at 50% of input — Google Batch API uniform 50% discount.

Google — Gemini 2.0 Flash

batchOutput

Derived at 50% of output — Google Batch API uniform 50% discount.

Google — Gemini 2.0 Flash-Lite

cachedInput

Derived at 10% of input — Google caching convention.

Google — Gemini 2.0 Flash-Lite

batchInput

Derived at 50% of input — Google Batch API uniform 50% discount.

Google — Gemini 2.0 Flash-Lite

batchOutput

Derived at 50% of output — Google Batch API uniform 50% discount.

xAI — Grok 4 (legacy)

cachedInput

Extrapolated at 25% of base.

Project your AI spend for the next 12 months

12-month forecast curve

Go deeper

The calculator's an estimate. Want the real number?

Methodology

Primary sources

Inferred values (marked with * in calculator tables)