Scale Projection - What Happens to Your Bill at 10×, 100×?
Meet Diana Park. Head of Product at a 30-person Series A SaaS. "We're at 5K AI requests/day. Board wants to 100× to 500K/day. What does that bill look like - and when does it break?"
🔥 Investor deck assumes linear cost scaling. The CFO knows that's not how it works.
AI cost is not linear at scale. Three things break the line: vendor rate limits force tier upgrades, premium tiers price differently than developer tiers, and at consumer scale you start renegotiating everything.
Diana's product currently handles 5K requests/day and costs $400/mo. The naive projection says 100× = $40K/mo. Reality at 500K req/day involves: batching half the workload (50% discount on that half), negotiating volume pricing (15-30% off list), routing easy queries to cheap models (40-60% savings on those), and hitting throughput limits that force multi-vendor failover.
The actual bill at 100× volume is usually 30-60× today's cost (about $12K-24K/mo for Diana), but only if you do the optimization work. If you scale naively without these levers, you hit $40-80K/mo and start questioning whether AI is profitable.
This calc shows the naive linear projection AND the optimized version, so you can see what's possible and what work it takes to get there.
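To make the two lines concrete, here is a minimal sketch of the naive and optimized projections for Diana's numbers. It is not the calculator's actual model. The function names are hypothetical, the share of "easy" traffic is an assumption the text does not state, and the way the levers are stacked (batch discount on half the workload, routing savings on the easy realtime queries, then a volume discount on the total) is one plausible reading of the levers listed above.

```python
def linear_projection(current_monthly_cost: float, scale: float) -> float:
    """Naive projection: cost grows in direct proportion to request volume."""
    return current_monthly_cost * scale


def optimized_projection(
    current_monthly_cost: float,
    scale: float,
    batch_share: float = 0.5,      # half the workload is batched (from the text)
    batch_discount: float = 0.5,   # 50% discount on the batched half (from the text)
    easy_share: float = 0.5,       # ASSUMPTION: share of realtime traffic that is "easy"
    routing_savings: float = 0.5,  # midpoint of the 40-60% range in the text
    volume_discount: float = 0.2,  # inside the 15-30% range in the text
) -> float:
    """Apply the levers in sequence to the linear projection."""
    linear = linear_projection(current_monthly_cost, scale)
    batched = linear * batch_share * (1 - batch_discount)
    realtime = linear * (1 - batch_share)
    routed = realtime * easy_share * (1 - routing_savings)
    premium = realtime * (1 - easy_share)
    # The negotiated volume discount is assumed to apply to the whole bill.
    return (batched + routed + premium) * (1 - volume_discount)


if __name__ == "__main__":
    naive = linear_projection(400, 100)
    optimized = optimized_projection(400, 100)
    print(f"naive: ${naive:,.0f}/mo, optimized: ${optimized:,.0f}/mo")
```

With these assumed defaults, the $40K/mo naive figure comes out near $20K/mo, about 50× today's $400 bill. Changing `easy_share` or `volume_discount` moves the result noticeably, which is why the levers are worth modeling one by one.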
Most AI bills aren't linear at scale. Find the cliffs - rate limits, tier jumps, latency walls - before they find you. Live pricing, real benchmarks, vendor stress-test.
Linear case is the upper bound. If you do nothing differently - same model, same prompts, same tier - you'll hit the linear projection. Most teams don't, but it's the worst-case planning anchor.
Optimized case is the floor. With every lever pulled (batching + routing + caching + volume discount), most workloads land 50-70% below linear. Your CFO should see both lines on the same chart.
Watch for cliffs. Some vendors have abrupt tier jumps (e.g., move from $5/$15 'developer' tier to $7/$20 'enterprise' at 10M tokens/month). Others have rate-limit walls that force secondary vendors. The calc shows where these hit.
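A tier cliff is easy to see in a few lines of code. The sketch below uses the illustrative $5/$15 and $7/$20 figures from the paragraph above and makes several assumptions the text leaves open: prices are USD per million input/output tokens, the 10M threshold counts input plus output tokens, and once the threshold is reached the higher rate applies to every token that month. The `Tier` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_monthly_tokens: int   # tier applies at or above this monthly volume
    input_per_mtok: float     # USD per million input tokens (assumed unit)
    output_per_mtok: float    # USD per million output tokens (assumed unit)


# Illustrative tiers taken from the example in the text, not real vendor pricing.
TIERS = [
    Tier("developer", 0, 5.0, 15.0),
    Tier("enterprise", 10_000_000, 7.0, 20.0),
]


def tier_for(total_tokens: int) -> Tier:
    """Pick the highest tier whose threshold the monthly volume has reached."""
    eligible = [t for t in TIERS if total_tokens >= t.min_monthly_tokens]
    return max(eligible, key=lambda t: t.min_monthly_tokens)


def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Monthly cost in USD, with the whole month billed at the selected tier."""
    tier = tier_for(input_tokens + output_tokens)
    return (
        input_tokens * tier.input_per_mtok
        + output_tokens * tier.output_per_mtok
    ) / 1_000_000
```

Under these assumptions, 7M input + 2M output tokens stays on the developer tier at $65, while 8M input + 2M output crosses the threshold and costs $96 instead of the $70 a straight-line extrapolation would suggest.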
Read the breakeven ratio. Optimized monthly cost ÷ revenue at 100× usage. If this is <5%, AI is not your cost problem. If >25%, you have margin compression incoming.
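The breakeven ratio is a single division, sketched below with the two thresholds from the paragraph above. The function names are hypothetical, and the label for the middle band (5% to 25%) is an assumption, since the text only names the two extremes.

```python
def breakeven_ratio(optimized_monthly_cost: float, monthly_revenue: float) -> float:
    """Optimized monthly AI cost divided by monthly revenue at the same usage."""
    if monthly_revenue <= 0:
        raise ValueError("monthly_revenue must be positive")
    return optimized_monthly_cost / monthly_revenue


def read_ratio(ratio: float) -> str:
    """Interpret the ratio using the <5% and >25% thresholds from the text."""
    if ratio < 0.05:
        return "AI is not your cost problem"
    if ratio > 0.25:
        return "margin compression incoming"
    return "in between: monitor"  # ASSUMPTION: the text does not label this band
```

For example, a $20K/mo optimized bill against a hypothetical $500K/mo of revenue gives a ratio of 0.04, which falls under the 5% line.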
Same calculator, three different team sizes.
Near-term growth - usually no time for optimization. Plan for ~5-6× your current cost, since you may add complexity (more features = more tokens per request) but also start to see small caching wins.
Healthy range: $1,800-2,500/mo (modest optimization expected)
10× growth with one round of optimization (batching, basic caching). Realistic target: 70-80% of linear. Diana should plan ~$3K/mo.
Healthy range: $2,500-3,500/mo with batching
Full optimization stack - batch where possible, route by complexity, cache aggressively, negotiate volume pricing. Saves 50-70% vs linear. Requires dedicated optimization headcount (1 SRE + AI engineer time).
Healthy range: $10K-18K/mo (vs $40K naive)
Cost isn't the only dimension. Each constraint below changes the recommendations.
At 100× scale, optimization isn't optional - it's a margin lever. Teams that scale without optimizing typically see AI costs become the largest line item by month 6 of consumer launch.
At consumer scale you can't afford premium-everywhere. Route by query difficulty: simple lookups → cheap model, complex reasoning → premium. Hallucination rate matters most on the premium-eligible queries.
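Routing by difficulty can be sketched as a small dispatch function. The version below is only an illustration: the model names are placeholders, and the keyword-and-length heuristic is an assumption standing in for whatever classifier a real system would use (often a small model or a trained scorer).

```python
CHEAP_MODEL = "cheap-model"      # placeholder name, not a real model ID
PREMIUM_MODEL = "premium-model"  # placeholder name, not a real model ID

# ASSUMPTION: crude signals that a query needs multi-step reasoning.
REASONING_HINTS = ("why", "explain", "compare", "analyze", "step by step")


def is_complex(query: str, max_simple_words: int = 20) -> bool:
    """Treat long queries, or queries containing a reasoning hint, as complex."""
    text = query.lower()
    too_long = len(text.split()) > max_simple_words
    return too_long or any(hint in text for hint in REASONING_HINTS)


def route(query: str) -> str:
    """Send simple lookups to the cheap model and complex reasoning to premium."""
    return PREMIUM_MODEL if is_complex(query) else CHEAP_MODEL
```

Because only the premium-routed queries reach the expensive model, that is also the slice where hallucination rate is worth measuring most carefully.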
Geographic expansion forces compliance fragmentation. EU customers may require Mistral or Anthropic-EU. Healthcare requires HIPAA BAA. Plan compliance routing into the architecture, not as an afterthought.
At consumer scale, privacy violations become PR incidents. Use the enterprise tier across all vendors, and verify the privacy terms in the contract.
Latency is invisible at small scale, brutal at large. P99 latency matters more than P50 - one slow request blocks downstream agents. Plan for multi-vendor failover.
Single-vendor dependency at 100× scale = catastrophe risk. One pricing change, one outage, and you're scrambling. Build vendor abstraction early - it's much cheaper than retrofitting at scale.
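A vendor abstraction can start as small as an ordered list of callables with failover. The sketch below assumes each vendor is wrapped in a function that takes a prompt and returns text. The names `complete_with_failover` and `AllVendorsFailed` are hypothetical, and a production version would add timeouts, retries with backoff, and per-vendor error handling.

```python
from typing import Callable, Sequence


class AllVendorsFailed(RuntimeError):
    """Raised when every configured vendor has failed for a request."""


# A vendor is a (name, call) pair; `call` wraps that vendor's SDK or HTTP client.
Vendor = tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str, vendors: Sequence[Vendor]) -> tuple[str, str]:
    """Try vendors in priority order; return (vendor_name, response_text)."""
    errors = []
    for name, call in vendors:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(f"{name}: {exc}")
    raise AllVendorsFailed("; ".join(errors))
```

Keeping application code dependent only on the `(name, call)` shape means a pricing change or outage becomes a reordering of the list rather than a rewrite.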
At consumer scale, you need someone who owns the AI stack: drift monitoring, A/B tests, prompt registry, eval pipeline. Without this, optimization gains erode within 6 months.
Tradeoff analysis is where most AI projects go sideways.
Pre-loaded scenarios for the most common applications, with realistic numbers.
Going from internal beta ($100/mo) to consumer scale (1000×). Mandatory: cheap-tier models with verification, multi-vendor routing, aggressive caching, batch overnight workloads, 30-40% volume discount negotiated. Linear would be $100K/mo; optimized usually $20-30K.
Healthy range: $15-30K/mo (vs $100K naive)
One enterprise contract = 25× current load. The bill goes to ~$40K/mo if linear, $25K with optimization. Build that math into the customer's contract: usage-based pricing, or per-seat pricing with a reasonable margin.
Healthy range: $25-40K/mo at customer-billed pricing
4× scale (3 regions vs 1). Routing matters here for latency, not just cost - Gemini in Asia, Claude in US, Mistral in EU. Compliance/data residency forces multi-vendor anyway.
Healthy range: $15-22K/mo with regional routing
Honest limitations — every model is wrong; some are useful. Where this one falls short:
For these, use: Prompt Cache ROI for caching math. Multi-Model Router for routing savings. Batch vs Realtime for batch.
Month-by-month forecast given growth + optimization timeline.
Single-vendor exposure at scale: What's your blast radius if your primary vendor changes pricing 50%?
Full TCO with optimization roadmap: 7-step wizard including optimization headcount + timeline.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
The fields below are derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic — Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic — Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic — Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount. |
| Anthropic — Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount. |
| Anthropic — Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate — Anthropic 90% cache-hit discount convention. |
| OpenAI — GPT-5.4 Mini | cachedInput | Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI — GPT-5.4 Nano | cachedInput | Derived at 10% of input — OpenAI 90% cache-hit convention. |
| OpenAI — GPT-5.4 Nano | batchInput | Derived at 50% of input — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Nano | batchOutput | Derived at 50% of output — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Pro | cachedInput | Derived at 10% of input — OpenAI 90% cache-hit convention. |
| OpenAI — GPT-5.4 Pro | batchInput | Derived at 50% of input — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Pro | batchOutput | Derived at 50% of output — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI — GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI — GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5.5 Pro | cachedInput | Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention. |
| OpenAI — GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5.2 Pro | cachedInput | Derived at 10% of input — pro-tier convention. |
| OpenAI — GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI — GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI — GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI — GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google — Gemini 3 Flash | cachedInput | Derived at 10% of input — Google caching discount convention ~90%. |
| Google — Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input — Google caching convention. |
| Google — Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google — Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google — Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input — Google caching convention. |
| Google — Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates. |
| Google — Gemini 2.0 Flash | batchInput | Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash | batchOutput | Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input — Google caching convention. |
| Google — Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output — Google Batch API uniform 50% discount. |
| xAI — Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base. |
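The derivation rule behind most rows in the table can be written as a short helper. This is a sketch of the stated conventions (cached input at 10% of base, batch at 50% of base), not the site's actual code. The function name is hypothetical, and the rates in the usage note are made-up illustrative numbers, not quoted prices.

```python
def derive_inferred_rates(
    input_rate: float,
    output_rate: float,
    cache_fraction: float = 0.10,  # cached input = 10% of base input (90% off)
    batch_fraction: float = 0.50,  # batch = 50% of base (50% off)
) -> dict[str, float]:
    """Derive the inferred pricing fields from the published base rates."""
    return {
        "cachedInput": input_rate * cache_fraction,
        "batchInput": input_rate * batch_fraction,
        "batchOutput": output_rate * batch_fraction,
    }
```

For a model with hypothetical base rates of 3.00 (input) and 15.00 (output), the defaults give a cached input rate of about 0.30 and batch rates of 1.50 and 7.50. Rows that follow a different convention, such as the 25% cached-input entries, would pass `cache_fraction=0.25`.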
Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for a full audit trail.