Scale Projection - What Happens to Your Bill at 10×, 100×?

Meet Diana Park. Head of Product at a 30-person Series A SaaS. "We're at 5K AI requests/day. Board wants to 100× to 500K/day. What does that bill look like - and when does it break?"

🔥 Investor deck assumes linear cost scaling. The CFO knows that's not how it works.

The story

AI cost is not linear at scale. Three things break the line: vendor rate limits force tier upgrades, premium tiers price differently than developer tiers, and at consumer scale you start renegotiating everything.

Diana's product currently handles 5K requests/day and costs $400/mo. The naive projection says 100× = $40K/mo. Reality at 500K req/day involves batching half the workload (50% discount on that half), negotiating volume pricing (15-30% off list), routing easy queries to cheap models (40-60% savings on those), and hitting throughput limits that force multi-vendor failover.

The actual 100× number is usually 30-60× today's cost (roughly $12-24K/mo from a $400 base), but only if you do the optimization work. If you scale naively without these levers, you hit $40-80K/mo and start questioning whether AI is profitable.

This calc shows the naive linear projection AND the optimized version, so you can see what's possible and what work it takes to get there.
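
The arithmetic behind the two lines can be sketched in a few lines of Python. This is a rough illustration, not the calculator's actual model: the function names, the assumption that the levers compose multiplicatively, the 50% routed share, and the midpoint discounts are all hypothetical choices made for the example.

```python
def naive_projection(current_monthly_usd: float, scale: float) -> float:
    """Linear projection: same model, same prompts, same tier."""
    return current_monthly_usd * scale


def optimized_projection(
    current_monthly_usd: float,
    scale: float,
    batch_share: float = 0.5,      # half the workload is batched (from the story)
    batch_discount: float = 0.5,   # 50% discount on the batched half
    routed_share: float = 0.5,     # ASSUMPTION: share of queries that count as "easy"
    routing_savings: float = 0.5,  # midpoint of the 40-60% range
    volume_discount: float = 0.2,  # inside the 15-30% range
) -> float:
    """Apply each lever as a multiplicative reduction (an assumption)."""
    cost = naive_projection(current_monthly_usd, scale)
    cost *= 1 - batch_share * batch_discount
    cost *= 1 - routed_share * routing_savings
    cost *= 1 - volume_discount
    return cost


naive = naive_projection(400, 100)
optimized = optimized_projection(400, 100)
print(f"naive: ${naive:,.0f}/mo, optimized: ${optimized:,.0f}/mo")
```

With these defaults, the $400 base lands near $18K/mo at 100×, about 45× today's cost and inside the 30-60× band. Changing the routed share or the negotiated discount moves the result quickly, which is why the levers are worth modeling separately.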

🎛 Inputs you control

Each input shapes the cost. The explanations below match the live calculator field by field.

  • Primary model: The model whose per-token price drives the projection. How to choose: pick your production model; the chart can overlay cheaper alternatives.
  • Workload preset: A starting template that fills in typical token and volume values. How to choose: pick the closest preset, then fine-tune, or choose Custom.
  • Input tokens/request: Average prompt size per request, in tokens. How to choose: about 750 words is ~1,000 tokens; include retrieved context.
  • Output tokens/request: Average completion size per request, in tokens. How to choose: measure real responses; output is usually priced higher than input.
  • Current requests/day: Today's daily request volume, the baseline the projection scales from. How to choose: use real traffic; the slider multiplies this by the scale factor.
  • Scale factor: How many times current usage you want to project to. How to choose: set it to your growth target (e.g. 10×); watch for inflection points where unit cost changes.
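
To see how these inputs combine, here is a minimal sketch of a token-based monthly estimate. It is an illustration under stated assumptions rather than the calculator's implementation: the function names, the flat 30-day month, and the example prices ($0.25 in / $2.00 out per 1M tokens) are placeholders.

```python
DAYS_PER_MONTH = 30  # ASSUMPTION: flat 30-day month


def words_to_tokens(words: int) -> int:
    """Rule of thumb from the guide: about 750 words is ~1,000 tokens."""
    return round(words * 1000 / 750)


def monthly_cost_usd(
    input_tokens: int,
    output_tokens: int,
    requests_per_day: int,
    input_price_per_m: float,   # USD per 1M input tokens
    output_price_per_m: float,  # USD per 1M output tokens
    scale_factor: float = 1.0,
) -> float:
    """Per-request cost times daily volume times days, scaled linearly."""
    per_request = (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000
    return per_request * requests_per_day * DAYS_PER_MONTH * scale_factor


# Hypothetical workload: 1,000 input + 300 output tokens, 5,000 requests/day.
baseline = monthly_cost_usd(1000, 300, 5000, 0.25, 2.00)
projected = monthly_cost_usd(1000, 300, 5000, 0.25, 2.00, scale_factor=10)
print(f"baseline: ${baseline:,.2f}/mo, at 10x: ${projected:,.2f}/mo")
```

This is the linear baseline only; it deliberately ignores tier changes and discounts, which is where the real projection departs from it.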

About this calculator: Scale Projection - What Happens to Your Bill at 10×, 100×?

Most AI bills aren't linear at scale. Find the cliffs - rate limits, tier jumps, latency walls - before they find you. Live pricing, real benchmarks, vendor stress-test.

Inputs you control

Input | Impact on result | Range | Typical
Current monthly AI spend ($) | What you're spending right now. Pull from your invoice. If you're pre-launch, use Cost Calculator first. | 50 – 50K | 400
Target scale (× current) | How many times your current usage. 10× = next-stage product growth. 100× = enterprise customer + general-availability launch. 1000× = the consumer-scale dream. | 2 – 1K | 100
Optimization stage (0-3) | 0 = naive scaling, no optimization. 1 = batching where possible. 2 = + multi-model routing. 3 = + volume discounts + caching + token reduction. Most teams stop at 1; the spread between 1 and 3 is enormous. | 0 – 3 | 0

Outputs computed for you · model: scale_projection

Output | How inputs affect it
Monthly cost ($) | computed from inputs
Annual cost ($) | monthlyUsd × 12

Output uses the generic compute model; for precise numbers, use the full calculator.

Reading your result

Linear case is the upper bound. If you do nothing differently - same model, same prompts, same tier - you'll hit the linear projection. Most teams don't, but it's the worst-case planning anchor.

Optimized case is the floor. With every lever pulled (batching + routing + caching + volume discount), most workloads land 50-70% below linear. Your CFO should see both lines on the same chart.

Watch for cliffs. Some vendors have abrupt tier jumps (e.g., move from $5/$15 'developer' tier to $7/$20 'enterprise' at 10M tokens/month). Others have rate-limit walls that force secondary vendors. The calc shows where these hit.
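
A tier cliff like the one in that example can be located with a short script. The sketch below is hypothetical: it reads "$5/$15" and "$7/$20" as input/output prices per 1M tokens, assumes the enterprise rate applies to all tokens once monthly volume reaches 10M, and uses made-up function names.

```python
TIER_THRESHOLD_TOKENS = 10_000_000  # "at 10M tokens/month" from the example
DEVELOPER = (5.0, 15.0)             # ASSUMPTION: (input, output) USD per 1M tokens
ENTERPRISE = (7.0, 20.0)


def monthly_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Price the whole month at whichever tier the total volume falls in."""
    total = input_tokens + output_tokens
    in_price, out_price = ENTERPRISE if total >= TIER_THRESHOLD_TOKENS else DEVELOPER
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def first_cliff(base_input_tokens: int, base_output_tokens: int, scales):
    """Return the smallest scale factor that crosses the tier threshold."""
    for scale in sorted(scales):
        if (base_input_tokens + base_output_tokens) * scale >= TIER_THRESHOLD_TOKENS:
            return scale
    return None


print(monthly_cost_usd(4_000_000, 1_000_000))  # below the threshold
print(monthly_cost_usd(8_000_000, 2_000_000))  # at the threshold
print(first_cliff(400_000, 100_000, [1, 10, 100]))
```

Doubling volume from 5M to 10M tokens in this toy setup more than doubles the bill, which is the kind of discontinuity a linear projection hides.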

Read the breakeven ratio. Optimized monthly cost ÷ revenue at 100× usage. If this is <5%, AI is not your cost problem. If >25%, you have margin compression incoming.
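
The breakeven check is simple enough to express directly. In this sketch the function names and the label for the middle band are assumptions; the guide only names the under-5% and over-25% cases.

```python
def breakeven_ratio(optimized_monthly_cost_usd: float, monthly_revenue_usd: float) -> float:
    """Optimized monthly AI cost divided by monthly revenue at the target scale."""
    if monthly_revenue_usd <= 0:
        raise ValueError("monthly revenue must be positive")
    return optimized_monthly_cost_usd / monthly_revenue_usd


def classify(ratio: float) -> str:
    if ratio < 0.05:
        return "AI is not your cost problem"
    if ratio > 0.25:
        return "margin compression incoming"
    return "watch zone"  # ASSUMPTION: the guide does not name the 5-25% band


# Hypothetical figures: $18K/mo optimized cost against $500K/mo revenue.
ratio = breakeven_ratio(18_000, 500_000)
print(f"{ratio:.1%}: {classify(ratio)}")
```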

What "good" looks like:
  • Naive 100× scaling: usually $30-50K from a $400 base
  • Stage 1 (batching): 60-75% of naive
  • Stage 2 (+ routing): 35-55% of naive
  • Stage 3 (+ caching + volume): 20-35% of naive - the ambitious target
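
These stage bands translate into dollar ranges as follows. The sketch treats "naive" as the strictly linear figure (current spend × scale), which is a simplifying assumption, and the names are illustrative only.

```python
# Fraction of the naive cost at each optimization stage, taken from the list above.
STAGE_FRACTION_OF_NAIVE = {
    0: (1.00, 1.00),  # naive scaling
    1: (0.60, 0.75),  # batching
    2: (0.35, 0.55),  # + routing
    3: (0.20, 0.35),  # + caching + volume
}


def stage_cost_range(current_monthly_usd: float, scale: float, stage: int):
    """Return (low, high) monthly cost for a stage, assuming a linear naive baseline."""
    naive = current_monthly_usd * scale
    low, high = STAGE_FRACTION_OF_NAIVE[stage]
    return naive * low, naive * high


for stage in STAGE_FRACTION_OF_NAIVE:
    low, high = stage_cost_range(400, 100, stage)
    print(f"stage {stage}: ${low:,.0f} to ${high:,.0f} per month")
```

For a $400 base at 100×, stage 3 works out to roughly $8K-14K per month against a $40K linear baseline.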

Vendors with the best volume economics

Verified 20 hours ago
Prices in USD per 1M tokens (input · output):
  1. GPT-5 Mini: $0.250 in · $2.00 out
  2. Command: $1.00 in · $2.00 out
  3. devstral-2: $0.400 in · $2.00 out

Three real scenarios

Same calculator, three different team sizes.

$2,000 / month ≈ $24,000 / year

Near-term growth, usually with no time for optimization. Plan for roughly 5-6× your current cost: added complexity (more features = more tokens per request) pushes spend up, while small early caching wins start to pull it back down.

Healthy range: $1,800-2,500/mo (modest optimization expected)

Inputs used:
currentMonthlyUsd: 400
scaleFactor: 5
modelTier: balanced
optimizationStage: 0

Trade-offs

Cost isn't the only dimension.

Best fit for "cost":

  1. Multi-vendor routing: 40-60% savings at scale
  2. Batch processing: 50% off non-realtime workloads
  3. Prompt caching: 30-50% off input-heavy bills

At 100× scale, optimization isn't optional - it's a margin lever. Teams that scale without optimizing typically see AI costs become the largest line item by month 6 of consumer launch.

Use cases

Pre-loaded scenarios for the most common applications.

$50,000 / month ≈ $600,000 / year

Going from internal beta ($100/mo) to consumer scale (1000×). Mandatory: cheap-tier models with verification, multi-vendor routing, aggressive caching, batch overnight workloads, 30-40% volume discount negotiated. Linear would be $100K/mo; optimized usually $20-30K.

Healthy range: $15-30K/mo (vs $100K naive)

Inputs used:
currentMonthlyUsd: 100
scaleFactor: 1,000
modelTier: cheap
optimizationStage: 3

What this calculator can't tell you

Honest limitations: every model is wrong; some are useful. This one applies stage-level estimates and does not work through caching, routing, or batch math in detail. For those, use Prompt Cache ROI for caching math, Multi-Model Router for routing savings, and Batch vs Realtime for batch.

Where to go next

12-month projection with breach alerts →

Month-by-month forecast given growth + optimization timeline.

Single-vendor exposure at scale →

What's your blast radius if your primary vendor changes pricing 50%?

Full TCO with optimization roadmap →

7-step wizard including optimization headcount + timeline.

Methodology

Source: /ai-cost-economics
Extraction: Optimization stages calibrated against 12 case studies (anonymized) of teams scaling from $500/mo to $50K+/mo.
Editorial gate: 8-layer defense; see aicost.ai/ai-cost-economics
Last verified: 6/4/2026, 8:00:00 PM

Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.

3 years of pricing history

Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.

📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-05.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when input exceeds model threshold.
  • Embedding prices are input-only (no output tokens generated).
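
The conventions above can be combined into a per-request cost estimate. The sketch below is illustrative only: the function names are invented, the cached discount defaults to 90%, and stacking the batch discount on top of caching is an assumption that does not hold for every provider.

```python
def effective_input_price(base_per_m: float, cache_hit_rate: float,
                          cached_discount: float = 0.90) -> float:
    """Blend the cached and uncached input price by cache hit rate."""
    cached_price = base_per_m * (1 - cached_discount)
    return cache_hit_rate * cached_price + (1 - cache_hit_rate) * base_per_m


def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float,
                     *, cache_hit_rate: float = 0.0, batch: bool = False,
                     residency_multiplier: float = 1.0) -> float:
    """Cost of one request in USD; prices are USD per 1M tokens."""
    in_price = effective_input_price(input_price_per_m, cache_hit_rate)
    out_price = output_price_per_m
    if batch:
        # ASSUMPTION: the 50% batch discount stacks with caching.
        in_price *= 0.5
        out_price *= 0.5
    cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    # Residency surcharges (e.g. 1.1x) are not in base rates, so apply them last.
    return cost * residency_multiplier


standard = request_cost_usd(1_000_000, 1_000_000, 1.0, 2.0)
batched = request_cost_usd(1_000_000, 1_000_000, 1.0, 2.0, batch=True)
regional = request_cost_usd(1_000_000, 1_000_000, 1.0, 2.0, residency_multiplier=1.1)
print(standard, batched, regional)
```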

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Source | Last verified | URL | Snapshot history
Anthropic | 2026-06-05 | https://www.anthropic.com/pricing | Daily snapshot since Sep 2023 · 578 days captured
Anthropic Docs | 2026-06-05 | https://platform.claude.com/docs/en/about-claude/pricing | Daily snapshot since Sep 2023 · 578 days captured
OpenAI | 2026-06-05 | https://openai.com/api/pricing/ | Daily snapshot since Sep 2023 · 579 days captured
Google AI | 2026-06-05 | https://ai.google.dev/gemini-api/docs/pricing | Daily snapshot since Dec 2023 · 554 days captured
Google Vertex | 2026-06-05 | https://cloud.google.com/vertex-ai/generative-ai/pricing | Daily snapshot since Dec 2023 · 554 days captured
DeepSeek | 2026-06-05 | https://api-docs.deepseek.com/quick_start/pricing | Daily snapshot since May 2024 · 493 days captured
xAI | 2026-06-05 | https://x.ai/api | Daily snapshot since Nov 2024 · 411 days captured
Mistral | 2026-06-05 | https://mistral.ai/pricing | Daily snapshot since Dec 2023 · 552 days captured
Cohere | 2026-06-05 | https://cohere.com/pricing | Daily snapshot since Sep 2023 · 578 days captured

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model | Field | Why it’s inferred
Anthropic / Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier.
Anthropic / Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic / Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention.
OpenAI / GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI / GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift.
OpenAI / GPT-5.2 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 | batchInput | Derived at 50% of input.
OpenAI / GPT-5 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.5 Pro | cachedInput | Derived at 10% of input: OpenAI does not publish a cached rate for *-pro models, so the family convention is used.
OpenAI / GPT-5.5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention.
OpenAI / GPT-5.2 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.1 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.1 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Nano | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 Nano | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Nano | batchOutput | Derived at 50% of output.
Google / Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention ~90%.
Google / Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Pro | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates.
Google / Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
xAI / Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base.
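
The derivation rule behind these rows can be sketched as a small helper. The function name and signature are hypothetical; the field names (cachedInput, batchInput, batchOutput) and the 10% / 50% defaults come from the conventions stated above, and the example prices are placeholders.

```python
def derive_inferred(input_per_m: float, output_per_m: float,
                    cached_fraction: float = 0.10,
                    batch_fraction: float = 0.50) -> dict:
    """Derive unpublished rates from base prices (USD per 1M tokens)."""
    return {
        "cachedInput": input_per_m * cached_fraction,
        "batchInput": input_per_m * batch_fraction,
        "batchOutput": output_per_m * batch_fraction,
    }


# Default convention: cached input at 10% of base, batch at 50% of base.
print(derive_inferred(0.25, 2.00))

# Rows listed at 25% of input (e.g. Gemini 2.0 Flash) override the cached fraction.
print(derive_inferred(0.25, 2.00, cached_fraction=0.25))
```

Keeping the fractions as parameters makes the exceptions explicit instead of burying them in the table.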

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail.