Pricing History Explainer - Why AI Pricing Moved (and What It Means)
Meet Cassandra Romero. FinOps Manager negotiating an enterprise renewal. "Vendor says they've been raising costs. The data shows they cut prices 3 times. How do I use history in negotiation?"
🔥 Need ammunition for vendor renewal call next week.
AI pricing has been dropping roughly 30-50% per year at equivalent quality. 2023: GPT-4 launched at $60/1M output tokens, with GPT-4 Turbo following at $30 later that year. 2024: GPT-4o at $15. 2025: GPT-5 at $10. 2026: GPT-5 reduced to $8. Anthropic's Claude pricing followed a similar arc. The pattern: a new model launches at a premium, the old model gets a price cut, and 6-12 months later competition forces another cut.
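To check the "30-50% per year" claim against the quoted figures, here is a minimal sketch that takes the $30, $15, $10, and $8 output prices above as one price series and computes the year-over-year and annualized declines. Treating successive flagship models as "equivalent quality" is an assumption of the comparison, and the function names are illustrative.

```python
# Output price per 1M tokens by year, using the figures quoted above.
# Assumption: successive flagships are treated as one comparable series.
prices = {2023: 30.0, 2024: 15.0, 2025: 10.0, 2026: 8.0}


def year_over_year_declines(series):
    """Fractional price drop for each year relative to the previous year."""
    years = sorted(series)
    return {
        later: 1 - series[later] / series[earlier]
        for earlier, later in zip(years, years[1:])
    }


def annualized_decline(series):
    """Constant yearly decline that links the first price to the last."""
    years = sorted(series)
    first, last = years[0], years[-1]
    ratio = series[last] / series[first]
    return 1 - ratio ** (1 / (last - first))


yoy = year_over_year_declines(prices)
annual = annualized_decline(prices)

for year, drop in yoy.items():
    print(f"{year}: {drop:.0%} below the prior year")
print(f"Annualized decline: {annual:.1%}")
```

The individual yearly drops vary widely, so the annualized figure is the safer number to quote in a negotiation.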
Cassandra's renewal: vendor says costs rising. Reality: identical-quality model cost dropped 35% in 18 months. She uses pricing-history.csv data to anchor the conversation: 'your published rate dropped 35% - our enterprise rate should reflect that.' Often gets 15-25% off renewal terms.
Three patterns in pricing history. (1) Launch premium decay - new top model is 1.5-3× the previous flagship; settles to ~1.2× within 6 months. (2) Tier compression - last year's cheap is this year's mid; last year's premium is this year's balanced. (3) Cross-vendor competitive pressure - DeepSeek's aggressive pricing forced Western vendors to drop prices throughout 2025.
Why are AI prices dropping? Which vendor cut what, when, and why? Three years of pricing-history data narrated for FinOps and procurement.
Negotiation anchor: Published list dropped X%. Your enterprise rate should reflect at least 60-80% of that movement.
Cassandra's case: $50K/mo, 12 months in, 25% list drop. Target renewal discount: 18% off current. Annual savings: $108K.
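The arithmetic behind Cassandra's target can be sketched as follows. The 72% pass-through is not stated in the text; it is inferred here from the 18% target divided by the 25% list drop, and it sits inside the 60-80% anchor band. The function name and parameters are illustrative.

```python
def renewal_target(monthly_spend, list_drop, pass_through):
    """Return (discount, monthly_savings, annual_savings).

    list_drop and pass_through are fractions between 0 and 1.
    pass_through is the share of the published list drop you aim to
    capture in the enterprise rate.
    """
    if not 0 <= list_drop <= 1 or not 0 <= pass_through <= 1:
        raise ValueError("list_drop and pass_through must be between 0 and 1")
    discount = list_drop * pass_through
    monthly_savings = monthly_spend * discount
    return discount, monthly_savings, monthly_savings * 12


# Cassandra's case: $50K/mo, 25% list drop, 72% pass-through (inferred).
discount, monthly, annual = renewal_target(50_000, 0.25, 0.72)
print(f"Target discount {discount:.0%}: ${monthly:,.0f}/mo, ${annual:,.0f}/yr")

# The 60-80% anchor band for the same list drop.
low, _, _ = renewal_target(50_000, 0.25, 0.60)
high, _, _ = renewal_target(50_000, 0.25, 0.80)
print(f"Anchor band: {low:.0%} to {high:.0%} off current rate")
```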
The data is your friend. Vendors hope you don't track public pricing. If you walk into renewal with a 24-month price chart, you're a different customer than someone who arrives empty-handed.
Don't expect 100% of list movement. Enterprise rates already include some volume discount. The published drop applies on top, but vendors will negotiate hard on the gap.
The same math at three different spend levels.
Mid-contract, modest list drop. 7% renewal discount realistic. Doesn't move the needle much but every percent counts.
Healthy range: Save $1.7K/mo, $20K/yr
Annual renewal coming up. Clear list drop. 18% discount target with proper preparation. $108K/yr savings = real money for FinOps.
Healthy range: Save $9K/mo, $108K/yr
Big enterprise renewal. 45% list drop. Aggressive 30% discount target. At this scale, vendor will negotiate hard, but data is on your side.
Healthy range: Save $60K/mo, $720K/yr
Cost isn't the only dimension. Each constraint below changes the recommendation.
Negotiation leverage comes from facts. Historical pricing is public, free, and decisive. Walk in prepared.
Pricing history is verifiable fact, not subjective. Use it confidently.
Pricing-history data may show vendor's cheap-tier launches. Verify those tiers have your required compliance before considering migration.
When list price drops via new tier launch, the new tier may have different privacy posture. Don't auto-migrate.
Cheaper tiers often have aggressive rate limits at first. Test before bulk migration.
Vendors offer better rates for 2-3 year commitments. Trade-off: lock-in risk vs additional savings. Math is workload-dependent.
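One way to frame the commitment trade-off is to compare a flat locked-in discount against annual renewals that each capture part of that year's list-price drop. The sketch below assumes constant usage and a steady yearly list drop; every number in it is hypothetical, chosen only to show that the answer flips depending on how fast list prices fall.

```python
def committed_cost(annual_spend, years, commit_discount):
    """Total cost when the rate is locked at one flat discount for the term."""
    return annual_spend * (1 - commit_discount) * years


def annual_renewal_cost(annual_spend, years, yearly_list_drop, pass_through):
    """Total cost when each renewal captures part of that year's list drop.

    Year 1 is paid at the current rate; each later year's rate is reduced
    by yearly_list_drop * pass_through relative to the year before.
    """
    total, rate = 0.0, 1.0
    for _ in range(years):
        total += annual_spend * rate
        rate *= 1 - yearly_list_drop * pass_through
    return total


# Hypothetical inputs: $600K/yr spend, 3-year term, 20% commitment discount.
committed = committed_cost(600_000, 3, 0.20)

# Annual renewals at 70% pass-through, under two list-price trends.
flexible_30 = annual_renewal_cost(600_000, 3, 0.30, 0.70)
flexible_40 = annual_renewal_cost(600_000, 3, 0.40, 0.70)

print(f"3-year commitment:        ${committed:,.0f}")
print(f"Annual, 30%/yr list drop: ${flexible_30:,.0f}")
print(f"Annual, 40%/yr list drop: ${flexible_40:,.0f}")
```

With these inputs the commitment wins narrowly if list prices fall 30% a year and loses if they fall 40% a year, which is why the expected price trend matters as much as the headline discount.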
Don't migrate to a cheaper tier without eval. Cheaper sometimes means lower quality. Verify.
Four scenarios covering the most common renewal situations, with realistic numbers.
Annual renewal in 30 days. Time to pull historical pricing data. Build a 1-page chart showing list-price drops. Walk in with target discount + alternative-vendor pricing as BATNA.
Healthy range: Build the case 60 days early
Mid-contract. List dropped 30% - significant. Some vendors will amend mid-contract for relationship reasons. Worth asking. Target 10-15% off remaining term.
Healthy range: Mid-contract amendment possible
No published price drop. Negotiate volume tiers, multi-year discount, payment terms (net-60 vs net-30), feature inclusions. ~5% effective improvement possible.
Healthy range: Negotiate on terms, not price
Multi-vendor RFP. Competing bids from Anthropic + Google + OpenAI + Azure. Each tries to win you. List drops + competition = 25%+ discount achievable.
Healthy range: Competing bid drives pricing
Honest limitations — every model is wrong; some are useful. Where this one falls short:
For these, use: Pricing Watch for ongoing monitoring. Concentration Risk for negotiation leverage.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
The last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs).
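The fallback rule above can be sketched in a few lines. The function and its arguments are hypothetical; only the two table names come from the text, and how the dates are actually queried is not shown here.

```python
from datetime import date
from typing import Optional


def last_verified(
    latest_snapshot: Optional[date],
    latest_crawler_run: Optional[date],
) -> Optional[date]:
    """Prefer the newest successful snapshot; otherwise use the crawler run.

    latest_snapshot stands in for the newest successful row in
    aicost_pricing_snapshots, latest_crawler_run for the newest successful
    row in aicost_crawler_runs. Returns None when neither exists.
    """
    if latest_snapshot is not None:
        return latest_snapshot
    return latest_crawler_run


with_snapshot = last_verified(date(2026, 1, 15), date(2026, 1, 16))
without_snapshot = last_verified(None, date(2026, 1, 16))
```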
10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
The fields below are derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic / Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic / Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic / Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount. |
| Anthropic / Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount. |
| Anthropic / Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention. |
| OpenAI / GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI / GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI / GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI / GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI / GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI / GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI / GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI / GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI / GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI / GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5.5 Pro | cachedInput | Derived at 10% of input; OpenAI does not publish a cached rate for *-pro models, so the family convention is used. |
| OpenAI / GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention. |
| OpenAI / GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI / GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google / Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention ~90%. |
| Google / Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google / Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google / Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google / Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google / Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google / Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google / Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google / Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google / Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates. |
| Google / Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google / Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google / Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google / Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google / Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| xAI / Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base. |
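The derivation conventions in the table can be sketched as a small helper. The field names cachedInput, batchInput, and batchOutput come from the table; the function itself, its defaults, and the example base rates are hypothetical and are not taken from any vendor's price list.

```python
def derive_fields(input_rate, output_rate, cache_fraction=0.10, batch_fraction=0.50):
    """Derive the inferred fields from base per-1M-token rates.

    cache_fraction: cached input as a share of the base input rate
        (0.10 by convention; the table uses 0.25 for a few models).
    batch_fraction: Batch API rate as a share of the base rate.
    """
    return {
        "cachedInput": round(input_rate * cache_fraction, 6),
        "batchInput": round(input_rate * batch_fraction, 6),
        "batchOutput": round(output_rate * batch_fraction, 6),
    }


# Hypothetical base rates: $3.00 input, $15.00 output per 1M tokens.
default_rates = derive_fields(3.00, 15.00)

# Same base rates with the 25% caching convention applied instead.
legacy_cache = derive_fields(3.00, 15.00, cache_fraction=0.25)

print(default_rates)
print(legacy_cache)
```

Keeping the fractions as parameters makes exceptions such as the 25% rows explicit instead of hiding them in special cases.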
Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail.
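As an illustration of how a changelog with old and new values can be produced from two daily snapshots, here is a minimal sketch. The snapshot shape (a dict keyed by model and field), the entry layout, the model name, and the rates are all assumptions for the example; the actual schema of aicost_pricing_snapshots and aicost_price_changelog is not described in this guide.

```python
from datetime import date


def diff_snapshots(previous, current, snapshot_date):
    """Return one changelog entry per (model, field) whose value changed.

    Snapshots are dicts mapping (model, field) to a per-1M-token rate.
    A key missing from one side is reported with None on that side.
    """
    entries = []
    for model, field in sorted(set(previous) | set(current)):
        old = previous.get((model, field))
        new = current.get((model, field))
        if old != new:
            entries.append(
                {"date": snapshot_date, "model": model, "field": field,
                 "old": old, "new": new}
            )
    return entries


# Hypothetical snapshots for a placeholder model.
yesterday = {("example-model", "input"): 3.00, ("example-model", "output"): 15.00}
today = {("example-model", "input"): 2.50, ("example-model", "output"): 15.00}

changes = diff_snapshots(yesterday, today, date(2026, 1, 15))
for entry in changes:
    print(entry)
```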