Pricing Watch - Catch AI Vendor Price Changes Before They Hit Your Bill
Meet Reza Khalili. FinOps Lead at a 500-person enterprise. "Vendor X dropped output prices 40% in March. Took us 6 weeks to notice. How do I catch this earlier?"
🔥 Missed $30K/mo of savings because nobody monitored pricing pages.
AI vendor pricing is more volatile than most teams realize. 2024-2025 saw 8 major price drops across the large vendors (DeepSeek, Mistral, Google, Anthropic, OpenAI), 3 price increases, 5 model deprecations, and dozens of smaller tier shifts. Most companies catch these monthly at best, and only from their invoices.
Reza's team missed Anthropic's Sonnet price drop in March 2025 (output went from $15 to $10 per 1M, a 33% reduction). Their bill stayed where it was for 6 weeks until ops ran a quarterly price review. $30K/mo of unrealized savings - money on the table because nobody was watching.
Three monitoring strategies. (1) Manual quarterly review (most teams) - too slow. (2) Automated price diff (aicost.ai pricing-history feed, vendor RSS) - surfaces changes within 24-48 hours. (3) Negotiated MFN clauses (most-favored-nation in enterprise contracts) - vendor obligated to give you their lowest published price. Best for $100K+/mo accounts.
AI vendor prices change quarterly. Pricing watch surfaces drops, spikes, and deprecations across 12 vendors before they affect your invoice.
Missed-savings exposure scales with cadence. Weekly review catches changes ~4 days late. Monthly: ~17 days. Quarterly: ~50 days. Each day late = lost daily savings.
Reza's case (quarterly review): $75K bill × 12% beneficial drop × 50 days late / 30 = ~$15K of unrealized savings per cycle.
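The arithmetic above can be written as a small function. This is a minimal sketch, assuming a 30-day month and the per-cadence delays quoted in the text; the function name `missed_savings` and the `DAYS_LATE` table are illustrative, not part of any aicost.ai API.

```python
def missed_savings(monthly_bill: float, drop_fraction: float, days_late: float) -> float:
    """Savings left unrealized while a beneficial price drop goes unnoticed."""
    # Daily savings the drop would have produced, using a 30-day month as in the text.
    daily_savings = monthly_bill * drop_fraction / 30
    return daily_savings * days_late


# Average detection delay per review cadence, taken from the figures in the text.
DAYS_LATE = {"weekly": 4, "monthly": 17, "quarterly": 50}

# Reza's case: $75K bill, 12% beneficial drop, quarterly review.
reza_quarterly = missed_savings(75_000, 0.12, DAYS_LATE["quarterly"])
print(f"Unrealized savings per cycle: ${reza_quarterly:,.0f}")
```

Swapping in `DAYS_LATE["weekly"]` or `DAYS_LATE["monthly"]` shows how the exposure shrinks as the review cadence tightens.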
Automated monitoring breaks the curve. aicost.ai pricing watch surfaces changes within 24-48 hours and captures 95%+ of beneficial savings, versus ~30% on quarterly review.
Same calculator, three different team sizes.
$2K bill. Monthly review catches changes ~17 days late. Missed savings ~$110/year - small enough that automation overhead isn't justified. Spreadsheet review monthly is fine.
Healthy range: Missed savings ~$130/year - acceptable
Reza upgrades to weekly automated. Catches changes within 4 days. Missed savings drops from $15K/quarter to $300/quarter. Net: $14K+ recovered per quarter.
Healthy range: Missed savings $1.5K/year - much better
$500K/mo enterprise. Daily automated monitoring + MFN clause in enterprise contract. Vendor must give you lowest published price within 30 days of change. Captures essentially all beneficial pricing.
Healthy range: Missed savings <$1K/year - captures everything
Cost isn't the only dimension.
Don't pay for pricing monitoring services that just aggregate public data. aicost.ai pricing watch + vendor RSS feeds covers 95% free.
N/A - pricing is verifiable fact, not LLM output.
When vendor adds a cheaper tier, verify it has the same compliance certifications before routing sensitive workloads to it.
Same as compliance - cheaper tiers sometimes have weaker default privacy. Read terms before routing.
Cheap tier launches often have aggressive rate limits at first. Don't migrate full workload day-1. Ramp gradually.
If you can't switch when prices change, you have no leverage. Pricing watch is most useful when paired with multi-vendor capability.
Alerts without action are noise. Build the workflow: pricing alert → eval cheaper tier → A/B test → migrate. Without workflow, alerts get ignored.
Scenarios for the most common applications, with realistic numbers.
Product team integrating multiple vendors. Catch model deprecations + new model releases. Bi-weekly automated covers the cadence of major announcements.
Healthy range: Bi-weekly automated
Finance perspective. Monthly review for variance reporting. Quarterly use of pricing data in vendor negotiations. Different cadence for different purposes.
Healthy range: Monthly review with quarterly negotiation
$1.2M/mo consumer scale. Pricing changes affect unit economics directly. Automated routing failover when vendor B becomes cheaper than vendor A. AI-cost as live system, not periodic review.
Healthy range: Tri-daily automated + auto-failover
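One way to picture the cost-based routing in the consumer-scale scenario is the sketch below. It is an illustration only: the vendor names, rates, and the `min_saving` threshold are hypothetical, and it compares price alone, ignoring the compliance, privacy, and rate-limit checks discussed earlier.

```python
from typing import Dict

Rates = Dict[str, float]  # {"input": $ per 1M tokens, "output": $ per 1M tokens}


def blended_cost(rates: Rates, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of one call at the given per-1M-token rates."""
    return (rates["input"] * input_tokens + rates["output"] * output_tokens) / 1_000_000


def pick_vendor(current: str, vendors: Dict[str, Rates],
                input_tokens: int, output_tokens: int,
                min_saving: float = 0.05) -> str:
    """Switch only when another vendor is cheaper by more than min_saving."""
    costs = {name: blended_cost(r, input_tokens, output_tokens)
             for name, r in vendors.items()}
    cheapest = min(costs, key=costs.get)
    # The threshold avoids flapping between vendors on trivial price differences.
    if costs[cheapest] < costs[current] * (1 - min_saving):
        return cheapest
    return current


# Hypothetical rates: vendor B undercuts vendor A on output tokens.
vendors = {
    "vendor_a": {"input": 3.0, "output": 15.0},
    "vendor_b": {"input": 3.0, "output": 10.0},
}
choice = pick_vendor("vendor_a", vendors, input_tokens=1000, output_tokens=500)
print(choice)
```

The `min_saving` margin is a design choice: a migration has its own cost, so a switch is only worth triggering when the price gap is meaningful.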
Honest limitations — every model is wrong; some are useful. Where this one falls short:
For these, use: Concentration Risk for portfolio. Annual vs Monthly for contract structuring.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs).
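The fallback rule can be sketched as follows. This assumes the dates of successful snapshots and successful crawler runs have already been read from the two tables; the function name and its inputs are hypothetical, and the actual schema is not shown in the text.

```python
from datetime import date
from typing import Optional, Sequence


def last_verified(snapshot_dates: Sequence[date],
                  crawler_run_dates: Sequence[date]) -> Optional[date]:
    """Prefer the newest successful snapshot; fall back to the newest crawler run."""
    if snapshot_dates:
        return max(snapshot_dates)
    if crawler_run_dates:
        # No snapshot exists yet, so the crawler run is the best evidence available.
        return max(crawler_run_dates)
    return None  # Nothing has been verified for this vendor yet.


# Snapshots take priority even when a crawler run is more recent.
print(last_verified([date(2025, 3, 1), date(2025, 3, 2)], [date(2025, 3, 5)]))
# With no snapshots, the latest crawler run is used.
print(last_verified([], [date(2025, 2, 27), date(2025, 2, 28)]))
```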
10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
The fields below are inferred: derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic - Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate - Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic - Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic - Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input - Anthropic documents uniform 50% Batch discount. |
| Anthropic - Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output - Anthropic documents uniform 50% Batch discount. |
| Anthropic - Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate - Anthropic 90% cache-hit discount convention. |
| OpenAI - GPT-5.4 Mini | cachedInput | Derived at 10% of input - OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI - GPT-5.4 Nano | cachedInput | Derived at 10% of input - OpenAI 90% cache-hit convention. |
| OpenAI - GPT-5.4 Nano | batchInput | Derived at 50% of input - OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.4 Nano | batchOutput | Derived at 50% of output - OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.4 Pro | cachedInput | Derived at 10% of input - OpenAI 90% cache-hit convention. |
| OpenAI - GPT-5.4 Pro | batchInput | Derived at 50% of input - OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.4 Pro | batchOutput | Derived at 50% of output - OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI - GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI - GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5.5 Pro | cachedInput | Derived at 10% of input - OpenAI does not publish a cached rate for *-pro models; using the family convention. |
| OpenAI - GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5.2 Pro | cachedInput | Derived at 10% of input - pro-tier convention. |
| OpenAI - GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI - GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google - Gemini 3 Flash | cachedInput | Derived at 10% of input - Google caching discount convention ~90%. |
| Google - Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input - Google caching convention. |
| Google - Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google - Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| Google - Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google - Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google - Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input - Google caching convention. |
| Google - Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google - Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates. |
| Google - Gemini 2.0 Flash | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input - Google caching convention. |
| Google - Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| xAI - Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base. |
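
The conventions in the table reduce to two multipliers. The sketch below assumes the field names from the table (`cachedInput`, `batchInput`, `batchOutput`); the function name, the `published` override, and the example rates are hypothetical and are not real prices for any model.

```python
from typing import Dict, Optional

CACHED_INPUT_FACTOR = 0.10  # cached input = 10% of base input (90% off)
BATCH_FACTOR = 0.50         # Batch API = 50% of base (50% off)


def infer_fields(input_rate: float, output_rate: float,
                 published: Optional[Dict[str, float]] = None,
                 cached_factor: float = CACHED_INPUT_FACTOR) -> Dict[str, float]:
    """Fill cached and batch rates by convention; vendor-published values win."""
    derived = {
        "cachedInput": round(input_rate * cached_factor, 6),
        "batchInput": round(input_rate * BATCH_FACTOR, 6),
        "batchOutput": round(output_rate * BATCH_FACTOR, 6),
    }
    # Anything the vendor actually publishes replaces the inferred value.
    derived.update(published or {})
    return derived


# Hypothetical base rates in $ per 1M tokens.
print(infer_fields(3.0, 15.0))
# Families with a different caching convention (25% in two rows of the table)
# pass their own factor.
print(infer_fields(3.0, 15.0, cached_factor=0.25))
```

Keeping the multipliers as named constants makes it easy to audit which rows were inferred and under which convention.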
Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail.
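A snapshot-to-changelog diff of the kind described here might look like the sketch below. The in-memory dictionaries stand in for two daily snapshots; the record shape, the model name, and the $3 input rate are assumptions for illustration, while the $15 to $10 output change reuses the example from earlier in this guide.

```python
from typing import Dict, List, Optional, Tuple

# (model, field) -> price in $ per 1M tokens
Snapshot = Dict[Tuple[str, str], float]


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[dict]:
    """Return one changelog entry per price that differs between two snapshots."""
    changes = []
    for key in sorted(old.keys() | new.keys()):
        old_value: Optional[float] = old.get(key)  # None means newly added
        new_value: Optional[float] = new.get(key)  # None means removed or deprecated
        if old_value != new_value:
            model, field = key
            changes.append({"model": model, "field": field,
                            "old": old_value, "new": new_value})
    return changes


yesterday = {("example-model", "input"): 3.0, ("example-model", "output"): 15.0}
today = {("example-model", "input"): 3.0, ("example-model", "output"): 10.0}

for entry in diff_snapshots(yesterday, today):
    print(entry)
```

Recording `None` for a missing side lets the same log capture new models and deprecations as well as price changes.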