Vendor Concentration Risk - How Exposed Is Your AI Portfolio?
Meet Diana Sokolov. CTO at a 250-person Series D. "We're 85% on Anthropic. Board asked: what if Anthropic raises prices 50% or has a 6-week outage?"
🔥 No good answer. She needs a number and a strategy by next quarter's board meeting.
Single-vendor AI is one of the highest-leverage risks most companies don't price. If 80%+ of AI spend goes to one vendor, you're exposed to: surprise price increases (15-50%), capacity throttling or downtime during the vendor's outages, model deprecation (the vendor sunsets the model you depend on), and weak leverage at contract renegotiation.
Diana's exposure: 85% Anthropic, 10% OpenAI, 5% Google. Single-vendor concentration score: 8.5/10 (red zone). Mitigation: build a LiteLLM-style abstraction layer now (3-4 weeks), maintain prompt portability (test on Sonnet AND GPT-5.5 weekly), and keep at least 20% of workload routed elsewhere as a 'living hedge.'
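One way to picture the 'living hedge' is as weighted routing: each request is sent to a vendor with probability equal to that vendor's target share. The sketch below is illustrative only; the vendor names, the 80/12/8 split, and the helper names `pick_vendor` and `observed_shares` are assumptions, not part of the calculator or of any real routing library.

```python
import random
from typing import Dict

# Hypothetical target split: 80% primary, 20% spread across two hedge vendors.
TARGET_SPLIT: Dict[str, float] = {"anthropic": 0.80, "openai": 0.12, "google": 0.08}


def pick_vendor(weights: Dict[str, float], rng: random.Random) -> str:
    """Pick one vendor name with probability proportional to its weight."""
    if not weights or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and non-negative")
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def observed_shares(n_requests: int, weights: Dict[str, float], seed: int = 0) -> Dict[str, float]:
    """Simulate n_requests routing decisions and return the realized share per vendor."""
    rng = random.Random(seed)
    counts = {name: 0 for name in weights}
    for _ in range(n_requests):
        counts[pick_vendor(weights, rng)] += 1
    return {name: count / n_requests for name, count in counts.items()}
```

Over a large number of requests the realized shares settle close to the target split, so the hedge vendors keep seeing real production traffic instead of sitting idle until an emergency.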
Three risk dimensions. (1) Pricing risk - how much can the vendor raise prices before you have to absorb the increase? (2) Operational risk - how long can you survive a vendor outage? (3) Strategic risk - the vendor changes terms (no-train tier sunset, model deprecation, geographic restrictions). Each calls for a different mitigation.
Single-vendor AI is a board-level risk. Quantify your concentration, model migration cost, and design the multi-vendor strategy that won't bankrupt you.
Concentration score: top_vendor_share / 10, with the share expressed in percent. 85% = 8.5/10 (red). 60% = 6/10 (yellow). 40% = 4/10 (green). Pushing concentration below 30% is rarely worth chasing - diversification cost > marginal risk reduction.
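The score is simple enough to express directly. In the sketch below, the zone cutoffs are an assumption: the guide only gives sample points (4 and 5 green, 6 yellow, 8.5 red), so the boundaries at 5 and 8 are one plausible reading, not the calculator's published thresholds.

```python
def concentration_score(top_vendor_share_pct: float) -> float:
    """Score out of 10: the top vendor's share of spend, in percent, divided by 10."""
    if not 0 <= top_vendor_share_pct <= 100:
        raise ValueError("share must be a percentage between 0 and 100")
    return top_vendor_share_pct / 10


def zone(score: float) -> str:
    """Map a score to a colour zone. Cutoffs are assumed, not taken from the source."""
    if score >= 8:
        return "red"
    if score > 5:
        return "yellow"
    return "green"


diana_score = concentration_score(85)  # 8.5
diana_zone = zone(diana_score)         # "red"
```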
Pricing shock cost = total_spend × top_share × shock_pct. Diana: $100K × 0.85 × 0.25 = $21.25K/mo extra if Anthropic raises prices 25%. Annual $255K. Real money.
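The same arithmetic as a small function, using the figures quoted above. The function and parameter names are illustrative; shares and shock sizes are passed as fractions (0.85, 0.25) rather than percentages.

```python
def pricing_shock_cost(total_monthly_spend: float, top_share: float, shock_pct: float) -> float:
    """Extra monthly cost if the top vendor raises prices by shock_pct (a fraction)."""
    return total_monthly_spend * top_share * shock_pct


# Diana: $100K/mo total spend, 85% on one vendor, 25% price increase.
monthly_extra = pricing_shock_cost(100_000, 0.85, 0.25)  # about $21,250 per month
annual_extra = monthly_extra * 12                        # about $255,000 per year
```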
Outage exposure = total_spend × top_share × (outage_days / 30). 1-week Anthropic outage: $100K × 0.85 × 7/30 = ~$20K of business at risk (assuming AI is revenue-generating, not just cost).
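A minimal sketch of the outage formula, again with illustrative names. It inherits the guide's simplification that monthly spend on the top vendor is a proxy for the business value that stops flowing while the vendor is down.

```python
def outage_exposure(total_monthly_spend: float, top_share: float, outage_days: float) -> float:
    """Spend-weighted exposure for an outage of outage_days, using a 30-day month."""
    return total_monthly_spend * top_share * (outage_days / 30)


# One-week outage at Diana's top vendor: roughly $19.8K, which the guide rounds to ~$20K.
one_week = outage_exposure(100_000, 0.85, 7)
# The board's six-week scenario (42 days) under the same formula.
six_weeks = outage_exposure(100_000, 0.85, 42)
```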
Mitigation costs are small relative to risk. A LiteLLM-style abstraction layer plus dual-vendor testing costs ~$1-3K/mo in overhead. Think of it as an insurance premium against a $250K+ shock. Worth it at most enterprise scales.
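The insurance framing can be made concrete with a break-even probability: the yearly chance of a shock above which the overhead pays for itself in expectation. This is an added simplification, not something the calculator computes; it assumes the mitigation fully avoids the shock cost, which real mitigation only partly does.

```python
def break_even_probability(monthly_overhead: float, annual_shock_cost: float) -> float:
    """Annual shock probability at which expected loss equals the yearly mitigation cost."""
    if annual_shock_cost <= 0:
        raise ValueError("annual_shock_cost must be positive")
    return (monthly_overhead * 12) / annual_shock_cost


# $2K/mo overhead against the $255K annual pricing shock from Diana's case.
p = break_even_probability(2_000, 255_000)  # about 0.094
```

Under these assumptions, the overhead is justified if a 25% price increase is more than roughly one-in-ten likely in a given year.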
Same calculator, three different team sizes.
Mid-size SaaS, 50% Anthropic + 30% OpenAI + 20% Google. Multi-vendor abstraction in place. Pricing shock exposure: $10K/mo. Switching time 2 months. Healthy.
Healthy range: Concentration score 5 - green zone
Diana's case. $21K/mo pricing shock exposure. 4 months to migrate. No abstraction. Mitigation: build a LiteLLM-style abstraction layer, route 20% to OpenAI/Google as a living hedge, run weekly migration drills. ~$2K/mo cost vs $250K+ risk.
Healthy range: Concentration 8.5 - red. Mitigate now.
$200K/mo, 95% one vendor. 40% price shock = $76K/mo extra. 12 months to switch. Vendor dependency is existential. Mitigation requires ~6 months of focused engineering. Don't defer.
Healthy range: Existential risk - board priority
Cost isn't the only dimension. Each constraint below changes the recommendation.
Multi-vendor isn't free - it adds complexity. But it gives you negotiating leverage and operational hedging. Net cost is usually neutral or slightly positive.
Anthropic and OpenAI fail differently on edge cases. Routing to multiple vendors gives you an implicit ensemble - fewer single-vendor blind spots.
Multi-vendor compliance is more work - separate BAAs, separate audits, separate access controls. But it gives you redundancy if one vendor loses certification.
Don't assume Vendor B has the same privacy guarantees as Vendor A. Re-verify per vendor. Some have weaker default privacy.
Vendor latencies differ by region and time. Multi-vendor routing can pick the fastest available - turning hedging into a UX win.
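A rough sketch of latency-aware selection: keep a window of recent latency samples per vendor and pick the healthy vendor with the lowest median. The data shapes and the name `fastest_vendor` are assumptions for illustration; a production router would also need to weigh cost and the target traffic split.

```python
from statistics import median
from typing import Dict, List, Set


def fastest_vendor(latencies_ms: Dict[str, List[float]], healthy: Set[str]) -> str:
    """Return the healthy vendor with the lowest median recent latency."""
    candidates = {
        name: median(samples)
        for name, samples in latencies_ms.items()
        if name in healthy and samples  # skip unhealthy vendors and empty windows
    }
    if not candidates:
        raise LookupError("no healthy vendor has latency samples")
    return min(candidates, key=candidates.get)


recent = {
    "vendor_a": [820.0, 790.0, 905.0],
    "vendor_b": [640.0, 1500.0, 610.0],  # one slow outlier; the median ignores it
    "vendor_c": [400.0, 420.0],
}
choice = fastest_vendor(recent, healthy={"vendor_a", "vendor_b"})  # vendor_c is excluded
```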
Vendor abstraction is solved by libraries. Prompt portability is harder - prompts tuned for Sonnet may not work as-is on GPT-5. Test weekly to keep both viable.
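A weekly portability check can be as small as running the same prompt cases against every vendor and comparing pass rates. The harness below is a sketch under stated assumptions: each vendor is represented by a hypothetical callable that takes a prompt and returns text, each case pairs a prompt with a pass/fail check, and the 90% threshold is arbitrary.

```python
from typing import Callable, Dict, List, Tuple

Case = Tuple[str, Callable[[str], bool]]  # (prompt, check applied to the model output)


def portability_report(
    cases: List[Case],
    vendors: Dict[str, Callable[[str], str]],
    min_pass_rate: float = 0.9,
) -> Dict[str, Tuple[float, bool]]:
    """Run every case on every vendor; return (pass_rate, meets_threshold) per vendor."""
    if not cases:
        raise ValueError("at least one test case is required")
    report = {}
    for name, call in vendors.items():
        passed = sum(1 for prompt, check in cases if check(call(prompt)))
        rate = passed / len(cases)
        report[name] = (rate, rate >= min_pass_rate)
    return report
```

A vendor whose pass rate drifts below the threshold is a signal that the prompts have been tuned too tightly to the primary model and that a migration would take longer than planned.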
Multi-vendor adds MLOps surface. Vendor health dashboard, per-vendor evals, automated failover. Real investment but pays back at scale.
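Automated failover usually pairs a health tracker with an ordered list of vendors. The sketch below is a deliberately small version: `VendorHealth` and `call_with_failover` are hypothetical names, vendors are plain callables, and there is no recovery probe, so a vendor marked unhealthy stays skipped until a success is recorded for it some other way.

```python
from typing import Callable, Dict, List


class VendorHealth:
    """Tracks consecutive failures per vendor (a very small circuit breaker)."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures: Dict[str, int] = {}

    def record(self, vendor: str, ok: bool) -> None:
        # A success resets the counter; a failure increments it.
        self.failures[vendor] = 0 if ok else self.failures.get(vendor, 0) + 1

    def is_healthy(self, vendor: str) -> bool:
        return self.failures.get(vendor, 0) < self.max_failures


def call_with_failover(
    prompt: str,
    vendors: Dict[str, Callable[[str], str]],
    order: List[str],
    health: VendorHealth,
) -> str:
    """Try vendors in priority order, skipping any that are currently unhealthy."""
    for name in order:
        if not health.is_healthy(name):
            continue
        try:
            result = vendors[name](prompt)
        except Exception:  # a real system would catch narrower error types
            health.record(name, ok=False)
            continue
        health.record(name, ok=True)
        return result
    raise RuntimeError("no healthy vendor could serve the request")
```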
Pre-loaded scenarios for the most common applications.
10-person startup. $5K bill. 90% one vendor for simplicity. Switch time low (small codebase). Pricing shock tolerable at this scale. Don't over-engineer; revisit at $20K+ bill.
Healthy range: Concentration high but acceptable at this stage
Healthcare: only one vendor offers a BAA and HIPAA compliance at scale. Concentration is forced. Mitigation: a self-hosted (open-source) alternative as fallback. It doesn't need to be primary; it just needs to exist.
Healthy range: Compliance constrains; mitigate via on-prem
$800K/mo. Multi-vendor at this scale isn't optional - capacity, redundancy, negotiating leverage all demand it. A 60% top-vendor share is still high. Plan to drop to 40-50% within 12 months.
Healthy range: Multi-vendor strategy mandatory at this scale
Honest limitations — every model is wrong; some are useful. Where this one falls short:
For these, use: Multi-Model Router for routing layer. Self-Host Break-even for ultimate hedge.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
The last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs).
10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic - Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic - Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic - Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount. |
| Anthropic - Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount. |
| Anthropic - Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention. |
| OpenAI - GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI - GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI - GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI - GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI - GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI - GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5.5 Pro | cachedInput | Derived at 10% of input; OpenAI does not publish a cached rate for *-pro models, so the family convention is used. |
| OpenAI - GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention. |
| OpenAI - GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI - GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google - Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention ~90%. |
| Google - Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google - Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google - Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google - Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google - Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google - Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google - Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google - Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates. |
| Google - Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google - Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| xAI - Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base. |
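The derivation conventions behind the table reduce to two multipliers, with a per-model override for the rows that use 25% instead of 10%. The sketch below is illustrative: the field names match the table, but `derive_rates` is a hypothetical helper and the base rates in the example are made-up placeholders, not published prices.

```python
from typing import Dict

CACHED_FRACTION = 0.10  # typical convention: cached input = 10% of base input
BATCH_FRACTION = 0.50   # typical convention: Batch API = 50% of base


def derive_rates(
    input_rate: float,
    output_rate: float,
    cached_fraction: float = CACHED_FRACTION,
) -> Dict[str, float]:
    """Infer cached and batch rates from base rates using the conventions above."""
    return {
        "cachedInput": input_rate * cached_fraction,
        "batchInput": input_rate * BATCH_FRACTION,
        "batchOutput": output_rate * BATCH_FRACTION,
    }


# Placeholder base rates (NOT real prices), default 10% cache convention.
default_rates = derive_rates(input_rate=3.0, output_rate=15.0)
# Same placeholder rates with the 25% cached-input override used for a few rows.
override_rates = derive_rates(input_rate=3.0, output_rate=15.0, cached_fraction=0.25)
```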
Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for a full audit trail.
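A changelog of this kind can be produced by diffing consecutive snapshots. The sketch below shows the idea only; the actual schema of aicost_pricing_snapshots and aicost_price_changelog is not described here, so the (vendor, model, field) key and the entry layout are assumptions.

```python
from typing import Dict, List, Tuple

PriceKey = Tuple[str, str, str]  # assumed key: (vendor, model, field)


def diff_snapshots(old: Dict[PriceKey, float], new: Dict[PriceKey, float]) -> List[dict]:
    """Return one changelog entry per price that differs between two snapshots."""
    changes = []
    for key in sorted(old.keys() & new.keys()):  # only keys present in both snapshots
        if old[key] != new[key]:
            changes.append({"key": key, "old": old[key], "new": new[key]})
    return changes


# Placeholder values (NOT real prices) for two consecutive daily snapshots.
yesterday = {("vendor_a", "model_x", "input"): 3.0, ("vendor_a", "model_x", "output"): 15.0}
today = {("vendor_a", "model_x", "input"): 2.5, ("vendor_a", "model_x", "output"): 15.0}
changelog = diff_snapshots(yesterday, today)
```

Entries added or removed between snapshots are ignored in this version; a fuller audit trail would log those as well.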