You're not measuring AI ROI — you're guessing.
SaaS used to have 80% gross margins. AI integration is dropping that to 60% or lower. CFOs are demanding AI unit economics: cost per response, revenue per user, margin per feature. If you can't produce those numbers, your AI program is flying blind — and the next budget cycle is not going to be kind.
Cost-per-feature attribution + revenue mapping + continuous tracking.
The unit economics stack: (1) Tag every AI API call with feature + customer + user. (2) Join spend data with revenue attribution. (3) Build per-feature P&L dashboards. (4) Set minimum margin thresholds; kill features that don't hit them. The goal: every AI feature has a business case that survives the next QBR.
Layer these together. Each links to the full category guide.
A 4-week Total Assessment gets you a concrete plan.
Last-verified date is the most recent successful daily snapshot
(aicost_pricing_snapshots) or, when no snapshot exists yet,
the latest successful crawler run (aicost_crawler_runs).
10 of 10
vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.)
are not listed.
Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic — Claude Sonnet 4.6 | cachedInput |
Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic — Claude Sonnet 4.5 | cachedInput |
Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic — Claude Sonnet 4.5 | batchInput |
Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount. |
| Anthropic — Claude Sonnet 4.5 | batchOutput |
Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount. |
| Anthropic — Claude Haiku 4.5 | cachedInput |
Derived at 10% of input rate — Anthropic 90% cache-hit discount convention. |
| OpenAI — GPT-5.4 Mini | cachedInput |
Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI — GPT-5.4 Nano | cachedInput |
Derived at 10% of input — OpenAI 90% cache-hit convention. |
| OpenAI — GPT-5.4 Nano | batchInput |
Derived at 50% of input — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Nano | batchOutput |
Derived at 50% of output — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Pro | cachedInput |
Derived at 10% of input — OpenAI 90% cache-hit convention. |
| OpenAI — GPT-5.4 Pro | batchInput |
Derived at 50% of input — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Pro | batchOutput |
Derived at 50% of output — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.2 | cachedInput |
Derived at 10% of input; no residency uplift. |
| OpenAI — GPT-5.2 | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5.2 | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5 | cachedInput |
Derived at 10% of input. |
| OpenAI — GPT-5 | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5 | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5.5 Pro | cachedInput |
Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention. |
| OpenAI — GPT-5.5 Pro | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5.5 Pro | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5.2 Pro | cachedInput |
Derived at 10% of input — pro-tier convention. |
| OpenAI — GPT-5.2 Pro | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5.2 Pro | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5.1 | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5.1 | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5 Pro | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5 Pro | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5 Nano | cachedInput |
Derived at 10% of input. |
| OpenAI — GPT-5 Nano | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5 Nano | batchOutput |
Derived at 50% of output. |
| Google — Gemini 3 Flash | cachedInput |
Derived at 10% of input — Google caching discount convention ~90%. |
| Google — Gemini 3.1 Flash-Lite | cachedInput |
Derived at 10% of input — Google caching convention. |
| Google — Gemini 3.1 Flash-Lite | batchInput |
Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 3.1 Flash-Lite | batchOutput |
Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.5 Pro | cachedInput |
Derived at 10% of input. |
| Google — Gemini 2.5 Flash | cachedInput |
Derived at 10% of input. |
| Google — Gemini 2.5 Flash-Lite | cachedInput |
Derived at 10% of input — Google caching convention. |
| Google — Gemini 2.5 Flash-Lite | batchInput |
Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.5 Flash-Lite | batchOutput |
Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash | cachedInput |
Derived at 25% of input per Google 2.0 family caching rates. |
| Google — Gemini 2.0 Flash | batchInput |
Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash | batchOutput |
Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash-Lite | cachedInput |
Derived at 10% of input — Google caching convention. |
| Google — Gemini 2.0 Flash-Lite | batchInput |
Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash-Lite | batchOutput |
Derived at 50% of output — Google Batch API uniform 50% discount. |
| xAI — Grok 4 (legacy) | cachedInput |
Extrapolated at 25% of base. |
Pricing is cross-verified against the
LiteLLM community registry
when available. Daily snapshots are kept in aicost_pricing_snapshots;
every change is logged to aicost_price_changelog with old & new
values for full audit trail. Read the full methodology →