Guides → Playground & Guide → AI Margin Calculator - Is Your AI Feature Profitable?
Meet Naomi Bell. Pricing Strategy Lead at a Series B SaaS. "We charge $20/month for the AI feature. Inference cost averages $4/user/month. Is 80% gross margin actually right - or are we missing something?"
🔥 Board asked for AI feature unit economics by next Tuesday.
AI feature pricing is undermodeled. Most teams compute 'cost per request × requests per user × users' and call it good. They miss: heavy users (10× the average), retries, context growth (turn-by-turn token accumulation), prompt caching offsets, and seasonal usage spikes.
Naomi's 'cost = $4/user/month' is an average. But 5% of her users are 'power users' burning $40/month. Average cost looks fine; tail risk is bad. If one of those power users churns, you keep their $20 revenue but lose $40 cost - actually a margin gain. But if you grow them to 20% of base, your blended cost approaches $12/month - margin drops from 80% to 40%.
This calc models the realistic case (with usage distribution, not just average) and surfaces the price point you can charge to maintain target margin.
Each input shapes the cost. Click an input on the calculator to set it — explanations below match the live calculator field by field.
Revenue per AI request vs cost per AI request. Find break-even, gross margin, and the price you can charge. CFO-defensible math for AI feature pricing.
margin
Below: live sliders. Move them to see numbers change in real time. * Output uses the generic compute model — for precise numbers use the full calculator below.
Each input shapes your cost. Move the slider — see the impact.
Open the full calculator — pick a model, enter your tokens, see per-call, daily, monthly, and annual cost.
🚀 Open the full calculator →Naive margin vs blended margin. Naive: (revenue − avg cost) / revenue. Blended: weights power users more heavily. Naive overstates margin by 15-30%.
Watch the breakeven user-count. If 1 power user costs more than 5 average users pay, you need 5+ average users per power user to maintain margin. As power-user concentration grows, margin compresses.
Read the price recommendation. Calc back-solves the price needed for target margin. If you want 70% margin and your blended cost is $7/user, price needs to be $23/user - not $20.
Compare to industry benchmarks. Pure SaaS gross margin: 75-85%. AI-native features: 60-75% is realistic, 80%+ requires aggressive optimization. If you're claiming 90% on an AI feature, recheck the math.
Same calculator, three different team sizes. Click a tab to see how the numbers shift.
Just-launched, light usage, $10/user revenue against $2.50 blended cost. ~70% margin. Healthy starting point.
Healthy range: 60-70% margin (early-stage)
Naomi's case: $20 revenue, $4 avg cost - naive 80% margin. Blended cost with power users: ~$6.40. Real margin ~62%, not 80%. Still defensible, but not what the board was told. Recommended action: price to $25 or push power-user cost down.
Healthy range: 60-70% blended margin
Scaled feature with 15% power users at 10× cost = blended $14.20 cost vs $30 revenue = 45% margin (after $2 overhead). Below target. Solutions: usage tier pricing (charge power users more), tier downgrade for power users, or hard usage cap.
Healthy range: 30-45% margin - usage-based pricing required
Cost isn't the only dimension. Click any constraint — see how recommendations change.
Margin engineering at scale is where AI features become profitable or die. Optimization isn't optional once revenue exceeds $100K/year on the feature.
Cutting model tier to widen margin is the worst-margin move long-term - quality drop kills retention, drops LTV, makes margin math worse. Optimize via routing/caching, not quality cuts.
Enterprise + HIPAA + SOC 2 tiers cost slightly more, but compliance enables enterprise pricing at 10× revenue. Margin math with compliance is much better than margin math chasing consumer scale.
Privacy tier is essentially free for margin purposes. Don't trade privacy for margin.
Cheaper smaller models tend to be faster. Sometimes you can win on UX (latency) and margin (cost) simultaneously.
Your margin model is fragile to vendor pricing changes. Multi-vendor abstraction (LiteLLM-style) hedges this - when one vendor raises, route to alternative.
Hidden margin compression: drift monitoring, eval pipelines, A/B test infra. Build them in to per-user cost - they're real and they grow with scale.
Tradeoff analysis is where most AI projects go sideways. Talk to a CFO-grade AI cost analyst →
Pre-loaded scenarios for the most common applications. Click a tab to see realistic numbers — then the "Try this scenario" button to load it into the calculator above.
Free tier with hope of conversion. Cost should be <$0.50/free user (1% conversion at $50 LTV pays back). Power-user multiplier matters most here - power users in free tier are pure cost. Gate aggressively.
Healthy range: Free-tier cost <$0.50/user (compatible with conversion economics)
Enterprise tier: $200 revenue, $30 avg cost. Blended ~$36 + $10 overhead = $46. Margin 77%. Healthy, but only with usage caps to prevent power-user-heavy enterprise customer from blowing margin.
Healthy range: 75%+ margin with usage cap discipline
Consumer AI ($8/mo subscription). Blended cost ~$3.40 + $0.80 overhead = $4.20. Margin ~47%. Below target. Common consumer AI problem - solved with: tier limits, model routing (cheap for casual users), or freemium gating.
Healthy range: 55-65% margin - typical consumer AI economics
Honest limitations — every model is wrong; some are useful. Where this one falls short:
For these, use: Budget Planner for annual allocation. Full TCO Wizard for sensitivity analysis.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
View 3-year history for →
Last-verified date is the most recent successful daily snapshot
(aicost_pricing_snapshots) or, when no snapshot exists yet,
the latest successful crawler run (aicost_crawler_runs).
10 of 10
vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.)
are not listed.
Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic — Claude Sonnet 4.6 | cachedInput |
Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic — Claude Sonnet 4.5 | cachedInput |
Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic — Claude Sonnet 4.5 | batchInput |
Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount. |
| Anthropic — Claude Sonnet 4.5 | batchOutput |
Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount. |
| Anthropic — Claude Haiku 4.5 | cachedInput |
Derived at 10% of input rate — Anthropic 90% cache-hit discount convention. |
| OpenAI — GPT-5.4 Mini | cachedInput |
Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI — GPT-5.4 Nano | cachedInput |
Derived at 10% of input — OpenAI 90% cache-hit convention. |
| OpenAI — GPT-5.4 Nano | batchInput |
Derived at 50% of input — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Nano | batchOutput |
Derived at 50% of output — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Pro | cachedInput |
Derived at 10% of input — OpenAI 90% cache-hit convention. |
| OpenAI — GPT-5.4 Pro | batchInput |
Derived at 50% of input — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.4 Pro | batchOutput |
Derived at 50% of output — OpenAI Batch API uniform 50% discount. |
| OpenAI — GPT-5.2 | cachedInput |
Derived at 10% of input; no residency uplift. |
| OpenAI — GPT-5.2 | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5.2 | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5 | cachedInput |
Derived at 10% of input. |
| OpenAI — GPT-5 | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5 | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5.5 Pro | cachedInput |
Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention. |
| OpenAI — GPT-5.5 Pro | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5.5 Pro | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5.2 Pro | cachedInput |
Derived at 10% of input — pro-tier convention. |
| OpenAI — GPT-5.2 Pro | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5.2 Pro | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5.1 | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5.1 | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5 Pro | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5 Pro | batchOutput |
Derived at 50% of output. |
| OpenAI — GPT-5 Nano | cachedInput |
Derived at 10% of input. |
| OpenAI — GPT-5 Nano | batchInput |
Derived at 50% of input. |
| OpenAI — GPT-5 Nano | batchOutput |
Derived at 50% of output. |
| Google — Gemini 3 Flash | cachedInput |
Derived at 10% of input — Google caching discount convention ~90%. |
| Google — Gemini 3.1 Flash-Lite | cachedInput |
Derived at 10% of input — Google caching convention. |
| Google — Gemini 3.1 Flash-Lite | batchInput |
Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 3.1 Flash-Lite | batchOutput |
Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.5 Pro | cachedInput |
Derived at 10% of input. |
| Google — Gemini 2.5 Flash | cachedInput |
Derived at 10% of input. |
| Google — Gemini 2.5 Flash-Lite | cachedInput |
Derived at 10% of input — Google caching convention. |
| Google — Gemini 2.5 Flash-Lite | batchInput |
Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.5 Flash-Lite | batchOutput |
Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash | cachedInput |
Derived at 25% of input per Google 2.0 family caching rates. |
| Google — Gemini 2.0 Flash | batchInput |
Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash | batchOutput |
Derived at 50% of output — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash-Lite | cachedInput |
Derived at 10% of input — Google caching convention. |
| Google — Gemini 2.0 Flash-Lite | batchInput |
Derived at 50% of input — Google Batch API uniform 50% discount. |
| Google — Gemini 2.0 Flash-Lite | batchOutput |
Derived at 50% of output — Google Batch API uniform 50% discount. |
| xAI — Grok 4 (legacy) | cachedInput |
Extrapolated at 25% of base. |
Pricing is cross-verified against the
LiteLLM community registry
when available. Daily snapshots are kept in aicost_pricing_snapshots;
every change is logged to aicost_price_changelog with old & new
values for full audit trail. Read the full methodology →