Guides → Playground & Guide → AI Margin Calculator - Is Your AI Feature Profitable?

AI Margin Calculator - Is Your AI Feature Profitable?

Meet Naomi Bell. Pricing Strategy Lead at a Series B SaaS. "We charge $20/month for the AI feature. Inference cost averages $4/user/month. Is 80% gross margin actually right - or are we missing something?"

🔥 Board asked for AI feature unit economics by next Tuesday.

The story

AI feature pricing is undermodeled. Most teams compute 'cost per request × requests per user × users' and call it good. They miss: heavy users (10× the average), retries, context growth (turn-by-turn token accumulation), prompt caching offsets, and seasonal usage spikes.

Naomi's 'cost = $4/user/month' is an average. But 5% of her users are 'power users' burning $40/month. Average cost looks fine; tail risk is bad. If one of those power users churns, you keep their $20 revenue but lose $40 cost - actually a margin gain. But if you grow them to 20% of base, your blended cost approaches $12/month - margin drops from 80% to 40%.

This calc models the realistic case (with usage distribution, not just average) and surfaces the price point you can charge to maintain target margin.

📊 CALCULATOR AT A GLANCE

🚀 Open the full calculator ✉️ Email [email protected]

🎛 Inputs you control

Each input shapes the cost. Click an input on the calculator to set it — explanations below match the live calculator field by field.

▸ Paying users / month — Number of paying users in a month.

How to choose: Use current paying seats/accounts; cost and revenue both scale with this.

▸ Revenue / user / month ($) — Average revenue per paying user per month.

How to choose: Use net price after discounts; this is the top of your margin.

▸ Other COGS per user ($) — Non-AI cost of goods per user — hosting, support, payment fees.

How to choose: Estimate per-user infra + support; exclude sales/marketing (not COGS).

▸ AI requests / user / day — How many model calls an average user triggers daily.

How to choose: Use product analytics; roughly x30 for monthly volume.

▸ Input tokens / req — Average prompt size per request, in tokens.

How to choose: About 750 words is ~1,000 tokens; include system prompt + context.

▸ Output tokens / req — Average completion size per request, in tokens.

How to choose: Measure typical responses; output tokens usually cost more than input.

▸ Model — The model priced into your unit economics.

How to choose: Pick the one you actually serve; the table shows margin at other choices.

▸ Prompt cache hit rate — Share of input tokens served from prompt cache.

How to choose: Higher hit rates cut input cost sharply; 0% if you do not cache.

About this calculator: AI Margin Calculator - Is Your AI Feature Profitable?

Revenue per AI request vs cost per AI request. Find break-even, gross margin, and the price you can charge. CFO-defensible math for AI feature pricing.

Inputs you control

Input	Impact on result	Range	Typical
Revenue per user per month ($)	What you charge per user. For tiered pricing, use the average across paying tiers.	1 – 500	20
Average inference cost per user ($/mo)	Pull from invoice / number of users. Not all users are equal - power users skew this.	0.1 – 100	4
Power users (% of total)	Users who use the feature 5-10× more than average. Most products: 3-7% are power users.	0 – 30	5
Power user cost multiplier (× avg)	How much more power users cost. 5× is conservative, 10× is typical, 15× is heavy.	2 – 20	8

Outputs computed for you · model: `margin`

Output	How inputs affect it
Monthly cost ($)	computed from inputs
Annual cost ($)	monthlyUsd × 12

Below: live sliders. Move them to see numbers change in real time. * Output uses the generic compute model — for precise numbers use the full calculator below.

What you're looking at

Each input shapes your cost. Move the slider — see the impact.

Revenue per user per month ($) 20

What you charge per user. For tiered pricing, use the average across paying tiers.

Estimated: —

Average inference cost per user ($/mo) 4

Pull from invoice / number of users. Not all users are equal - power users skew this.

Estimated: —

Power users (% of total) 5

Users who use the feature 5-10× more than average. Most products: 3-7% are power users.

Estimated: —

Power user cost multiplier (× avg) 8

How much more power users cost. 5× is conservative, 10× is typical, 15× is heavy.

Estimated: —

Ready to run the numbers?

Open the full calculator — pick a model, enter your tokens, see per-call, daily, monthly, and annual cost.

🚀 Open the full calculator →

Reading your result

Naive margin vs blended margin. Naive: (revenue − avg cost) / revenue. Blended: weights power users more heavily. Naive overstates margin by 15-30%.

Watch the breakeven user-count. If 1 power user costs more than 5 average users pay, you need 5+ average users per power user to maintain margin. As power-user concentration grows, margin compresses.

Read the price recommendation. Calc back-solves the price needed for target margin. If you want 70% margin and your blended cost is $7/user, price needs to be $23/user - not $20.

Compare to industry benchmarks. Pure SaaS gross margin: 75-85%. AI-native features: 60-75% is realistic, 80%+ requires aggressive optimization. If you're claiming 90% on an AI feature, recheck the math.

What "good" looks like:

Healthy margin: 70%+ blended (after power-user weighting)
Defensible margin: 55-70%
Marginal: 40-55% - viable but limits ad-spend ROI
Unprofitable: <40% - needs price hike or cost engineering

Top 3 vendors for cost-per-unit

Verified 20 hours ago

1

GPT-5 Mini

$0.250 in · $2.00 out ·
2

Command

$1.00 in · $2.00 out ·
3

devstral-2

$0.400 in · $2.00 out ·

Three real scenarios

Same calculator, three different team sizes. Click a tab to see how the numbers shift.

$2.30 / month ≈ $27.60 / year

Just-launched, light usage, $10/user revenue against $2.50 blended cost. ~70% margin. Healthy starting point.

Healthy range: 60-70% margin (early-stage)

See inputs used

revenuePerUserMonthlyUsd: 10
avgInferenceCostPerUserUsd: 2
powerUserPctOfBase: 3
powerUserCostMultiplier: 6
overheadCostPerUserMonthlyUsd: 1
targetGrossMarginPct: 65

Trade-offs

Cost isn't the only dimension. Click any constraint — see how recommendations change.

What matters most to you? Click any dimension — recommendations update.

Best fit for "cost":

Multi-model routing Cuts blended cost 30-50%
Prompt caching Cuts cost 30-50% on input-heavy features
Per-user usage caps Caps tail risk from power users

Margin engineering at scale is where AI features become profitable or die. Optimization isn't optional once revenue exceeds $100K/year on the feature.

Use cases

Pre-loaded scenarios for the most common applications. Click a tab to see realistic numbers — then the "Try this scenario" button to load it into the calculator above.

$0.59 / month ≈ $7.14 / year

Free tier with hope of conversion. Cost should be <$0.50/free user (1% conversion at $50 LTV pays back). Power-user multiplier matters most here - power users in free tier are pure cost. Gate aggressively.

Healthy range: Free-tier cost <$0.50/user (compatible with conversion economics)

See inputs used

revenuePerUserMonthlyUsd: 0
avgInferenceCostPerUserUsd: 0.5
powerUserPctOfBase: 1
powerUserCostMultiplier: 20
overheadCostPerUserMonthlyUsd: 0.1
targetGrossMarginPct: 0

What this calculator can't tell you

Honest limitations — every model is wrong; some are useful. Where this one falls short:

Power-user distribution model assumes a clean two-tier (avg vs power) - real distributions have a tail (1% super-users at 30× cost) that this doesn't capture.
Doesn't model usage growth over time (power users may grow into super-users).
Doesn't include CAC (customer acquisition cost) - gross margin only, not contribution margin.
Doesn't include support cost or churn implications of cheaper-tier model use.

For these, use: Budget Planner for annual allocation. Full TCO Wizard for sensitivity analysis.

Where to go next

Estimate cost per request precisely →

Input the four numbers that drive every LLM bill.

Validate margin at 10×, 100× scale →

Margin compresses with power-user concentration. See cliff points.

Allocate budget across features by margin →

Steer investment to the highest-margin AI features.

Methodology

Source: https://www.bvp.com/atlas/state-of-the-cloud-2024
Extraction: Power-user distribution model calibrated against 8 SaaS company benchmarks (anonymized).
Editorial gate: 8-layer defense — see aicost.ai/ai-cost-economics
Last verified: 6/4/2026, 8:00:00 PM

Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.

3 years of pricing history

Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.

View 3-year history for →

📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

All prices are USD per 1 million tokens, current as of 2026-06-05.
Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
Batch API discounts are 50% off standard rates across providers that offer Batch mode.
Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
Long-context pricing tiers apply when input exceeds model threshold.
Embedding prices are input-only (no output tokens generated).

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Anthropic

2026-06-05

https://www.anthropic.com/pricing

Daily snapshot since Sep 2023 · 578 days captured

Anthropic Docs

2026-06-05

https://platform.claude.com/docs/en/about-claude/pricing

Daily snapshot since Sep 2023 · 578 days captured

OpenAI

2026-06-05

https://openai.com/api/pricing/

Daily snapshot since Sep 2023 · 579 days captured

Google AI

2026-06-05

https://ai.google.dev/gemini-api/docs/pricing

Daily snapshot since Dec 2023 · 554 days captured

Google Vertex

2026-06-05

https://cloud.google.com/vertex-ai/generative-ai/pricing

Daily snapshot since Dec 2023 · 554 days captured

DeepSeek

2026-06-05

https://api-docs.deepseek.com/quick_start/pricing

Daily snapshot since May 2024 · 493 days captured

xAI

2026-06-05

https://x.ai/api

Daily snapshot since Nov 2024 · 411 days captured

Mistral

2026-06-05

https://mistral.ai/pricing

Daily snapshot since Dec 2023 · 552 days captured

Cohere

2026-06-05

https://cohere.com/pricing

Daily snapshot since Sep 2023 · 578 days captured

Voyage AI

2026-06-05

https://docs.voyageai.com/docs/pricing

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →

AI Margin Calculator - Is Your AI Feature Profitable?

The story

🎛 Inputs you control

About this calculator: AI Margin Calculator - Is Your AI Feature Profitable?

Inputs you control

Outputs computed for you · model: `margin`

What you're looking at

Ready to run the numbers?

Reading your result

Top 3 vendors for cost-per-unit

Three real scenarios

Trade-offs

Best fit for "cost":

Best fit for "hallucination":

Best fit for "compliance":

Best fit for "privacy":

Best fit for "latency":

Best fit for "vendor lock-in":

Best fit for "mlops overhead":

Use cases

What this calculator can't tell you

Where to go next

Methodology

3 years of pricing history

Methodology

Primary sources

Inferred values (marked with * in calculator tables)

The story

🎛 Inputs you control

About this calculator: AI Margin Calculator - Is Your AI Feature Profitable?

Inputs you control

Outputs computed for you · model: margin

What you're looking at

Ready to run the numbers?

Reading your result

Top 3 vendors for cost-per-unit

Three real scenarios

Trade-offs

Best fit for "cost":

Best fit for "hallucination":

Best fit for "compliance":

Best fit for "privacy":

Best fit for "latency":

Best fit for "vendor lock-in":

Best fit for "mlops overhead":

Use cases

What this calculator can't tell you

Where to go next

Methodology

3 years of pricing history

Methodology

Primary sources

Inferred values (marked with * in calculator tables)

Outputs computed for you · model: `margin`