Published May 2, 2026 · Updated daily · ~10 min read

The new economics of AI: why pricing got hard, and what to do about it

A guide for SMB founders, developers, finance leads, and journalists trying to understand the cost layer of the AI stack. We monitor 17 vendors continuously — here's what's changing, who it hits hardest, and how to navigate it.

17 vendors monitored daily · 40+ calculators by use case · 3-layer editorial gate on data

Last week, our crawlers detected a 25% across-the-board price cut on Google's gemini-2.0-flash-lite — input, output, batch input, batch output, all dropped in unison. The change wasn't announced. There was no Google blog post, no press release, no notification email. The pricing page just quietly updated, and developers using that model saw their bill shrink the next morning.

This is what AI cost economics looks like in 2026. Prices change without warning. Models get renamed mid-cycle. New billing dimensions appear — context caching storage rates, hybrid seat-plus-overage plans, agentic workflow surcharges — and disappear from documentation just as quickly. The four companies that dominate frontier AI — Anthropic, OpenAI, Google, and Microsoft — are each restructuring pricing in different directions, on different timelines, with different vocabularies for the same concepts.

For most software-as-a-service categories, this would be unusual. For AI, it's now the baseline.

📊 What we've detected — last 14 days

This is what the platform sees, in real time, across the major vendors. Each row links through to the affected vendor's full guide. These rows are not vendor announcements: they are pricing-page deltas our crawlers detect, validate through three layers of sanity checks, and surface for human review before they reach our published guides.

Date	Change	Vendor	Model	Price dimension	Old → New (USD)	Affects
Jun 5	▼ 6% cut	ElevenLabs	eleven-flash-v2-5	Per-minute audio	$0.18 → $0.17	Voice agents, transcription, realtime chat
Jun 5	▼ 75% cut	Mistral	mistral-large-3	Input tokens (per 1M)	$2.00 → $0.50	Every API call; input is the largest cost driver in RAG
Jun 5	▼ 75% cut	Mistral	mistral-large-3	Output tokens (per 1M)	$6.00 → $1.50	Every generation: chat, completion, agentic workflows
Jun 5	NEW	xAI	imagine-api-image	Per-image (standard)	price not shown	Image-generation pipelines
Jun 5	▼ 90% cut	Google	gemini-3-pro-image	Batch output (per 1M)	$60.00 → $6.00	Nightly/async workloads: backfills, embeddings, summarization
Jun 5	▼ 83% cut	Google	gemini-2-5-flash-native-audio-preview-12-2025	Input tokens (per 1M)	$3.00 → $0.50	Every API call; input is the largest cost driver in RAG
Jun 5	▼ 83% cut	Google	gemini-2-5-flash-native-audio-preview-12-2025	Output tokens (per 1M)	$12.00 → $2.00	Every generation: chat, completion, agentic workflows
Jun 5	▼ 75% cut	Google	gemini-3-1-flash-live-preview	Input tokens (per 1M)	$3.00 → $0.75	Every API call; input is the largest cost driver in RAG
Jun 5	▼ 63% cut	Google	gemini-3-1-flash-live-preview	Output tokens (per 1M)	$12.00 → $4.50	Every generation: chat, completion, agentic workflows
Jun 5	▼ 33% cut	Google	gemini-3-1-flash-image	Per-image (standard)	$0.067 → $0.045	Image-generation pipelines

📡 Live data from our crawler. Want the full change history? View crawler health dashboard →
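
The percentages in these rows are plain old-to-new deltas. Here is a minimal sketch of how a row's badge could be computed from two snapshot values. It assumes prices are compared as exact decimals and rounded half-up to a whole percent. Python's built-in `round()` uses banker's rounding, which would turn 62.5% into 62% rather than the 63% shown for the Flash Live output rate. The function name `change_badge` and the example "new" price are hypothetical, not part of our crawler.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def change_badge(old: Optional[str], new: str) -> str:
    """Badge text for an old -> new price, e.g. '▼ 63% cut'.

    Prices arrive as strings so Decimal keeps them exact (0.067 is not
    exactly representable as a float).
    """
    if old is None:
        return "NEW"  # no previous value: a brand-new price line
    old_d, new_d = Decimal(old), Decimal(new)
    if old_d <= 0:
        raise ValueError("previous price must be positive")
    if new_d == old_d:
        return "unchanged"
    pct = abs(new_d - old_d) / old_d * 100
    # Round half-up to a whole percent; built-in round() would give 62 for 62.5.
    whole = pct.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return f"▼ {whole}% cut" if new_d < old_d else f"▲ {whole}% increase"


if __name__ == "__main__":
    print(change_badge("12", "4.5"))       # ▼ 63% cut (Flash Live output row)
    print(change_badge("0.067", "0.045"))  # ▼ 33% cut (Flash Image per-image row)
    print(change_badge(None, "0.07"))      # NEW (hypothetical new price line)
```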

From subsidies to real-world economics

The last three years of AI pricing were not normal. They were a venture-capital-funded marketing campaign. Token prices fell roughly 80% from late 2022 through 2024 as vendors competed for developer mindshare. Frontier models were sold below the marginal cost of inference. Free tiers were generous. The implicit message to developers was: don't worry about cost, build something interesting.

That era is ending — and the math behind why is straightforward.

Anthropic, OpenAI, Google, and Microsoft have collectively committed to more than $1 trillion in AI infrastructure spending between 2024 and 2027. Microsoft's capex alone is on pace to exceed $80 billion this fiscal year. Google's TPU buildout, OpenAI's Stargate partnership, Anthropic's compute commitments with AWS and Google Cloud: these are physical data centers, GPUs, power contracts, and cooling infrastructure that have to earn a return.

You don't recover a trillion-dollar capex bill by giving away tokens. The subsidies were always temporary. What's arriving now is the actual cost structure of running this technology. Unlike SaaS, where unit costs fall predictably as scale grows, AI workloads have variable costs that scale with usage in ways most buyers haven't priced in.

The result, for buyers: pricing has become both more expensive AND more complex at the same time. SaaS got cheaper as it got more complex. Cloud got cheaper as it got more complex. AI is going the other way.

Meanwhile, MIT NANDA's State of AI in Business 2025 found that 95% of enterprise generative-AI pilots fail to deliver measurable financial returns. The report does not blame the models themselves. In our experience, a large share of the gap is economic: nobody priced the workload correctly before deploying it, and nobody updated the cost model when the vendor changed pricing three months later.

Why SMBs are hit hardest

Enterprises have FinOps teams. They have procurement professionals. They have engineers whose job is to read pricing pages carefully. When Anthropic restructures its prompt caching to introduce 1-hour cache tiers separate from 5-minute tiers, an enterprise team notices within a day and updates internal cost models. When Microsoft Copilot pricing fragments into seven different tiers — Family, Personal, Business Basic, Business Standard, Business Premium, Apps for Business, E3/E5 add-ons — an enterprise has someone whose Tuesday meeting is to map all of that against existing contracts.

Small and medium businesses have none of this.

A typical SMB story goes like this: in 2023, one developer chose OpenAI because the docs were good. By 2024, the company had three engineers using a mix of OpenAI for chat, Anthropic for long-context tasks, and Google for embeddings. In 2025, they added an AI-coding assistant on individual subscriptions, plus a vector database, plus an observability tool. By 2026, the monthly AI bill is somewhere between $4,000 and $40,000, the founder has no idea which line items are essential, and the engineering team can't tell you whether a workload migration would save money or cost more, because the pricing changed twice in the last quarter.

This is the shadow IT problem in a new form. Procurement teams don't see the AI spend because much of it goes on personal cards reimbursed monthly. Finance teams see the line items but can't evaluate whether the vendor mix is rational. Engineering teams know the technical tradeoffs but rarely the cost ones. Nobody in the org has the full picture.

The four-vendor reality

Each major vendor has restructured pricing in distinct ways over the last six to twelve months. Few SMB engineering teams have time to track all four in real time. Most don't realize the pricing changed until the credit card statement arrives.

Hybrid plans across the four major coding-AI vendors

Vendor	Plan	Seat price	Overage starts at	Notes
Anthropic	Claude Pro	$17/mo	When seat credits exhaust	API credits roll over within billing period; team plan adds shared usage pools
GitHub Copilot	Pro / Pro+	$10/mo / $39/mo	300 / 1,500 premium requests	Pro+ tier ($39) covers Sonnet 4.6 + GPT-5 access
Cursor	Pro	$20/mo	500 fast requests/month	Slow requests unlimited; fast tier for premium models
Microsoft Copilot	Business / Enterprise	$30/mo	Usage-based, beyond seat	7-tier pricing (Family, Personal, Business Basic/Standard/Premium, Apps for Business, E3/E5 add-ons). Quarterly manual audit.

What no individual vendor will show you: this same comparison. Vendors won't put a competitor's seat price next to theirs. We will. Run the numbers for your team: Overage Forecaster →
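
The arithmetic behind a seat-plus-overage forecast is short. Here is a minimal sketch. It assumes a flat per-unit overage rate and either per-seat or pooled allowances. The `HybridPlan` class, the `monthly_cost` function, and every number in the example are hypothetical and are not taken from any row of the table.

```python
from dataclasses import dataclass


@dataclass
class HybridPlan:
    seat_price: float      # USD per seat per month
    included_units: int    # requests or credits included with each seat
    overage_rate: float    # USD per unit beyond the allowance (assumed flat)


def monthly_cost(plan: HybridPlan, usage_per_seat: list[int], pooled: bool = False) -> float:
    """Seat fees plus overage for one month.

    pooled=False: each seat's allowance is used only by that seat.
    pooled=True:  allowances are summed into one shared team pool
                  (a hypothetical model of a shared-usage team plan).
    """
    seats = len(usage_per_seat)
    fees = seats * plan.seat_price
    if pooled:
        overage_units = max(0, sum(usage_per_seat) - seats * plan.included_units)
    else:
        overage_units = sum(max(0, used - plan.included_units) for used in usage_per_seat)
    return round(fees + overage_units * plan.overage_rate, 2)


if __name__ == "__main__":
    # Illustrative only: $20 seat, 500 included requests, $0.04 per extra request.
    plan = HybridPlan(seat_price=20.0, included_units=500, overage_rate=0.04)
    team = [300, 500, 1200]  # one heavy user, two light ones
    print(monthly_cost(plan, team))               # 88.0
    print(monthly_cost(plan, team, pooled=True))  # 80.0
```

The pooled case shows why shared usage pools matter. One heavy user can burn through an allowance that light users would otherwise leave idle.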

Costs aren't just tokens

The most expensive misconception in AI deployment is that cost equals tokens. The actual cost stack of a production AI workload looks like this:

Compute: token API charges, image and video generation rates, audio per-minute fees
Storage and retrieval: vector database, prompt cache storage, embedding generation
Routing: model gateway fees, prompt-routing logic, fallback chains
Evaluation: benchmark suites, A/B testing infrastructure, judge-model calls
Observability: trace storage, cost-per-trace, log retention
Security: PII redaction services, prompt-injection detection, output filtering
Compliance: SOC 2 audit trails, data residency, audit log retention
Reputational risk: the cost of a hallucination in a customer-facing context, or a compliance failure in a regulated industry

For a typical RAG pipeline or agentic workflow, the token cost is often less than half of the total operating cost. The other half is the surrounding infrastructure — and most of that infrastructure has its own pricing complexity, often opaque, often consumption-based, often poorly aligned with the workload you actually run.
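
Writing the stack as line items makes the token share explicit. Here is a minimal sketch with invented monthly figures for a hypothetical RAG workload. The dictionary keys follow the categories above, but none of the dollar amounts come from real data. Reputational risk is omitted because it does not arrive as a monthly invoice.

```python
# Hypothetical monthly costs (USD) for one production RAG workload.
# Categories mirror the cost stack above; every figure is illustrative.
MONTHLY_COSTS = {
    "compute": 1800.0,            # token API charges
    "storage_retrieval": 650.0,   # vector DB, cache storage, embeddings
    "routing": 120.0,
    "evaluation": 400.0,
    "observability": 300.0,
    "security": 250.0,
    "compliance": 500.0,
}


def token_share(costs: dict[str, float], token_key: str = "compute") -> float:
    """Fraction of total monthly operating cost that comes from compute charges."""
    total = sum(costs.values())
    return costs.get(token_key, 0.0) / total if total else 0.0


if __name__ == "__main__":
    total = sum(MONTHLY_COSTS.values())
    # total $4,020/mo; compute is 45% of it
    print(f"total ${total:,.0f}/mo; compute is {token_share(MONTHLY_COSTS):.0%} of it")
```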

This is why we built our 7-step TCO Wizard. It walks through every cost dimension, not just compute, and ends with an executive synthesis tuned to the reader's role (CFO, CTO, founder, or PM).

The agentic shift changes the math

A standard chat completion in 2024 used roughly 500 to 2,000 tokens of input and produced a similar amount of output. A user might run 50 of these per day. Daily cost: pennies.

A modern agentic workflow (Claude Code editing a multi-file project, an autonomous research agent traversing a knowledge base, a multi-step reasoning chain handling a customer support escalation) consumes 50,000 to 200,000 tokens per task. At frontier-model rates like those in the table below, a team running 100 such tasks per day is looking at roughly $700 to $3,000 per month. That spend accumulates invisibly and is often charged to whichever credit card a developer signed up with first.

This is not a hypothetical. The shift from "AI as feature" to "AI as workforce" — even at small scale — changes the cost equation by an order of magnitude. Most cost models built in 2023 are wrong by 2026.
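
The volume gap can be checked directly from the ranges quoted in the two preceding paragraphs. Here is a back-of-the-envelope sketch that assumes chat output roughly equals chat input ("a similar amount"). It compares one chat user with one agent team, so it shows scale rather than a like-for-like per-seat ratio.

```python
# Ranges taken from the text above; output == input for chat is an assumption.
CHAT_INPUT_TOKENS = (500, 2_000)          # input tokens per chat completion
CHATS_PER_USER_PER_DAY = 50
AGENT_TOKENS_PER_TASK = (50_000, 200_000)
TASKS_PER_TEAM_PER_DAY = 100


def chat_user_daily_tokens() -> tuple[int, int]:
    """Low/high daily tokens for one chat user (input plus equal-sized output)."""
    low, high = CHAT_INPUT_TOKENS
    return (2 * low * CHATS_PER_USER_PER_DAY, 2 * high * CHATS_PER_USER_PER_DAY)


def agent_team_daily_tokens() -> tuple[int, int]:
    """Low/high daily tokens for a team running agentic tasks."""
    low, high = AGENT_TOKENS_PER_TASK
    return (low * TASKS_PER_TEAM_PER_DAY, high * TASKS_PER_TEAM_PER_DAY)


if __name__ == "__main__":
    chat = chat_user_daily_tokens()
    agent = agent_team_daily_tokens()
    print(f"chat user:  {chat[0]:,}-{chat[1]:,} tokens/day")
    print(f"agent team: {agent[0]:,}-{agent[1]:,} tokens/day")
    print(f"ratio at the low end: {agent[0] // chat[0]}x")
```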

Per-workflow cost across major vendors (~100K tokens/task)

Model	Per workflow	100/day → monthly
Anthropic Claude Sonnet 4.6	$0.45	$1,350	Guide →
OpenAI GPT-5	$0.50	$1,500	Guide →
Google Gemini 3 Flash	$0.18	$540	Guide →
DeepSeek V4	$0.08	$240	Guide →
Anthropic Haiku 4.5	$0.10	$300	Guide →

Illustrative figures based on published rates. Real workflow cost depends on input/output split, cache hit rate, batch eligibility — quantify for your scenario with the Agent Loop Cost calculator (architect view) or Agentic Workflow Cost (consumer view).
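
The note above names the levers that move a workflow's cost. Here is a minimal sketch of a per-workflow estimate that applies them. It assumes the conventions in our methodology section (cache reads at 10% of base input, Batch at 50% of base). It also assumes illustrative list rates of $3 input / $15 output per 1M tokens and an 87.5% input share. Those rates and that split are chosen for the example, not quoted from any vendor. With caching and batch turned off, they happen to reproduce the $0.45 per workflow and $1,350 per month shown in the table's first row.

```python
def workflow_cost(
    total_tokens: int,
    input_share: float,               # fraction of tokens that are input (prompt + context)
    input_rate: float,                # USD per 1M input tokens
    output_rate: float,               # USD per 1M output tokens
    cache_hit_rate: float = 0.0,      # fraction of input tokens read from cache
    batch: bool = False,              # run through a Batch API
    cached_multiplier: float = 0.10,  # assumed: cache reads at 10% of base input
    batch_multiplier: float = 0.50,   # assumed: Batch at 50% of base
) -> float:
    """Estimated USD for one workflow.

    Ignores cache-write surcharges and assumes the batch discount stacks
    on top of cache pricing.
    """
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    cost = (
        uncached * input_rate
        + cached * input_rate * cached_multiplier
        + output_tokens * output_rate
    ) / 1_000_000
    return cost * batch_multiplier if batch else cost


def monthly_from_workflow(per_workflow: float, per_day: int = 100, days: int = 30) -> float:
    """Same 100/day x 30-day projection the table above uses."""
    return per_workflow * per_day * days


if __name__ == "__main__":
    # Illustrative $3 / $15 per 1M rates and an 87.5% input share (assumptions).
    scenarios = {
        "list price": workflow_cost(100_000, 0.875, 3.0, 15.0),
        "80% cache": workflow_cost(100_000, 0.875, 3.0, 15.0, cache_hit_rate=0.8),
        "cache + batch": workflow_cost(100_000, 0.875, 3.0, 15.0, cache_hit_rate=0.8, batch=True),
    }
    for label, cost in scenarios.items():
        print(f"{label:>13}: ${cost:.3f}/workflow, ${monthly_from_workflow(cost):,.0f}/mo")
```

The sketch ignores cache-write surcharges, and some vendors bill writes above the base input rate. Real cache savings on short-lived prompts will therefore be somewhat smaller than it suggests.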

How we help — concretely

We built aicost.ai because nobody else was doing this work credibly for the small-to-medium segment. Our coverage is structured around how decisions actually get made:

17 vendor pricing guides with persona-led narratives. The Anthropic page reads differently for a developer (focused on API rates, model selection, caching strategy) than for a finance lead (focused on plan tiers, seat economics, predictable monthly spend) than for a CTO (focused on TCO, vendor lock-in, migration cost). Same data, different framing — because the questions are different.
45+ calculators organized by use case, not by vendor. A RAG pipeline calculator that lets you mix Anthropic for generation, Voyage for embeddings, Pinecone for vectors. A fine-tune ROI calculator. An agentic workflow estimator. A vendor concentration risk analyzer (HHI score). All with Quick + Advanced modes.
7-step TCO + ROI Wizard — for the conversations that don't happen often enough: building the cost case for an AI initiative, choosing between hybrid plans and pure API spend, sensitivity analysis across 6 production pillars (security, compliance, observability, PII, HITL, cost controls).
Agentic Workflow Cost — the consumer-side answer to "what's my Cursor + Claude Code + autonomous-agent bill?" Compares 4 vendors at the same scenario.
Live changelog — every detected price change goes into a public, dated, sourced log. Vendors don't always announce their changes. We surface them anyway.
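
The HHI score mentioned above is the Herfindahl-Hirschman Index. To compute it, square each vendor's percentage share of spend and add the squares. A single-vendor stack scores 10,000, and an even split across four vendors scores 2,500. Here is a minimal sketch applied to AI spend. The vendor split is hypothetical. The bands borrow the 2010 US Horizontal Merger Guidelines thresholds as a rule of thumb; they are not a description of how our analyzer grades scores.

```python
def hhi(spend_by_vendor: dict[str, float]) -> float:
    """Herfindahl-Hirschman Index (0-10,000) from monthly spend per vendor."""
    total = sum(spend_by_vendor.values())
    if total <= 0:
        raise ValueError("total spend must be positive")
    # Square each vendor's percentage share of spend and sum the squares.
    return sum((100.0 * spend / total) ** 2 for spend in spend_by_vendor.values())


def concentration_band(score: float) -> str:
    """Bands from the 2010 US Horizontal Merger Guidelines, used as a rule of thumb."""
    if score < 1500:
        return "unconcentrated"
    if score <= 2500:
        return "moderately concentrated"
    return "highly concentrated"


if __name__ == "__main__":
    # Hypothetical monthly AI spend (USD) for one SMB.
    spend = {"OpenAI": 6000.0, "Anthropic": 3000.0, "Google": 1000.0}
    score = hhi(spend)
    print(f"HHI {score:,.0f}: {concentration_band(score)}")  # HHI 4,600: highly concentrated
```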

All our tools, organized by what you're trying to accomplish

Browse by category. Each tool is built around a specific decision SMBs face — not vendor-by-vendor pricing, but use-case-by-use-case math. Most tools have a Quick mode (2-4 fields) and Advanced mode (full control). Mobile defaults to Quick.

💼 Plan & seat economics

Pick the right tier, forecast overage before it bites

Overage Forecaster
Project monthly cost across seat-based plans with overage
Subscription Picker
Compare 69 tiers across 12 vendors based on your usage
Annual vs Monthly
When annual billing actually saves money
Family Plan Comparator
Cheapest household AI: individual vs Google One vs M365

🔌 API & token economics

Optimize per-call cost on production workloads

Cost Calculator
Token volume × model pricing across 17 vendors
Token Estimator
Paste text → cost across every model
Cheapest Model Finder
3 questions → cheapest model for your job
Batch vs Realtime
When the 50% batch discount actually pays
Prompt Cache ROI
Setup cost vs ongoing savings on cached reads
Context Window Cost
When long-context tier pricing kicks in
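
The Prompt Cache ROI question reduces to a break-even count of cache reads. Here is a minimal sketch. It assumes a cache write costs `write_mult` times the base input rate and each read costs `read_mult` times base. The defaults (1.25 and 0.10) reflect Anthropic's short-lived cache pricing as we understand it. The 2.0 in the example stands in for a hypothetical longer-lived tier. Check both values against the current pricing page before relying on them.

```python
import math


def break_even_reads(write_mult: float = 1.25, read_mult: float = 0.10) -> int:
    """Fewest cache reads after one cache write for caching to be strictly cheaper.

    Without caching, each request pays 1.0x base input for the shared prefix.
    With caching, the first request pays write_mult and each of n later
    requests pays read_mult, so caching wins when
        write_mult + n * read_mult < 1 + n,  i.e.  n > (write_mult - 1) / (1 - read_mult).
    """
    if not 0 <= read_mult < 1:
        raise ValueError("read_mult must be in [0, 1)")
    threshold = (write_mult - 1) / (1 - read_mult)
    return max(0, math.floor(threshold) + 1)


def cost_ratio(reads: int, write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Prefix cost with caching divided by cost without, for one write plus `reads` reads."""
    return (write_mult + reads * read_mult) / (1 + reads)


if __name__ == "__main__":
    print(break_even_reads())                # 1
    print(break_even_reads(write_mult=2.0))  # 2 (hypothetical longer-lived tier)
    print(f"{cost_ratio(9):.3f}")            # 0.215
```

With these defaults, a single reuse already pays for the write. At nine reuses, the cached prefix costs about a fifth of the uncached one.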

🤖 Agentic & multi-step

Modern AI workloads consume 10× the tokens — model accordingly

Agentic Workflow Cost
Estimate burn for Claude Code, Cursor, autonomous agents (consumer view)
Agent Loop Cost
Per-task multi-turn agent cost + runaway risk (architect view)
Agentic AI Stack
Planner + executor stack composition
Multi-Model Router
Stop paying flagship for easy queries

🔍 RAG & vector search

Storage + embeddings + retrieval + generation — the full stack

RAG Pipeline Cost
End-to-end cost across embeddings, vector DB, generation
Multimodal RAG Stack
Text + image RAG with vision integration
Vector DB Cost
Pinecone vs Qdrant vs Weaviate vs pgvector
Embedding Cost
Voyage vs OpenAI vs Cohere vs Mistral
Chunking Optimizer
Chunk size vs retrieval quality vs token cost
Hybrid Search Cost
Vector + BM25 stacked retrieval economics
RAG vs Fine-Tuning
When fine-tuning beats RAG economically

🎬 Voice, image, video

Multimodal pricing — different units, different math

Voice Agent Stack
Realtime voice agent: STT + LLM + TTS combined
Audio Cost
Transcription, TTS, voice agents per audio hour
Vision Cost
Image generation: Midjourney vs DALL-E vs Imagen

📊 Strategy & ROI

Build the business case before you ship

TCO Wizard
7-step full TCO: compute + eval + observability + compliance
Quick TCO
Single-page TCO with vertical overlays
ROI Quick Check
Hours saved × labor cost vs AI spend, with payback period
Margin Calculator
Revenue per user vs AI cost per user — gross margin
Budget Planner
Allocate budget across workloads and vendors
Annual Cost Forecaster
12-month projection with growth + price-decline modeling
Vendor Concentration Risk
HHI score + single-point-of-failure analysis
Self-Host Break-even
API vs GPU rental decision economics
Fine-Tuning Cost
Training upfront vs inference savings

📐 Methodology — how we ensure the data is right

For journalists evaluating us as a source, this section matters. AI pricing tracking is full of low-quality automated scrapers that publish whatever a language model extracts. We built three layers of defense against that failure mode:

Extraction-level rules. Our extraction prompts are tuned for the specific failure patterns each vendor introduces — Google's three different cache rates, Anthropic's cache write versus cache read distinction, the difference between per-token and per-minute pricing for audio models.
Code-level sanity invariants. Even if extraction returns something implausible, our code rejects values that violate physical constraints. A cached input rate must be cheaper than a base input rate; a batch rate must be cheaper than a realtime rate; a premium-tier rate must be more expensive than a standard rate. Violations get flagged, not published.
Editorial gate. Our published pricing data lives in a human-curated source file that the crawler reads but never writes. Every detected change accumulates in a changelog; a human reviews each row before it lands in user-facing guides. This is slower than fully automated systems. It's also what keeps unreviewed numbers out of our guides.
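
The second layer's rules translate almost directly into code. Here is a minimal sketch of such a check, assuming a flat price record. The class and field names are hypothetical; they loosely mirror the `cachedInput`/`batchInput` labels in our inferred-values table. This illustrates the rule and is not our production validator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PriceRecord:
    model: str
    input_rate: float                      # USD per 1M input tokens
    output_rate: float                     # USD per 1M output tokens
    cached_input: Optional[float] = None   # `cachedInput` in our tables
    batch_input: Optional[float] = None    # `batchInput`
    batch_output: Optional[float] = None   # `batchOutput`
    premium_input: Optional[float] = None  # hypothetical premium-tier input rate


def invariant_violations(p: PriceRecord) -> list[str]:
    """List every sanity rule the record breaks; an empty list means it passes."""
    problems = []
    if p.input_rate <= 0 or p.output_rate <= 0:
        problems.append("base rates must be positive")
    if p.cached_input is not None and not p.cached_input < p.input_rate:
        problems.append("cached_input must be cheaper than input")
    if p.batch_input is not None and not p.batch_input < p.input_rate:
        problems.append("batch_input must be cheaper than input")
    if p.batch_output is not None and not p.batch_output < p.output_rate:
        problems.append("batch_output must be cheaper than output")
    if p.premium_input is not None and not p.premium_input > p.input_rate:
        problems.append("premium_input must be more expensive than input")
    return problems


if __name__ == "__main__":
    # An extraction that swapped the realtime and batch columns (hypothetical).
    suspect = PriceRecord("example-model", 1.0, 4.0, batch_input=2.0, batch_output=8.0)
    for problem in invariant_violations(suspect):
        print("FLAG:", problem)  # flagged for human review, not published
```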

When our crawler detects a vendor change, it appears in the changelog with full provenance. When it appears in a published guide, a human has signed off. We treat published pricing claims with the same care a financial publication treats earnings figures. Per-calculator methodology lives at /methodology.

The optimistic thread

When SMBs achieve real ROI on AI, they buy more AI. The economics are aligned. Vendors don't actually want pricing to be opaque — opacity drives adoption down and churn up. But the rate of pricing change has outpaced the rate at which vendors can document it clearly, and the gap that's opened up is what aicost.ai fills.

The MIT NANDA finding (95% of enterprise generative-AI pilots fail to deliver measurable financial returns) is not a verdict on AI capability. The models work. What breaks is the economics: workloads deployed without a cost model, and cost models left stale after a vendor reprices. Closing that gap, and making AI projects predictably profitable, is good for buyers, good for vendors, and good for the trillion-dollar infrastructure investment that's making this technology possible at all.

We're betting the next two years of AI adoption will be defined by which buyers can navigate the cost layer competently, and which can't. We built aicost.ai because we want the SMB segment to be in the first group.

Start with the question you actually have

Every visitor lands here for a different reason. Skip ahead to whatever you came for.

Browse vendor guides · All 45+ calculators · Full TCO Wizard · Live changelog

About

aicost.ai is a product of CloudIntelligence.ai LLC, founded by Subu Vdaygiri — a 17-year veteran of cloud product management at Ingram Micro and Siemens Corporate Research, with executive programs at Wharton (CTO) and Kellogg (CPO) and 10× AWS and Azure certifications. CloudIntelligence.ai's portfolio also includes ToolsInfo.com (115K+ AI tools across 39 verticals), AIPapers.ai (3M+ research papers with semantic search), and AINewsCycle (curated AI industry briefings). aicost.ai launched in early 2026 to bring transparency to AI cost economics.

For press inquiries, methodology questions, or data partnerships, contact us at [email protected].

📖 Data sources & methodology

161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

All token prices are USD per 1 million tokens, current as of 2026-06-05.
Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
Batch API discounts are 50% off standard rates across providers that offer Batch mode.
Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
Long-context pricing tiers apply when input exceeds model threshold.
Embedding prices are input-only (no output tokens generated).
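
Two of these notes change what a single request costs: long-context tiers and residency surcharges, which sit outside the base rates. Here is a minimal sketch of how a calculator might apply both. It assumes that crossing the long-context threshold moves the whole request to the higher tier; some vendors document this, but not all. The `TieredRate` class, the 200K-token default threshold, and all rates are hypothetical. Only the 1.1 residency multiplier comes from the note above.

```python
from dataclasses import dataclass


@dataclass
class TieredRate:
    input_rate: float         # USD per 1M input tokens, standard tier
    output_rate: float        # USD per 1M output tokens, standard tier
    long_input_rate: float    # USD per 1M input tokens, long-context tier
    long_output_rate: float   # USD per 1M output tokens, long-context tier
    threshold: int = 200_000  # hypothetical cutoff, in input tokens


def request_cost(rate: TieredRate, input_tokens: int, output_tokens: int,
                 residency_multiplier: float = 1.0) -> float:
    """USD for one request.

    Once input exceeds the threshold, the long-context rates apply to the
    whole request (an assumption; vendors differ). Residency surcharges are
    not part of base rates, so they are applied as a separate multiplier.
    """
    long_ctx = input_tokens > rate.threshold
    in_rate = rate.long_input_rate if long_ctx else rate.input_rate
    out_rate = rate.long_output_rate if long_ctx else rate.output_rate
    base = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return base * residency_multiplier


if __name__ == "__main__":
    # Hypothetical rates: long-context tier at 2x input and 1.5x output.
    rate = TieredRate(input_rate=3.0, output_rate=15.0, long_input_rate=6.0, long_output_rate=22.5)
    print(f"${request_cost(rate, 150_000, 2_000):.3f}")       # $0.480
    print(f"${request_cost(rate, 250_000, 2_000):.3f}")       # $1.545
    print(f"${request_cost(rate, 150_000, 2_000, 1.1):.3f}")  # $0.528 with a 1.1x residency uplift
```

In the example, growing the prompt by two-thirds (150K to 250K tokens) more than triples the request cost. The threshold behaves like a cliff, not a slope.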

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Source	Last verified	URL	Snapshot history
Anthropic	2026-06-05	https://www.anthropic.com/pricing	Daily snapshot since Sep 2023 · 578 days captured
Anthropic Docs	2026-06-05	https://platform.claude.com/docs/en/about-claude/pricing	Daily snapshot since Sep 2023 · 578 days captured
OpenAI	2026-06-05	https://openai.com/api/pricing/	Daily snapshot since Sep 2023 · 579 days captured
Google AI	2026-06-05	https://ai.google.dev/gemini-api/docs/pricing	Daily snapshot since Dec 2023 · 554 days captured
Google Vertex	2026-06-05	https://cloud.google.com/vertex-ai/generative-ai/pricing	Daily snapshot since Dec 2023 · 554 days captured
DeepSeek	2026-06-05	https://api-docs.deepseek.com/quick_start/pricing	Daily snapshot since May 2024 · 493 days captured
xAI	2026-06-05	https://x.ai/api	Daily snapshot since Nov 2024 · 411 days captured
Mistral	2026-06-05	https://mistral.ai/pricing	Daily snapshot since Dec 2023 · 552 days captured
Cohere	2026-06-05	https://cohere.com/pricing	Daily snapshot since Sep 2023 · 578 days captured
Voyage AI	2026-06-05	https://docs.voyageai.com/docs/pricing	No snapshot history listed

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.
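
The derivations in this table follow a fixed pattern. Here is a minimal sketch of how missing fields could be filled from published base rates and marked with `*`. It uses the default conventions stated above, plus an optional override for families that cache at a different ratio, such as the 25% rows for Gemini 2.0 Flash and Grok 4. The function and dictionary names are hypothetical and are not our pipeline's code.

```python
from typing import Optional

# Methodology defaults: cached input = 10% of base input, Batch = 50% of base.
CONVENTIONS = {
    "cachedInput": ("input", 0.10),
    "batchInput": ("input", 0.50),
    "batchOutput": ("output", 0.50),
}


def with_inferred(published: dict[str, float],
                  cached_multiplier: Optional[float] = None) -> dict[str, str]:
    """Display values for a model's price fields; inferred ones end in '*'.

    Vendor-published values always win. `cached_multiplier` overrides the
    10% default for families with a different convention (e.g. 0.25).
    """
    shown = {field: f"{value:g}" for field, value in published.items()}
    for field, (base_field, mult) in CONVENTIONS.items():
        if field in published or base_field not in published:
            continue  # already published, or no base rate to derive from
        if field == "cachedInput" and cached_multiplier is not None:
            mult = cached_multiplier
        shown[field] = f"{published[base_field] * mult:g}*"
    return shown


if __name__ == "__main__":
    # Hypothetical model with only base rates published (USD per 1M tokens).
    print(with_inferred({"input": 2.0, "output": 8.0}))
    print(with_inferred({"input": 2.0, "output": 8.0}, cached_multiplier=0.25))
```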

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →