Mistral AI pricing, complete breakdown

Verified 2026-05-16, cross-checked against Mistral AI pricing page, litellm, openrouter

Mistral AI currently offers two primary production models with distinct price points for different use cases. Mistral Large 3 serves as the flagship frontier model at $2 per million input tokens and $6 per million output tokens. For high-volume, cost-sensitive tasks, Mistral Small 4 provides a more economical path at $0.10 per million input tokens and $0.30 per million output tokens. Their context windows are 262,000 and 128,000 tokens respectively. This page helps you calculate projected costs, compare model efficiency, and track recent pricing changes.

Mistral Small 4 offers the lowest entry point at just $0.10 per million input tokens.

How Mistral AI's pricing universe works

Mistral AI operates a multi-channel pricing strategy to maximize market reach across the developer, consumer, and enterprise segments. By offering metered API access alongside fixed-rate subscriptions and cloud marketplace deployments, they capture both high-margin builder activity and predictable recurring revenue. This hybrid approach allows Mistral to monetize their frontier models through whichever procurement path best fits the customer's technical and budgetary constraints.

API (per-token, metered)

For: Developers, technical teams, startups building products on top of Mistral AI models
  • Pay only for tokens consumed
  • Full model lineup including batch, caching, long context
  • Programmatic via SDKs
When to use: When integrating Mistral AI into your own product or running variable batch workloads
Best for: Builders with metered or unpredictable usage

Consumer subscriptions (Pro, Max tiers)

For: Individuals using Mistral AI directly for writing, coding, research, analysis
  • Fixed monthly fee
  • Generous usage caps
  • Web/desktop/mobile apps
  • Often includes newer models first
When to use: When using Mistral AI as a daily-driver AI assistant rather than building on it
Best for: Solo professionals, knowledge workers, vibe coders

Business/Team plans

For: Teams of 5-200 needing shared workspaces, admin controls, SSO
  • Per-seat billing
  • Centralized billing
  • Admin & audit controls
  • Sometimes shared usage pools
When to use: When deploying Mistral AI across a team that does NOT need API integration
Best for: Mid-size organizations adopting AI for internal productivity

Enterprise (custom contract)

For: Large organizations with procurement requirements, compliance needs, or volume-discount leverage
  • Custom pricing and limits
  • SLAs
  • DPAs and BAAs
  • Dedicated support
  • Sometimes private cloud / VPC
When to use: When per-seat or per-token pricing exceeds ~$50K/year, or when compliance/contractual needs require it
Best for: Enterprises with procurement-led adoption

Cloud marketplaces (AWS Bedrock, Google Vertex, Azure)

For: Organizations with existing cloud commits or strict data-residency requirements
  • Same models, slightly different pricing (often parity or small premium)
  • Counts toward existing cloud spend commits
  • Stays within cloud's data-protection boundary
When to use: When you already burn down EDP/MACC/CCC commits and prefer single-bill
Best for: Cloud-committed enterprises

Which one should you pick?

If you are building a product, use the API for metered control. For personal use, a consumer subscription provides the best value. Teams should opt for the Team plan for administrative oversight, while large organizations should leverage enterprise contracts or cloud marketplaces to satisfy compliance and procurement requirements.

Current pricing (all production models)

Model | API ID | Input $/M | Output $/M | Cached $/M | Context (tokens)
Mistral Large 3 | mistral-large-3 | $2.00 | $6.00 | not stated | 262,000
Mistral Small 4 | mistral-small-4 | $0.10 | $0.30 | not stated | 128,000

Pricing is based on standard API rates per million tokens. No explicit cache or batch pricing is currently stated for these versions. Verified as of May 16, 2026.
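
The standard rates above translate directly into a per-workload estimate. The following is a minimal sketch, assuming token counts are known in advance; the `RATES` dictionary and the `api_cost` helper are illustrative names and not part of any Mistral SDK.

```python
# Standard per-million-token rates from the pricing table (USD, as of 2026-05-16).
RATES = {
    "mistral-large-3": {"input": 2.00, "output": 6.00},
    "mistral-small-4": {"input": 0.10, "output": 0.30},
}


def api_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload at standard (non-batch, non-cached) rates."""
    rate = RATES[model]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000  # rates are quoted per 1M tokens


# Example: 1M input and 1M output tokens on each model.
for model_id in RATES:
    print(model_id, round(api_cost(model_id, 1_000_000, 1_000_000), 2))
```

Keeping the rates in one lookup table makes it easy to update the estimate when the published prices change.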

Full rate breakdown (all variants)

Variants that can apply beyond the standard API rate: batch (async, 50% off), cached read (0.1x), cache writes (1.25x or 2x base), long-context tier (~2x above threshold). Only the standard rate is listed for the two models below.

Mistral Large 3 mistral-large-3

Flagship reasoning for complex multi-step agents and coding
Primary use: Built for high-complexity tasks requiring deep reasoning, multilingual support, and large context windows.
Who picks it: Enterprise developers building production-grade RAG systems and autonomous agents.
Vs other Mistral AI models: At $2/M input and $6/M output, this is the premium tier compared to Mistral Small 4's $0.10/$0.30 rates.
When to use: Choose this for logic-heavy workflows; switch to Small 4 for simple classification or high-volume summarization.
Equivalents at other vendors:
  • OpenAI GPT-5.4: Similar flagship performance tier, but Mistral Large 3 is significantly cheaper on output tokens ($6 vs $15).
  • Google Gemini 3.1 Pro: Matches the $2 input rate, though Mistral Large 3 provides a more economical $6 output rate compared to $12.
  • xAI Grok 4.20 (non-reasoning): Directly competes on price with identical $2 input and $6 output rates for high-reasoning workloads.

Variant | Input $/M | Output $/M | Notes
Standard | $2 | $6 | Default per-token API rate

Mistral Small 4 mistral-small-4

Optimized efficiency for high-volume classification and extraction tasks
Primary use: Designed for low-latency processing of large datasets and high-throughput background jobs.
Who picks it: Teams scaling production workflows where cost-per-token is the primary constraint.
Vs other Mistral AI models: Priced at $0.10/M input and $0.30/M output, it offers a 20x cost reduction over Mistral Large 3.
When to use: Use for high-volume tasks like entity extraction; upgrade to Large 3 if reasoning depth is insufficient.
Equivalents at other vendors:
  • DeepSeek deepseek-v4-flash: Competes in the high-efficiency tier with nearly identical sub-dollar pricing for fast inference.
  • Cohere Command R: Targeted at similar RAG and tool-use workflows, though Mistral Small 4 is cheaper on both input and output.
  • xAI Grok 4.1 Fast (non-reasoning): Similar low-latency positioning, but Mistral Small 4 offers a 50% lower entry price for input tokens.

Variant | Input $/M | Output $/M | Notes
Standard | $0.10 | $0.30 | Default per-token API rate

What changed in the last 30-90 days

How buyers think about Mistral AI pricing

Cheapest Mistral model for high-volume tasks

Audience: vibe-coder, solopreneur, developer

The problem: You need to process millions of simple classification or extraction tasks without exhausting your budget on frontier-class models. High-volume workloads can quickly become unsustainable if you use flagship models for basic logic.

What to do: Deploy Mistral Small 4 for high-throughput utility tasks while reserving Mistral Large 3 for complex reasoning.

Processing 10 million input tokens and 5 million output tokens on Mistral Small 4 costs (10M tokens × $0.1/M) + (5M tokens × $0.3/M) = $2.50 total. The same volume on Mistral Large 3 would cost (10M tokens × $2/M) + (5M tokens × $6/M) = $50.00 (as of 2026-05-16).

→ Mistral Small 4 provides a 95% cost reduction compared to Large 3 for high-volume utility workloads.
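
The split recommended above (Small 4 for utility tasks, Large 3 for complex reasoning) can be costed as a weighted mix. This is a rough sketch under stated assumptions: `share_to_large` is a hypothetical routing fraction you would measure for your own workload, and both function names are illustrative.

```python
def workload_cost(input_m: float, output_m: float,
                  input_rate: float, output_rate: float) -> float:
    """Cost in USD for volumes given in millions of tokens."""
    return input_m * input_rate + output_m * output_rate


def routed_cost(input_m: float, output_m: float, share_to_large: float) -> float:
    """Cost when a fraction of the volume goes to Large 3 and the rest to Small 4."""
    if not 0.0 <= share_to_large <= 1.0:
        raise ValueError("share_to_large must be between 0 and 1")
    large = workload_cost(input_m * share_to_large, output_m * share_to_large, 2.00, 6.00)
    small_share = 1.0 - share_to_large
    small = workload_cost(input_m * small_share, output_m * small_share, 0.10, 0.30)
    return large + small


# 10M input / 5M output tokens at three routing splits (the 20% split is hypothetical).
for share in (0.0, 0.2, 1.0):
    print(f"{share:.0%} to Large 3: ${routed_cost(10, 5, share):.2f}")
```

The two endpoints reproduce the $2.50 and $50.00 figures from the scenario; anything in between depends on how much traffic genuinely needs the flagship model.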

Self-host open weights vs pay La Plateforme API

Audience: developer, smb, enterprise

The problem: You are weighing the infrastructure overhead of self-hosting Mistral models against the simplicity of the managed API. You need to know when the operational complexity of a private cluster pays for itself.

What to do: Utilize La Plateforme API for development and scaling, then consider self-hosting once monthly volume exceeds 50 million tokens.

At 50 million tokens per month (assuming 25M input and 25M output) on Mistral Large 3, the API cost is (25M × $2/M) + (25M × $6/M) = $200 per month (as of 2026-05-16). If your internal GPU hosting and engineering maintenance costs exceed this threshold, the API remains the more economical choice.

→ The API is generally more cost-effective for workloads under 50 million tokens per month due to zero maintenance overhead.
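
The break-even comparison can be sketched as a small function. It assumes self-hosting is a fixed monthly cost (GPUs plus engineering time) and that API spend scales linearly with volume; `breakeven_million_tokens` and its parameters are illustrative names, and the monthly figure passed in is whatever your own infrastructure estimate comes to.

```python
def breakeven_million_tokens(self_host_monthly_usd: float,
                             input_rate: float = 2.00,
                             output_rate: float = 6.00,
                             input_share: float = 0.5) -> float:
    """Monthly volume (millions of tokens) at which API spend equals a fixed self-hosting cost."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    # Average price per million tokens for the given input/output mix.
    blended = input_share * input_rate + (1.0 - input_share) * output_rate
    return self_host_monthly_usd / blended


# With the scenario's 50/50 mix on Mistral Large 3, a $200/month budget
# corresponds to the 50M-token volume discussed above.
print(breakeven_million_tokens(200.0))
```

Below the returned volume the API is cheaper under these assumptions; above it, self-hosting starts to pay for itself.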

When Mistral Large 3 beats frontier alternatives

Audience: developer, it-buyer, enterprise

The problem: You require top-tier performance for complex reasoning but want to avoid the high price points or data residency concerns of other frontier models. You need a competitive alternative that balances power with predictable pricing.

What to do: Standardize on Mistral Large 3 for production reasoning tasks to leverage its competitive $2/$6 pricing structure.

Running a complex agentic workflow with 500,000 input tokens and 200,000 output tokens costs (0.5M × $2/M) + (0.2M × $6/M) = $2.20 per run (as of 2026-05-16). This provides a predictable baseline for budgeting enterprise-grade intelligence.

→ Mistral Large 3 offers frontier-level intelligence at a transparent $8.00 for each 1M input plus 1M output tokens, which is $4.00 per million tokens blended at a 1:1 ratio.
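
A blended rate depends on the input/output mix, so it helps to compute it from total cost divided by total tokens. A minimal sketch, with `blended_rate_per_million` as an illustrative helper name:

```python
def blended_rate_per_million(input_tokens: int, output_tokens: int,
                             input_rate: float = 2.00,
                             output_rate: float = 6.00) -> float:
    """Average USD price per 1M tokens for a given input/output mix."""
    total_tokens = input_tokens + output_tokens
    if total_tokens == 0:
        raise ValueError("at least one token is required")
    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return cost / (total_tokens / 1_000_000)


# The agentic run from the scenario: 500,000 input and 200,000 output tokens.
print(round(blended_rate_per_million(500_000, 200_000), 2))
# An even 1:1 mix for comparison.
print(round(blended_rate_per_million(1_000_000, 1_000_000), 2))
```

Input-heavy workloads land closer to the $2 input rate, and output-heavy workloads closer to the $6 output rate.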

Le Chat Pro and Team subscriptions

Audience: solopreneur, smb

The problem: You need consistent access to Mistral models for daily research and drafting but find per-token API billing difficult to predict for human-in-the-loop tasks. You want a flat-rate option for your team.

What to do: Use Le Chat Pro for individual power users or Le Chat Team for collaborative environments to cap monthly spend.

A Le Chat Pro subscription costs $14.99 per month (as of 2026-05-16). If a user generates 3 million output tokens on Mistral Large 3 via the API, the cost would be (3M × $6/M) = $18.00, making the subscription more economical for heavy manual usage.

→ Le Chat Pro pays for itself if a user generates more than 2.5 million output tokens per month on flagship models.
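
The break-even point comes from dividing the subscription fee by the API output rate. A small sketch, assuming only output tokens are counted as in the comparison above; the function names are illustrative.

```python
def breakeven_output_tokens_m(subscription_usd: float = 14.99,
                              output_rate: float = 6.00) -> float:
    """Millions of output tokens at which API spend equals the subscription fee."""
    return subscription_usd / output_rate


def api_equivalent_usd(output_tokens_m: float, output_rate: float = 6.00) -> float:
    """What the same output volume would cost on the metered API."""
    return output_tokens_m * output_rate


print(f"Break-even: {breakeven_output_tokens_m():.2f}M output tokens")
print(f"3M output tokens via API: ${api_equivalent_usd(3):.2f}")
```

Counting input tokens as well would lower the break-even volume, so the figure is best read as an upper bound.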

Mistral on AWS Bedrock vs direct

Audience: it-buyer, enterprise

The problem: Your organization is already committed to the AWS ecosystem and you need to decide whether to use Mistral's direct API or the Bedrock-hosted version. You need to balance feature access with procurement efficiency.

What to do: Use AWS Bedrock for production workloads to utilize existing cloud commits and Provisioned Throughput discounts.

While on-demand pricing on Bedrock matches direct rates ($2/$6 for Large 3), using Provisioned Throughput can offer savings of 15-30% for high-volume workloads. A 30% discount reduces the blended cost of 1M input and 1M output tokens from $8.00 to $5.60 (as of 2026-05-16).

→ AWS Bedrock is the preferred choice for enterprises seeking to reduce effective rates through provisioned capacity commitments.

EU data residency premium with Mistral

Audience: it-buyer, enterprise

The problem: Strict GDPR requirements or internal compliance policies require your data to remain within the European Union. You need a high-performance model that satisfies these residency requirements without a massive price premium.

What to do: Deploy Mistral models via La Plateforme's European regions or Azure's EU-based data centers.

Mistral Large 3 maintains a consistent price of $2 per million input tokens and $6 per million output tokens regardless of its EU-based hosting (as of 2026-05-16). This allows for compliance without the 'residency tax' often seen in other cloud services.

→ Mistral provides native EU data residency at standard global pricing rates.

Volume discounts & partner programs

Heads up — these are community-sourced and analyst-reported terms. Specific credit amounts, discount percentages, and program thresholds change frequently. Always verify current terms directly with Mistral AI before relying on a specific number. Treat reported figures as ballpark, not contract language.

Mistralship (Startup Program)

Threshold: Startups founded less than 7 years ago that have not raised a Series B or later funding round

Typical discount (reported): 30,000 credits for La Plateforme

How to engage: Apply via the official Mistral AI startup program form using a business email

Source: dataphoenix.info (community) · cited 2024-12-18

Mistral AI Ambassador Program

Threshold: Startups building AI applications with Mistral models

Typical discount (reported): Free API credits (value varies)

How to engage: Apply through the official Mistral AI Startup Program portal

Source: startup-perks.com (community) · cited 2026-02-14

Mistral AI Enterprise Plan

Threshold: Reportedly starts at approximately $20,000 per month or equivalent annual commitment

Typical discount (reported): Volume discounts vary by contract

How to engage: Contact Mistral AI sales team for a custom quote

Source: wise.com (analyst report) · cited 2025-08-19

Mistral AI Usage Tiers (La Plateforme)

Threshold: Automatic upgrades based on cumulative billing: Tier 1 ($0), Tier 2 (>$20), Tier 3 (>$100), Tier 4 (>$500)

Typical discount (reported): Standard pay-as-you-go rates with increased rate limits

How to engage: Upgrade to a Scale plan in the Mistral AI Admin console; tiers advance automatically with spend

Source: docs.mistral.ai (vendor official) · cited 2025-10-30
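
The reported tier thresholds can be expressed as a simple lookup. This sketch assumes the cumulative-billing cutoffs listed above are still current and that each tier starts strictly above its threshold; `usage_tier` is an illustrative helper, not a Mistral API.

```python
# (threshold in USD, tier) pairs, checked from highest to lowest.
# Values mirror the reported thresholds above and may have changed since.
TIER_THRESHOLDS = [(500.0, 4), (100.0, 3), (20.0, 2)]


def usage_tier(cumulative_billing_usd: float) -> int:
    """Return the reported La Plateforme usage tier for a cumulative billing amount."""
    if cumulative_billing_usd < 0:
        raise ValueError("cumulative billing cannot be negative")
    for threshold, tier in TIER_THRESHOLDS:
        if cumulative_billing_usd > threshold:
            return tier
    return 1  # Tier 1 applies from $0


for spend in (0, 25, 150, 750):
    print(f"${spend} cumulative -> Tier {usage_tier(spend)}")
```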

Azure AI Foundry Provisioned Throughput Reservations

Threshold: Available for 1-month or 1-year terms

Typical discount (reported): Reportedly up to 70% savings compared to hourly pay-as-you-go pricing

How to engage: Purchase via Azure AI Foundry portal or contact Azure sales

Source: techcommunity.microsoft.com (vendor official) · cited 2025-05-19

AWS Bedrock Provisioned Throughput

Threshold: 1-month or 6-month commitment terms

Typical discount (reported): Reportedly 20–40% for 6-month commitments

How to engage: Purchase Model Units (MUs) through the AWS Bedrock console

Source: medium.com (community) · cited 2026-03-10

Google Vertex AI Committed Use Discounts (CUDs)

Threshold: 1-year or 3-year spending commitments

Typical discount (reported): Approximately 25% to 55% savings

How to engage: Purchase via Google Cloud Console under Billing > Commitments

Source: cloud.google.com (vendor official) · cited 2024-12-02

Multi-cloud availability

Cloud-marketplace terms change frequently. Model availability dates, pricing parity, and regional features can drift week to week. Verify with each cloud's pricing page (AWS Bedrock, Google Vertex, Azure AI Foundry) before architecting around specifics.

AWS Bedrock
Model availability: Mistral Large 3, Ministral 3 (3B, 8B, 14B), Mistral Large, Mistral Small, Mixtral 8x7B, Mistral 7B, Pixtral Large
Price vs vendor-direct: On-demand pricing reportedly matches provider direct API rates; batch inference is 50% off
Reasons to pick:
  • Serverless, fully managed endpoints with no infrastructure management
  • Deep integration with AWS ecosystem including IAM, CloudWatch, and VPC endpoints
  • Provisioned Throughput offers approximately 15-30% savings for predictable high-volume workloads
Source: mistral.ai

Google Vertex AI
Model availability: Mistral Medium 3, Mistral OCR (25.05), Mistral Small 3.1 (25.03), Codestral 2
Price vs vendor-direct: Pay-as-you-go; context caching available at a 90% discount
Reasons to pick:
  • Tight integration with Google Cloud data stack including BigQuery and Cloud Storage
  • Context caching provides significant cost savings for long-context applications
  • Managed API surface allows for streaming responses to reduce latency perception
Source: cloud.google.com

Microsoft Azure
Model availability: Mistral Large, Mistral Small, Mistral Nemo, Codestral, Mixtral
Price vs vendor-direct: Standard Azure ML pay-per-token pricing
Reasons to pick:
  • Seamless integration with Microsoft 365 and enterprise-ready ML workflows
  • Strong hybrid and multi-cloud support via Azure Arc
  • Advanced security features and compliance certifications for regulated workloads
Source: azure.microsoft.com

Together AI
Model availability: Mistral Large, Mixtral 8x22B, Mistral 7B variants
Price vs vendor-direct: Mistral Large at $9.00 per million output tokens; Mixtral 8x22B at $1.20 per million tokens
Reasons to pick:
  • Neutral open-model host with an OpenAI-compatible SDK
  • Supports LoRA fine-tuning on major Mistral model sizes
  • Offers a $5 free signup credit with no monthly minimums
Source: together.ai

Anyscale
Model availability: Mistral-7B-Instruct-v0.1
Price vs vendor-direct: $0.15 per 1M tokens for both input and output
Reasons to pick:
  • Optimized for Ray-based infrastructure for faster and cost-effective AI workloads
  • Supports production-grade batch workloads with job queues and automatic retries
  • Provides $100 in free credits for new users
Source: anyscale.com

Snowflake Cortex
Model availability: mistral-large2, mistral-7b
Price vs vendor-direct: mistral-large2 at 1.00 credits/M input and 3.00 credits/M output; AI Credits priced at $2.00 per credit
Reasons to pick:
  • In-warehouse LLM functions allow for data residency within Snowflake
  • SQL-native AI functions (AISQL) for easy integration with existing data pipelines
  • New AI Credits tier provides up to 60-80% cost reduction for high-edition customers
Source: docs.snowflake.com

IBM watsonx
Model availability: Mistral Large
Price vs vendor-direct: Pay-as-you-go pricing per million tokens; varies by plan
Reasons to pick:
  • Enterprise-grade governance and risk management frameworks
  • Integration with watsonx.data for real-time streaming data via Confluent
  • Model-agnostic routing platform (IBM Bob) for optimizing accuracy and cost
Source: ibm.com
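
Credit-based pricing such as the Snowflake Cortex entry needs one extra conversion step before it can be compared with per-token dollar rates. A minimal sketch, assuming the reported $2.00 per AI Credit; `credits_to_usd_per_million` is an illustrative name.

```python
def credits_to_usd_per_million(credits_per_million: float,
                               usd_per_credit: float = 2.00) -> float:
    """Convert a credits-per-1M-tokens rate into USD per 1M tokens."""
    return credits_per_million * usd_per_credit


# Reported mistral-large2 rates on Snowflake Cortex: 1.00 credits/M input, 3.00 credits/M output.
print(credits_to_usd_per_million(1.00))  # input, USD per 1M tokens
print(credits_to_usd_per_million(3.00))  # output, USD per 1M tokens
```

The effective dollar rate moves with the credit price on your Snowflake contract, so the default here is only the reported list value.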

Free credits & startup programs

Program details and credit amounts shift often. Apply directly through each program's official page for current values, eligibility windows, and application requirements.

Mistralship (Mistral AI Startup Program)

Reported value: 30,000 platform credits

Eligibility: Startups founded less than 7 years ago that have not raised a Series B or later round; requires a business email and online presence

How to apply: Fill out the application form on Mistral AI's platform (La Plateforme) during open cohort windows

Apply / learn more at dataphoenix.info ↗

Mistral AI Ambassador Program

Reported value: Free API credits and early access

Eligibility: Startups building AI applications with Mistral models; typically pre-seed, seed, or Series A stages

How to apply: Apply through the official Mistral AI startup portal or partner referral links

Apply / learn more at startup-perks.com ↗

Google for Startups Cloud Program (Scale Tier)

Reported value: $10,000 USD in credits for partner models

Eligibility: Qualifying startups in the Scale and Scale AI Tier of the Google for Startups Cloud Program

How to apply: Members must contact their Google Cloud Account Executive to request access to these partner model credits

Apply / learn more at cloud.google.com ↗

AWS Activate

Reported value: up to $100,000 in AWS Activate Credits

Eligibility: Self-funded or pre-Series B startups founded in the past 10 years; Portfolio tier requires association with an Activate Provider

How to apply: Apply via the AWS Activate console; credits are redeemable for third-party models in Amazon Bedrock including Mistral AI

Apply / learn more at aws.amazon.com ↗

Microsoft for Startups Founders Hub

Reported value: up to $150,000 in Azure credits

Eligibility: Privately held, for-profit startups that have not gone through a Series D or later funding round

How to apply: Sign up through the Microsoft for Startups Founders Hub portal; credits can be used for Mistral models available on Azure AI

Apply / learn more at microsoft.com ↗

Mistral AI 2026 Worldwide Hackathon

Reported value: $15,000 in Mistral credits (Grand Prize)

Eligibility: Participants in the 48-hour global hackathon event

How to apply: Register for the hackathon through the official Mistral AI event page

Apply / learn more at mistral.ai ↗

Mistral AI Academic Partnership (ESSEC)

Reported value: Licenses for Le Chat Enterprise and research support

Eligibility: Researchers, professors, and students at ESSEC Business School

How to apply: Access provided through the ESSEC Metalab interface as part of the institutional partnership

Apply / learn more at essec.edu ↗

Mistral AI Researcher Access (Aalborg University)

Reported value: API integration and AI Studio access

Eligibility: Researchers at Aalborg University (AAU)

How to apply: Access via the university's internal AI services portal

Apply / learn more at en.its.aau.dk ↗

Pricing gotchas to watch

Most gotchas below were surfaced by community reports. Some may have been fixed, changed, or never been the user-facing issue they appeared. Verify against current vendor docs before architecting around a workaround.

Prompt Cache 64-Token Minimum Block Size

Mistral's prompt caching mechanism operates on fixed blocks of 64 tokens. Prompts with a shared prefix of fewer than 64 tokens will not trigger a cache hit, and all cached token counts reported in the API response will be multiples of 64.

Workaround: Ensure system prompts or shared context prefixes are at least 64 tokens long to benefit from the 90% discount on cached tokens.

Source: docs.mistral.ai (vendor docs) · cited 2026-05-16
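
The block-size rule can be sketched as integer arithmetic on the shared prefix length. This assumes caching works in whole 64-token blocks and that cached tokens are billed at a 90% discount, as described above; the helper names and the example token counts are illustrative.

```python
CACHE_BLOCK_TOKENS = 64  # block size described in the gotcha above


def cacheable_tokens(shared_prefix_tokens: int) -> int:
    """Round a shared prefix down to a whole number of cache blocks."""
    return (shared_prefix_tokens // CACHE_BLOCK_TOKENS) * CACHE_BLOCK_TOKENS


def input_cost_with_cache(total_input_tokens: int, shared_prefix_tokens: int,
                          rate_per_million: float,
                          cached_discount: float = 0.90) -> float:
    """Estimated input cost in USD when the shared prefix is served from cache."""
    cached = min(cacheable_tokens(shared_prefix_tokens), total_input_tokens)
    uncached = total_input_tokens - cached
    billed = uncached * rate_per_million + cached * rate_per_million * (1.0 - cached_discount)
    return billed / 1_000_000


print(cacheable_tokens(63))   # prefix shorter than one block: nothing is cached
print(cacheable_tokens(200))  # only whole blocks count
```

A prefix just under a block boundary leaves tokens uncached, so padding or reordering a stable system prompt to cross the next multiple of 64 may be worth testing.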

Vibe API Spending Limit Bypass

Users have reported that monthly API spending limits configured in the Mistral Admin Console may only apply to standard API usage and not to the 'Vibe API' (used by Mistral Vibe CLI). This has reportedly led to cases where users exceeded their set limits by hundreds of dollars without the API being throttled.

Workaround: Monitor usage for Mistral Vibe separately in the dashboard and manually track spending if using both the standard and Vibe APIs concurrently.

Source: reddit.com (Reddit) · cited 2026-03-04

Le Chat Pro Subscription vs. Vibe CLI Billing

A common point of confusion for production users is that a Le Chat Pro subscription ($14.99/mo) does not grant unlimited or free usage of the Mistral Vibe CLI. While it provides higher usage limits, activity beyond those limits is billed at standard pay-as-you-go (PAYG) API rates.

Workaround: Check the Vibe CLI configuration (~/.vibe/config.toml) to ensure it is using the intended model and monitor the 'usage %' in the online interface to avoid unexpected PAYG charges.

Source: github.com (GitHub issue) · cited 2026-01-26

Experimental Model Pricing Transitions

Models released for experimental periods, such as 'Devstral Small 2' (labs-devstral-small-2512), can transition from free to paid status with minimal notice. Users have reported that the pricing page may continue to display a '$0' price tag (often with the original price crossed out) even after the model has moved to a paid tier, leading to unexpected billing spikes.

Workaround: Verify the current billing status of 'labs' or experimental models via support or recent Discord announcements before deploying them in high-volume production workflows.

Source: reddit.com (Reddit) · cited 2026-03-11

Image Token Usage Surprises

Integrating images into workflows using multimodal models like Pixtral or Devstral can cause token consumption to increase significantly faster than text-only prompts. Users have reported usage 'skyrocketing' from minimal levels to near-quota limits shortly after enabling image support, reportedly due to high per-image token costs that are not always transparently documented in standard calculators.

Workaround: Perform small-scale testing with images to establish a baseline token cost per image before scaling multimodal applications.

Source: reddit.com (Reddit) · cited 2026-03-11
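
One way to run the small-scale test suggested above is to send the same prompt with and without images and compare the reported prompt token counts. A rough sketch; the token counts in the example are made-up placeholders, not measured values, and `tokens_per_image` is an illustrative helper.

```python
def tokens_per_image(prompt_tokens_with_images: int,
                     prompt_tokens_text_only: int,
                     image_count: int) -> float:
    """Average extra prompt tokens attributable to each image."""
    if image_count <= 0:
        raise ValueError("image_count must be positive")
    return (prompt_tokens_with_images - prompt_tokens_text_only) / image_count


# Placeholder numbers: substitute the usage figures returned for your own test requests.
baseline = tokens_per_image(prompt_tokens_with_images=1_200,
                            prompt_tokens_text_only=200,
                            image_count=2)
print(baseline)
```

Repeating the measurement at a few image resolutions gives a more reliable baseline, since per-image cost may vary with image size.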

Tokenizer V3 Tool Calling Overhead

The transition from Tokenizer V2 to V3 changed the encoding of tool messages. In V3, tool results are no longer wrapped in a list, and the entire history of tool calls is tokenized, which can alter the total token count and associated costs for complex agentic workflows compared to older versions.

Workaround: Review the 'prompt_tokens' and 'completion_tokens' in API responses when upgrading to Tokenizer V3 to ensure cost estimates remain accurate for tool-heavy applications.

Source: docs.mistral.ai (vendor docs) · cited 2026-05-16
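
Reviewing token counts across a tool-heavy run can be done by summing the usage fields of each response. This sketch assumes responses have already been parsed into dictionaries with a `usage` object containing `prompt_tokens` and `completion_tokens`; the sample numbers are illustrative and the default rates are the Mistral Large 3 standard rates.

```python
from typing import Mapping, Sequence, Tuple


def total_usage(responses: Sequence[Mapping]) -> Tuple[int, int]:
    """Sum prompt and completion tokens across a list of parsed API responses."""
    prompt = sum(r["usage"]["prompt_tokens"] for r in responses)
    completion = sum(r["usage"]["completion_tokens"] for r in responses)
    return prompt, completion


def usage_cost(prompt_tokens: int, completion_tokens: int,
               input_rate: float = 2.00, output_rate: float = 6.00) -> float:
    """USD cost for the summed usage at per-million-token rates."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000


# Illustrative responses from a two-step tool-calling exchange.
sample = [
    {"usage": {"prompt_tokens": 1_200, "completion_tokens": 300}},
    {"usage": {"prompt_tokens": 1_800, "completion_tokens": 200}},
]
prompt_total, completion_total = total_usage(sample)
print(prompt_total, completion_total, usage_cost(prompt_total, completion_total))
```

Running the same conversation before and after a tokenizer change and comparing the totals shows whether cost estimates need adjusting.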

Hidden costs (25-40% beyond per-token rates)

Typical overhead: 25-40% beyond raw per-token rates.
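
The overhead range can be applied to a raw per-token estimate to get a budgeting band. A minimal sketch, assuming the 25-40% range holds for your workload; `budget_with_overhead` is an illustrative name.

```python
from typing import Tuple


def budget_with_overhead(base_usd: float,
                         low_overhead: float = 0.25,
                         high_overhead: float = 0.40) -> Tuple[float, float]:
    """Return (low, high) budget estimates after adding typical hidden-cost overhead."""
    return base_usd * (1.0 + low_overhead), base_usd * (1.0 + high_overhead)


low, high = budget_with_overhead(200.0)
print(f"Plan for ${low:.2f} to ${high:.2f} instead of $200.00")
```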

What it costs to leave Mistral AI

Switching from Mistral is relatively straightforward due to their adherence to OpenAI-compatible API structures and the availability of open weights. The primary lock-in risks are specific tool-calling implementations in Tokenizer V3 and any deep integrations with Mistral-specific features like their native prompt caching blocks.

Who is this for?

For vibe coders & solo devs

Mistral is a favorite for the 'vibe coding' community due to the Vibe CLI and the low-cost Small 4 model. The $14.99 Le Chat Pro subscription is an excellent way to get high usage limits for research without worrying about per-token API spikes. However, be careful with the Vibe CLI as it may bypass your standard API spending limits set in the dashboard.

* Use Mistral Small 4 for rapid prototyping and code boilerplate generation.
* Monitor Vibe CLI usage separately to avoid unexpected monthly billing surprises.
* Leverage the 64-token prompt cache block by keeping your system prompts consistent.
* Apply for the Mistralship program if you are a seed-stage startup to get 30,000 credits.

For SMBs and growing teams

Small and medium businesses can leverage Mistral's transparent tier system to scale costs alongside growth. The automatic transition through usage tiers (Tier 1 to Tier 4) ensures that as your spend increases, your rate limits grow automatically. This makes Mistral a predictable partner for businesses that cannot commit to large upfront enterprise contracts.

* Start with the pay-as-you-go Tier 1 to test product-market fit with zero commitment.
* Use the Mistralship program to secure 30,000 credits if your company is less than 7 years old.
* Implement prompt caching for repetitive customer support queries to save 90% on input costs.
* Consider Le Chat Team subscriptions for internal staff to provide AI tools at a fixed monthly cost.

For enterprise buyers

For enterprise buyers, Mistral offers the flexibility of managed API access or private deployments on AWS, Azure, or Google Cloud. The Enterprise Plan, reportedly starting at $20,000 per month, provides the SLAs and administrative controls required for regulated industries. Multi-cloud availability ensures you can deploy Mistral models wherever your data currently resides.

* Negotiate volume discounts via the Enterprise Plan for commitments over $20,000 per month.
* Use AWS Bedrock or Azure AI Foundry to apply existing cloud credits toward Mistral usage.
* Utilize Provisioned Throughput on cloud providers to save up to 70% on predictable production loads.
* Ensure your security team reviews the SAML SSO and ACL permissions available in the Enterprise tier.

Sources verified for this page

Primary: Mistral AI pricing page

Generator: gen-v4.13-2026-05-15 · Last refreshed: Sat May 16 2026 18:11:06 GMT-0400 (Eastern Daylight Time) · Pricing snapshot: Sat May 16 2026 00:00:00 GMT-0400 (Eastern Daylight Time)

Data sources & methodology

161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-05.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when input exceeds model threshold.
  • Embedding prices are input-only (no output tokens generated).

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Source | Last verified | URL | Snapshot history
Anthropic | 2026-06-05 | https://www.anthropic.com/pricing | Daily snapshot since Sep 2023, 578 days captured
Anthropic Docs | 2026-06-05 | https://platform.claude.com/docs/en/about-claude/pricing | Daily snapshot since Sep 2023, 578 days captured
OpenAI | 2026-06-05 | https://openai.com/api/pricing/ | Daily snapshot since Sep 2023, 579 days captured
Google AI | 2026-06-05 | https://ai.google.dev/gemini-api/docs/pricing | Daily snapshot since Dec 2023, 554 days captured
Google Vertex | 2026-06-05 | https://cloud.google.com/vertex-ai/generative-ai/pricing | Daily snapshot since Dec 2023, 554 days captured
DeepSeek | 2026-06-05 | https://api-docs.deepseek.com/quick_start/pricing | Daily snapshot since May 2024, 493 days captured
xAI | 2026-06-05 | https://x.ai/api | Daily snapshot since Nov 2024, 411 days captured
Mistral | 2026-06-05 | https://mistral.ai/pricing | Daily snapshot since Dec 2023, 552 days captured
Cohere | 2026-06-05 | https://cohere.com/pricing | Daily snapshot since Sep 2023, 578 days captured

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model Field Why it’s inferred
Anthropic — Claude Sonnet 4.6 cachedInput Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5 cachedInput Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5 batchInput Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5 batchOutput Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5 cachedInput Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini cachedInput Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano cachedInput Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano batchInput Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano batchOutput Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro cachedInput Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro batchInput Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro batchOutput Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2 cachedInput Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2 batchInput Derived at 50% of input.
OpenAI — GPT-5.2 batchOutput Derived at 50% of output.
OpenAI — GPT-5 cachedInput Derived at 10% of input.
OpenAI — GPT-5 batchInput Derived at 50% of input.
OpenAI — GPT-5 batchOutput Derived at 50% of output.
OpenAI — GPT-5.5 Pro cachedInput Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5.5 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5.2 Pro cachedInput Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5.2 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5.1 batchInput Derived at 50% of input.
OpenAI — GPT-5.1 batchOutput Derived at 50% of output.
OpenAI — GPT-5 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5 Nano cachedInput Derived at 10% of input.
OpenAI — GPT-5 Nano batchInput Derived at 50% of input.
OpenAI — GPT-5 Nano batchOutput Derived at 50% of output.
Google — Gemini 3 Flash cachedInput Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro cachedInput Derived at 10% of input.
Google — Gemini 2.5 Flash cachedInput Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash cachedInput Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy) cachedInput Extrapolated at 25% of base.
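
The derivation convention described for these inferred values (cached input at 10% of base, batch at 50% of base) can be sketched as below. The field names follow the table; the fractions are conventions rather than vendor-published prices, and the base rates in the example are illustrative inputs.

```python
from typing import Dict


def derive_variants(input_rate: float, output_rate: float,
                    cached_fraction: float = 0.10,
                    batch_fraction: float = 0.50) -> Dict[str, float]:
    """Derive inferred per-million rates from published base rates using the stated conventions."""
    return {
        "cachedInput": input_rate * cached_fraction,  # 90% off cached input
        "batchInput": input_rate * batch_fraction,    # 50% off batch input
        "batchOutput": output_rate * batch_fraction,  # 50% off batch output
    }


# Illustrative base rates of $2.00 input and $6.00 output per 1M tokens.
for field, value in derive_variants(2.00, 6.00).items():
    print(f"{field}: ${value:.2f}")
```

Exceptions exist in the table (for example, rows derived at 25% of input), so the fractions are parameters rather than constants.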

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →