Mistral AI pricing, complete breakdown

Verified 2026-05-16, cross-checked against Mistral AI pricing page, litellm, openrouter

Mistral AI currently offers two primary production models with distinct price points for different use cases. Mistral Large 3 serves as the flagship frontier model at $2 per million input tokens and $6 per million output tokens. For high-volume, cost-sensitive tasks, Mistral Small 4 provides a more economical path at $0.1 per million input and $0.3 per million output tokens. These models feature context windows of 262,000 and 128,000 tokens respectively. This page helps you calculate projected costs, compare model efficiency, and track recent pricing volatility.

Mistral Small 4 offers the lowest entry point at just $0.10 per million input tokens.

How Mistral AI's pricing universe works

Updated 2026-05-16

Mistral AI operates a multi-channel pricing strategy to maximize market reach across the developer, consumer, and enterprise segments. By offering metered API access alongside fixed-rate subscriptions and cloud marketplace deployments, they capture both high-margin builder activity and predictable recurring revenue. This hybrid approach allows Mistral to monetize their frontier models through whichever procurement path best fits the customer's technical and budgetary constraints.

API (per-token, metered)

For: Developers, technical teams, startups building products on top of Mistral AI models

Pay only for tokens consumed
Full model lineup including batch, caching, long context
Programmatic via SDKs

When to use: When integrating Mistral AI into your own product or running variable batch workloads

Best for: Builders with metered or unpredictable usage

Consumer subscriptions (Pro, Max tiers)

For: Individuals using Mistral AI directly for writing, coding, research, analysis

Fixed monthly fee
Generous usage caps
Web/desktop/mobile apps
Often includes newer models first

When to use: When using Mistral AI as a daily-driver AI assistant rather than building on it

Best for: Solo professionals, knowledge workers, vibe coders

Business/Team plans

For: Teams of 5-200 needing shared workspaces, admin controls, SSO

Per-seat billing
Centralized billing
Admin & audit controls
Sometimes shared usage pools

When to use: When deploying Mistral AI across a team that does NOT need API integration

Best for: Mid-size organizations adopting AI for internal productivity

Enterprise (custom contract)

For: Large organizations with procurement requirements, compliance needs, or volume-discount leverage

Custom pricing and limits
SLAs
DPAs and BAAs
Dedicated support
Sometimes private cloud / VPC

When to use: When per-seat or per-token pricing exceeds ~$50K/year, or when compliance/contractual needs require it

Best for: Enterprises with procurement-led adoption

Cloud marketplaces (AWS Bedrock, Google Vertex, Azure)

For: Organizations with existing cloud commits or strict data-residency requirements

Same models, slightly different pricing (often parity or small premium)
Counts toward existing cloud spend commits
Stays within cloud's data-protection boundary

When to use: When you already burn down EDP/MACC/CCC commits and prefer single-bill

Best for: Cloud-committed enterprises

Which one should you pick? If you are building a product, use the API for metered control. For personal use, a consumer subscription provides the best value. Teams should opt for the Team plan for administrative oversight, while large organizations should leverage enterprise contracts or cloud marketplaces to satisfy compliance and procurement requirements.

Current pricing (all production models)

Pricing verified 2026-05-16

Model	Input $/M	Output $/M	Cached $/M	Context
Mistral Large 3 `mistral-large-3`	$2	$6	—	262,000
Mistral Small 4 `mistral-small-4`	$0.10	$0.30	—	128,000

Pricing is based on standard API rates per million tokens. No explicit cache or batch pricing is currently stated for these versions. Verified as of May 16, 2026.
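
As a rough sketch of how the table translates into a budget, the snippet below projects per-request, monthly, and annual spend from the two listed rates. The `RATES` dictionary and the `project_cost` helper are illustrative names, not part of any Mistral SDK, and the example workload is an assumption. Only the standard rates above are used, with no caching or batch discounts.

```python
# Standard per-million-token rates from the table above (USD, as of 2026-05-16).
RATES = {
    "mistral-large-3": {"input": 2.00, "output": 6.00},
    "mistral-small-4": {"input": 0.10, "output": 0.30},
}


def project_cost(model, input_tokens, output_tokens, requests_per_month):
    """Return (per_request, monthly, annual) cost in USD for one workload.

    Hypothetical helper: assumes flat standard rates and ignores caching,
    batch discounts, retries, and any other overhead.
    """
    rate = RATES[model]
    per_request = (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000
    monthly = per_request * requests_per_month
    return per_request, monthly, monthly * 12


if __name__ == "__main__":
    # Assumed workload: 2,000 input and 500 output tokens, 100,000 requests/month.
    for model in RATES:
        per_request, monthly, annual = project_cost(model, 2_000, 500, 100_000)
        print(f"{model}: ${per_request:.6f}/request, ${monthly:,.2f}/month, ${annual:,.2f}/year")
```

Swapping in your own token counts and request volume gives a first-order estimate; the overhead factors discussed later on this page are not included.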

Full rate breakdown (all variants)

Verified 2026-05-16

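Variants beyond the standard API rate can include batch (async, 50% off), cached reads (0.1x), cache writes (1.25x or 2x base), and a long-context tier (~2x above threshold). None of these is listed for the two models below, so only the standard rate appears.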

Mistral Large 3 `mistral-large-3`

Flagship reasoning for complex multi-step agents and coding

Primary use: Built for high-complexity tasks requiring deep reasoning, multilingual support, and large context windows.

Who picks it: Enterprise developers building production-grade RAG systems and autonomous agents.

Vs other Mistral AI models: At $2/M input and $6/M output, this is the premium tier compared to Mistral Small 4's $0.10/$0.30 rates.

When to use: Choose this for logic-heavy workflows; switch to Small 4 for simple classification or high-volume summarization.

Equivalents at other vendors

OpenAI

GPT-5.4: Similar flagship performance tier, but Mistral Large 3 is significantly cheaper on output tokens ($6 vs $15).

Google

Gemini 3.1 Pro: Matches the $2 input rate, though Mistral Large 3 provides a more economical $6 output rate compared to $12.

xAI

Grok 4.20 (non-reasoning): Directly competes on price with identical $2 input and $6 output rates for high-reasoning workloads.

Mistral Large 3 `mistral-large-3`

Variant	Input $/M	Output $/M	Notes
Standard	$2	$6	Default per-token API rate

Mistral Small 4 `mistral-small-4`

Optimized efficiency for high-volume classification and extraction tasks

Primary use: Designed for low-latency processing of large datasets and high-throughput background jobs.

Who picks it: Teams scaling production workflows where cost-per-token is the primary constraint.

Vs other Mistral AI models: Priced at $0.10/M input and $0.30/M output, it costs one-twentieth as much as Mistral Large 3.

When to use: Use for high-volume tasks like entity extraction; upgrade to Large 3 if reasoning depth is insufficient.

Equivalents at other vendors

DeepSeek

deepseek-v4-flash: Competes in the high-efficiency tier with nearly identical sub-dollar pricing for fast inference.

Cohere

Command R: Targeted at similar RAG and tool-use workflows, though Mistral Small 4 is cheaper on both input and output.

xAI

Grok 4.1 Fast (non-reasoning): Similar low-latency positioning, but Mistral Small 4 offers a 50% lower entry price for input tokens.

Mistral Small 4 `mistral-small-4`

Variant	Input $/M	Output $/M	Notes
Standard	$0.10	$0.30	Default per-token API rate

What changed in the last 30-90 days

Tracked through 2026-05-11

2026-05-11: Mistral Large 3 input and output prices increased by 50%. — Production costs for flagship model workloads have risen from $2/$6 to $3/$9 per million tokens.
2026-05-11: Mistral Small 4 input and output prices increased by 100%. — The cost of running high-efficiency workloads has doubled, though it remains the most affordable option in the lineup.

How buyers think about Mistral AI pricing

Updated 2026-05-16

Cheapest Mistral model for high-volume tasks

vibe-coder, solopreneur, developer

The problem: You need to process millions of simple classification or extraction tasks without exhausting your budget on frontier-class models. High-volume workloads can quickly become unsustainable if you use flagship models for basic logic.

What to do: Deploy Mistral Small 4 for high-throughput utility tasks while reserving Mistral Large 3 for complex reasoning.

Processing 10 million input tokens and 5 million output tokens on Mistral Small 4 costs (10M tokens × $0.1/M) + (5M tokens × $0.3/M) = $2.50 total. The same volume on Mistral Large 3 would cost (10M tokens × $2/M) + (5M tokens × $6/M) = $50.00 (as of 2026-05-16).

→ Mistral Small 4 provides a 95% cost reduction compared to Large 3 for high-volume utility workloads.
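
The comparison above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the scenario's volumes and the standard rates; `workload_cost` is an illustrative helper name, not an API from any library.

```python
def workload_cost(input_millions, output_millions, input_rate, output_rate):
    """Cost in USD for token volumes given in millions and rates in $/M tokens."""
    return input_millions * input_rate + output_millions * output_rate


# Volumes and rates from the scenario above.
small_4 = workload_cost(10, 5, 0.10, 0.30)
large_3 = workload_cost(10, 5, 2.00, 6.00)
savings = 1 - small_4 / large_3

print(f"Small 4: ${small_4:.2f}, Large 3: ${large_3:.2f}, savings: {savings:.0%}")
```

Because both rates differ by the same factor of 20, the savings percentage stays the same for any input/output mix.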

Self-host open weights vs pay La Plateforme API

developer, smb, enterprise

The problem: You are weighing the infrastructure overhead of self-hosting Mistral models against the simplicity of the managed API. You need to know when the operational complexity of a private cluster pays for itself.

What to do: Utilize La Plateforme API for development and scaling, then consider self-hosting once monthly volume exceeds 50 million tokens.

At 50 million tokens per month (assuming 25M input and 25M output) on Mistral Large 3, the API cost is (25M × $2/M) + (25M × $6/M) = $200 per month (as of 2026-05-16). If your internal GPU hosting and engineering maintenance costs exceed this threshold, the API remains the more economical choice.

→ The API is generally more cost-effective for workloads under 50 million tokens per month due to zero maintenance overhead.
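
To make the break-even comparison concrete, the sketch below computes the API bill for the scenario's volume and compares it with a self-hosting figure. The self-hosting cost used here is a placeholder assumption, not a number from this page, and both function names are illustrative.

```python
def api_monthly_cost(input_millions, output_millions, input_rate=2.00, output_rate=6.00):
    """Monthly API cost in USD; defaults are the Mistral Large 3 standard rates."""
    return input_millions * input_rate + output_millions * output_rate


def cheaper_option(api_cost, self_host_cost):
    """Return which option costs less per month; ties go to the API."""
    return "api" if api_cost <= self_host_cost else "self-host"


api_cost = api_monthly_cost(25, 25)  # 25M input + 25M output, as in the scenario
# Hypothetical self-hosting bill (GPU rental plus maintenance); replace with your own figure.
assumed_self_host_cost = 1_500.00
print(f"API: ${api_cost:.2f}/month -> {cheaper_option(api_cost, assumed_self_host_cost)}")
```

The useful input is your own fully loaded self-hosting cost, including engineering time; the comparison only flips once the API bill exceeds it.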

When Mistral Large 3 beats frontier alternatives

developer, it-buyer, enterprise

The problem: You require top-tier performance for complex reasoning but want to avoid the high price points or data residency concerns of other frontier models. You need a competitive alternative that balances power with predictable pricing.

What to do: Standardize on Mistral Large 3 for production reasoning tasks to leverage its competitive $2/$6 pricing structure.

Running a complex agentic workflow with 500,000 input tokens and 200,000 output tokens costs (0.5M × $2/M) + (0.2M × $6/M) = $2.20 per run (as of 2026-05-16). This provides a predictable baseline for budgeting enterprise-grade intelligence.

→ Mistral Large 3 offers frontier-level intelligence at a transparent $2.20 for this run, an effective blended rate of about $3.14 per million tokens at this input/output mix.
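
A blended rate depends on the ratio of input to output tokens, so it helps to compute it for the workload in question. The sketch below derives the per-run cost and the effective blended rate from the $2/$6 rates; `run_cost` and `blended_rate` are illustrative helper names.

```python
def run_cost(input_tokens, output_tokens, input_rate=2.00, output_rate=6.00):
    """Cost in USD of one run at per-million-token rates (defaults: Large 3)."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def blended_rate(input_tokens, output_tokens, input_rate=2.00, output_rate=6.00):
    """Average $/M tokens across input and output for a given token mix."""
    total_tokens = input_tokens + output_tokens
    return run_cost(input_tokens, output_tokens, input_rate, output_rate) / total_tokens * 1_000_000


print(f"per run: ${run_cost(500_000, 200_000):.2f}")
print(f"blended: ${blended_rate(500_000, 200_000):.2f}/M tokens")
print(f"1:1 mix: ${blended_rate(1_000_000, 1_000_000):.2f}/M tokens")
```

Note that one million input tokens plus one million output tokens cost $8.00 in total, which is $4.00 per million tokens when averaged over the two million tokens processed.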

Le Chat Pro and Team subscriptions

solopreneur, smb

The problem: You need consistent access to Mistral models for daily research and drafting but find per-token API billing difficult to predict for human-in-the-loop tasks. You want a flat-rate option for your team.

What to do: Use Le Chat Pro for individual power users or Le Chat Team for collaborative environments to cap monthly spend.

A Le Chat Pro subscription costs $14.99 per month (as of 2026-05-16). If a user generates 3 million output tokens on Mistral Large 3 via the API, the cost would be (3M × $6/M) = $18.00, making the subscription more economical for heavy manual usage.

→ Le Chat Pro pays for itself if a user generates more than 2.5 million output tokens per month on flagship models.
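
The break-even point follows directly from dividing the subscription fee by the output rate. This minimal sketch uses the figures quoted above and, like the scenario, ignores input tokens; the constant and function names are illustrative.

```python
LE_CHAT_PRO_MONTHLY = 14.99   # USD per month, as quoted above
LARGE_3_OUTPUT_RATE = 6.00    # USD per million output tokens


def breakeven_output_millions(subscription=LE_CHAT_PRO_MONTHLY, output_rate=LARGE_3_OUTPUT_RATE):
    """Millions of output tokens at which API spend equals the subscription fee."""
    return subscription / output_rate


def api_equivalent_cost(output_millions, output_rate=LARGE_3_OUTPUT_RATE):
    """What the same output volume would cost on the metered API (input ignored)."""
    return output_millions * output_rate


print(f"break-even: {breakeven_output_millions():.2f}M output tokens/month")
print(f"3M output tokens via API: ${api_equivalent_cost(3):.2f}")
```

Counting input tokens as well would lower the break-even volume, so the 2.5 million figure is an upper bound for this comparison.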

Mistral on AWS Bedrock vs direct

it-buyer, enterprise

The problem: Your organization is already committed to the AWS ecosystem and you need to decide whether to use Mistral's direct API or the Bedrock-hosted version. You need to balance feature access with procurement efficiency.

What to do: Use AWS Bedrock for production workloads to utilize existing cloud commits and Provisioned Throughput discounts.

While on-demand pricing on Bedrock matches direct rates ($2/$6 for Large 3), using Provisioned Throughput can offer savings of 15-30% for high-volume workloads. A 30% discount reduces the combined cost of 1M input plus 1M output tokens from $8.00 to $5.60 (as of 2026-05-16).

→ AWS Bedrock is the preferred choice for enterprises seeking to reduce effective rates through provisioned capacity commitments.
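
The effect of the reported discount range can be sketched as below. The 15% and 30% values are the reported savings quoted above, applied here as a flat percentage for illustration; actual Provisioned Throughput is billed by capacity rather than per token, so treat this as an approximation. `discounted_cost` is an illustrative helper name.

```python
def discounted_cost(on_demand_cost, discount):
    """Apply a fractional discount (0.30 means 30% off) to an on-demand cost."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return on_demand_cost * (1 - discount)


# 1M input + 1M output on Large 3 at the on-demand rates quoted above.
on_demand = 1 * 2.00 + 1 * 6.00

for discount in (0.15, 0.30):  # reported Provisioned Throughput savings range
    print(f"{discount:.0%} off: ${discounted_cost(on_demand, discount):.2f}")
```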

EU data residency premium with Mistral

it-buyer, enterprise

The problem: Strict GDPR requirements or internal compliance policies require your data to remain within the European Union. You need a high-performance model that satisfies these residency requirements without a massive price premium.

What to do: Deploy Mistral models via La Plateforme's European regions or Azure's EU-based data centers.

Mistral Large 3 maintains a consistent price of $2 per million input tokens and $6 per million output tokens regardless of its EU-based hosting (as of 2026-05-16). This allows for compliance without the 'residency tax' often seen in other cloud services.

→ Mistral provides native EU data residency at standard global pricing rates.

Volume discounts & partner programs

Researched 2026-05-16

Heads up — these are community-sourced and analyst-reported terms. Specific credit amounts, discount percentages, and program thresholds change frequently. Always verify current terms directly with Mistral AI before relying on a specific number. Treat reported figures as ballpark, not contract language.

Mistralship (Startup Program)

Threshold: Startups founded less than 7 years ago that have not raised a Series B or later funding round

Typical discount (reported): 30,000 credits for La Plateforme

Benefits:

One-on-one support from the Solutions & Science team
Early access to new models and products
Six-month cohort participation

How to engage: Apply via the official Mistral AI startup program form using a business email

Source: dataphoenix.info (community) · cited 2024-12-18

Mistral AI Ambassador Program

Threshold: Startups building AI applications with Mistral models

Typical discount (reported): Free API credits (value varies)

Benefits:

Equity-free benefits
Early access to new features
VIP recognition
Six-month program duration

How to engage: Apply through the official Mistral AI Startup Program portal

Source: startup-perks.com (community) · cited 2026-02-14

Mistral AI Enterprise Plan

Threshold: Reportedly starts at approximately $20,000 per month or equivalent annual commitment

Typical discount (reported): Volume discounts vary by contract

Benefits:

SAML SSO and ACL permissions
Comprehensive administrative controls and audit logs
Private or on-premises deployment options
99.9% uptime SLA guarantees
Priority access to new models

How to engage: Contact Mistral AI sales team for a custom quote

Source: wise.com (analyst_report) · cited 2025-08-19

Mistral AI Usage Tiers (La Plateforme)

Threshold: Automatic upgrades based on cumulative billing: Tier 1 ($0), Tier 2 (>$20), Tier 3 (>$100), Tier 4 (>$500)

Typical discount (reported): Standard pay-as-you-go rates with increased rate limits

Benefits:

Increased Requests per second (RPS)
Higher tokens per minute (TPM) throughput
Higher overall monthly consumption caps

How to engage: Upgrade to a Scale plan in the Mistral AI Admin console; tiers advance automatically with spend

Source: docs.mistral.ai (vendor_official) · cited 2025-10-30
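
For planning purposes, the reported tier thresholds can be expressed as a small lookup. This is a sketch based only on the thresholds listed in this entry; the function name is illustrative and the actual tiering logic on La Plateforme may differ.

```python
# Reported cumulative-billing thresholds (USD) from the entry above; verify before relying on them.
TIER_THRESHOLDS = [(500, 4), (100, 3), (20, 2)]


def usage_tier(cumulative_billing_usd):
    """Map cumulative billing to a tier number, using strict '>' thresholds."""
    for threshold, tier in TIER_THRESHOLDS:
        if cumulative_billing_usd > threshold:
            return tier
    return 1  # Tier 1 starts at $0


for spend in (0, 20, 20.01, 150, 750):
    print(f"${spend}: Tier {usage_tier(spend)}")
```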

Azure AI Foundry Provisioned Throughput Reservations

Threshold: Available for 1-month or 1-year terms

Typical discount (reported): Reportedly up to 70% savings compared to hourly pay-as-you-go pricing

Benefits:

Reserved capacity for predictable consumption
Not model-dependent (applies to PTUs in a specific region/deployment type)
Eliminates rate-limit variability at peak load

How to engage: Purchase via Azure AI Foundry portal or contact Azure sales

Source: techcommunity.microsoft.com (vendor_official) · cited 2025-05-19

AWS Bedrock Provisioned Throughput

Threshold: 1-month or 6-month commitment terms

Typical discount (reported): Reportedly 20–40% for 6-month commitments

Benefits:

Guaranteed throughput for production workloads
No per-token charges (billed hourly per model unit)
Model-agnostic within a provider family

How to engage: Purchase Model Units (MUs) through the AWS Bedrock console

Source: medium.com (community) · cited 2026-03-10

Google Vertex AI Committed Use Discounts (CUDs)

Threshold: 1-year or 3-year spending commitments

Typical discount (reported): Approximately 25% to 55% savings

Benefits:

Applies to Vertex AI training and inference workloads
Spend-based CUDs cover multiple machine families and regions
Resource-based CUDs for predictable GPU/vCPU workloads

How to engage: Purchase via Google Cloud Console under Billing > Commitments

Source: cloud.google.com (vendor_official) · cited 2024-12-02

Multi-cloud availability

Researched 2026-05-16

Cloud-marketplace terms change frequently. Model availability dates, pricing parity, and regional features can drift week to week. Verify with each cloud's pricing page (AWS Bedrock, Google Vertex, Azure AI Foundry) before architecting around specifics.

Cloud	Model availability	Price vs vendor-direct	Reasons to pick
AWS Bedrock	Mistral Large 3, Ministral 3 (3B, 8B, 14B), Mistral Large, Mistral Small, Mixtral 8x7B, Mistral 7B, Pixtral Large	On-demand pricing reportedly matches provider direct API rates; batch inference is 50% off	Serverless, fully managed endpoints with no infrastructure management; deep integration with AWS ecosystem including IAM, CloudWatch, and VPC endpoints; Provisioned Throughput offers approximately 15-30% savings for predictable high-volume workloads (source: mistral.ai)
Google Vertex AI	Mistral Medium 3, Mistral OCR (25.05), Mistral Small 3.1 (25.03), Codestral 2	Pay-as-you-go; context caching available at a 90% discount	Tight integration with Google Cloud data stack including BigQuery and Cloud Storage; context caching provides significant cost savings for long-context applications; managed API surface allows for streaming responses to reduce latency perception (source: cloud.google.com)
Microsoft Azure	Mistral Large, Mistral Small, Mistral Nemo, Codestral, Mixtral	Standard Azure ML pay-per-token pricing	Seamless integration with Microsoft 365 and enterprise-ready ML workflows; strong hybrid and multi-cloud support via Azure Arc; advanced security features and compliance certifications for regulated workloads (source: azure.microsoft.com)
Together AI	Mistral Large, Mixtral 8x22B, Mistral 7B variants	Mistral Large at $9.00 per million output tokens; Mixtral 8x22B at $1.20 per million tokens	Neutral open-model host with an OpenAI-compatible SDK; supports LoRA fine-tuning on major Mistral model sizes; offers a $5 free signup credit with no monthly minimums (source: together.ai)
Anyscale	Mistral-7B-Instruct-v0.1	$0.15 per 1M tokens for both input and output	Optimized for Ray-based infrastructure for faster and cost-effective AI workloads; supports production-grade batch workloads with job queues and automatic retries; provides $100 in free credits for new users (source: anyscale.com)
Snowflake Cortex	mistral-large2, mistral-7b	mistral-large2 at 1.00 credits/M input and 3.00 credits/M output; AI Credits priced at $2.00 per credit	In-warehouse LLM functions allow for data residency within Snowflake; SQL-native AI functions (AISQL) for easy integration with existing data pipelines; new AI Credits tier provides up to 60-80% cost reduction for high-edition customers (source: docs.snowflake.com)
IBM watsonx	Mistral Large	Pay-as-you-go pricing per million tokens; varies by plan	Enterprise-grade governance and risk management frameworks; integration with watsonx.data for real-time streaming data via Confluent; model-agnostic routing platform (IBM Bob) for optimizing accuracy and cost (source: ibm.com)

Free credits & startup programs

Researched 2026-05-16

Program details and credit amounts shift often. Apply directly through each program's official page for current values, eligibility windows, and application requirements.

Mistralship (Mistral AI Startup Program)

Reported value: 30,000 platform credits

Eligibility: Startups founded less than 7 years ago that have not raised a Series B or later round; requires a business email and online presence

How to apply: Fill out the application form on Mistral AI's platform (La Plateforme) during open cohort windows

Apply / learn more at dataphoenix.info ↗

Mistral AI Ambassador Program

Reported value: Free API credits and early access

Eligibility: Startups building AI applications with Mistral models; typically pre-seed, seed, or Series A stages

How to apply: Apply through the official Mistral AI startup portal or partner referral links

Apply / learn more at startup-perks.com ↗

Google for Startups Cloud Program (Scale Tier)

Reported value: $10,000 USD in credits for partner models

Eligibility: Qualifying startups in the Scale and Scale AI Tier of the Google for Startups Cloud Program

How to apply: Members must contact their Google Cloud Account Executive to request access to these partner model credits

Apply / learn more at cloud.google.com ↗

AWS Activate

Reported value: up to $100,000 in AWS Activate Credits

Eligibility: Self-funded or pre-Series B startups founded in the past 10 years; Portfolio tier requires association with an Activate Provider

How to apply: Apply via the AWS Activate console; credits are redeemable for third-party models in Amazon Bedrock including Mistral AI

Apply / learn more at aws.amazon.com ↗

Microsoft for Startups Founders Hub

Reported value: up to $150,000 in Azure credits

Eligibility: Privately held, for-profit startups that have not gone through a Series D or later funding round

How to apply: Sign up through the Microsoft for Startups Founders Hub portal; credits can be used for Mistral models available on Azure AI

Apply / learn more at microsoft.com ↗

Mistral AI 2026 Worldwide Hackathon

Reported value: $15,000 in Mistral credits (Grand Prize)

Eligibility: Participants in the 48-hour global hackathon event

How to apply: Register for the hackathon through the official Mistral AI event page

Apply / learn more at mistral.ai ↗

Mistral AI Academic Partnership (ESSEC)

Reported value: Licenses for Le Chat Entreprise and research support

Eligibility: Researchers, professors, and students at ESSEC Business School

How to apply: Access provided through the ESSEC Metalab interface as part of the institutional partnership

Apply / learn more at essec.edu ↗

Mistral AI Researcher Access (Aalborg University)

Reported value: API integration and AI Studio access

Eligibility: Researchers at Aalborg University (AAU)

How to apply: Access via the university's internal AI services portal

Apply / learn more at en.its.aau.dk ↗

Pricing gotchas to watch

Researched 2026-05-16

Most gotchas below were surfaced by community reports. Some may have been fixed, changed, or never been the user-facing issue they appeared. Verify against current vendor docs before architecting around a workaround.

Prompt Cache 64-Token Minimum Block Size

Mistral's prompt caching mechanism operates on fixed blocks of 64 tokens. Prompts with a shared prefix of fewer than 64 tokens will not trigger a cache hit, and all cached token counts reported in the API response will be multiples of 64.

Workaround: Ensure system prompts or shared context prefixes are at least 64 tokens long to benefit from the 90% discount on cached tokens.

Source: docs.mistral.ai (vendor_docs) · cited 2026-05-16
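
The block rounding described in this gotcha can be sketched as follows. The 64-token block size and the 90% discount come from the text above; billing cached tokens at 10% of the input rate is an assumption for illustration, since no per-model cache price is listed on this page. Function names are illustrative.

```python
CACHE_BLOCK = 64  # tokens per cache block, per the gotcha above


def cacheable_tokens(shared_prefix_tokens):
    """Largest multiple of 64 that fits in the shared prefix (0 if under 64)."""
    return (shared_prefix_tokens // CACHE_BLOCK) * CACHE_BLOCK


def input_cost(prompt_tokens, shared_prefix_tokens, input_rate, cached_discount=0.90):
    """Input cost in USD when the shared prefix is served from cache.

    Assumption: cached tokens are billed at (1 - cached_discount) of the
    input rate; the 90% figure comes from the workaround above, not from
    a published per-model cache price.
    """
    cached = min(cacheable_tokens(shared_prefix_tokens), prompt_tokens)
    uncached = prompt_tokens - cached
    return (uncached * input_rate + cached * input_rate * (1 - cached_discount)) / 1_000_000


print(cacheable_tokens(63), cacheable_tokens(64), cacheable_tokens(1_000))
print(f"${input_cost(1_200, 1_000, input_rate=2.00):.6f} per request")
```

A 63-token prefix caches nothing, while a 1,000-token prefix caches 960 tokens, so padding a short system prompt past a block boundary can matter more than trimming it.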

Vibe API Spending Limit Bypass

Users have reported that monthly API spending limits configured in the Mistral Admin Console may only apply to standard API usage and not to the 'Vibe API' (used by Mistral Vibe CLI). This has reportedly led to cases where users exceeded their set limits by hundreds of dollars without the API being throttled.

Workaround: Monitor usage for Mistral Vibe separately in the dashboard and manually track spending if using both the standard and Vibe APIs concurrently.

Source: reddit.com (reddit) · cited 2026-03-04

Le Chat Pro Subscription vs. Vibe CLI Billing

A common point of confusion for production users is that a Le Chat Pro subscription ($14.99/mo) does not grant unlimited or free usage of the Mistral Vibe CLI. While it provides higher usage limits, activity beyond those limits is billed at standard pay-as-you-go (PAYG) API rates.

Workaround: Check the Vibe CLI configuration (~/.vibe/config.toml) to ensure it is using the intended model and monitor the 'usage %' in the online interface to avoid unexpected PAYG charges.

Source: github.com (github_issue) · cited 2026-01-26

Experimental Model Pricing Transitions

Models released for experimental periods, such as 'Devstral Small 2' (labs-devstral-small-2512), can transition from free to paid status with minimal notice. Users have reported that the pricing page may continue to display a '$0' price tag (often with the original price crossed out) even after the model has moved to a paid tier, leading to unexpected billing spikes.

Workaround: Verify the current billing status of 'labs' or experimental models via support or recent Discord announcements before deploying them in high-volume production workflows.

Source: reddit.com (reddit) · cited 2026-03-11

Image Token Usage Surprises

Integrating images into workflows using multimodal models like Pixtral or Devstral can cause token consumption to increase significantly faster than text-only prompts. Users have reported usage 'skyrocketing' from minimal levels to near-quota limits shortly after enabling image support, reportedly due to high per-image token costs that are not always transparently documented in standard calculators.

Workaround: Perform small-scale testing with images to establish a baseline token cost per image before scaling multimodal applications.

Source: reddit.com (reddit) · cited 2026-03-11

Tokenizer V3 Tool Calling Overhead

The transition from Tokenizer V2 to V3 changed the encoding of tool messages. In V3, tool results are no longer wrapped in a list, and the entire history of tool calls is tokenized, which can alter the total token count and associated costs for complex agentic workflows compared to older versions.

Workaround: Review the 'prompt_tokens' and 'completion_tokens' in API responses when upgrading to Tokenizer V3 to ensure cost estimates remain accurate for tool-heavy applications.

Source: docs.mistral.ai (vendor_docs) · cited 2026-05-16
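
One way to follow this workaround is to compare the token counts reported for the same request before and after a tokenizer change. The sketch below works on plain dictionaries carrying the `prompt_tokens` and `completion_tokens` fields named above; the sample numbers are made up for illustration and the helper names are hypothetical.

```python
def cost_from_usage(usage, input_rate, output_rate):
    """Cost in USD from a usage record with prompt_tokens and completion_tokens."""
    return (usage["prompt_tokens"] * input_rate
            + usage["completion_tokens"] * output_rate) / 1_000_000


def cost_drift(before, after, input_rate, output_rate):
    """Fractional change in cost between two usage records for the same request."""
    old = cost_from_usage(before, input_rate, output_rate)
    new = cost_from_usage(after, input_rate, output_rate)
    return (new - old) / old


# Made-up usage records for illustration only; real values come from API responses.
usage_before = {"prompt_tokens": 4_000, "completion_tokens": 500}
usage_after = {"prompt_tokens": 4_600, "completion_tokens": 500}
print(f"cost change: {cost_drift(usage_before, usage_after, 2.00, 6.00):+.1%}")
```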

Hidden costs (25-40% beyond per-token rates)

Updated 2026-05-16

Prompt cache misses due to the 64-token minimum block size requirement.
Unexpected token spikes when integrating images into multimodal Pixtral workflows.
Retry overhead from network errors and rate limits adds 5-15% to effective cost.
Tokenizer V3 overhead in agentic workflows where tool call history is re-processed.
Vibe CLI activity that may bypass configured API spending limits in the admin console.
Experimental 'labs' models transitioning from free to paid status with minimal notice.
Internal engineering time required to manage private deployments or self-hosted weights.

Typical overhead: 25-40% beyond raw per-token rates.
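
A simple way to budget for this is to apply the overhead range to the raw per-token bill. The sketch below assumes a raw bill of $200 per month purely as an example; `budget_range` is an illustrative helper name.

```python
def budget_range(raw_monthly_cost, low_overhead=0.25, high_overhead=0.40):
    """Return (low, high) monthly budget after applying the overhead range."""
    return raw_monthly_cost * (1 + low_overhead), raw_monthly_cost * (1 + high_overhead)


# Example: an assumed raw bill of $200/month (25M input + 25M output on Large 3 at $2/$6).
low, high = budget_range(200.00)
print(f"budget between ${low:.2f} and ${high:.2f} per month")
```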

What it costs to leave Mistral AI

Updated 2026-05-16

Switching from Mistral is relatively straightforward due to their adherence to OpenAI-compatible API structures and the availability of open weights. The primary lock-in risks are specific tool-calling implementations in Tokenizer V3 and any deep integrations with Mistral-specific features like their native prompt caching blocks.

small project (1-5 prompts): 1-2 engineer-days to update API endpoints and verify output consistency
mid-size (10-50 prompts): 1-2 engineer-weeks to re-tune system prompts and adjust for tokenizer differences
large agentic system: 1-3 engineer-months to migrate complex tool-calling logic and re-evaluate cost-performance

Who is this for?

Refreshed 2026-05-16

For vibe coders & solo devs

Mistral is a favorite for the 'vibe coding' community due to the Vibe CLI and the low-cost Small 4 model. The $14.99 Le Chat Pro subscription is an excellent way to get high usage limits for research without worrying about per-token API spikes. However, be careful with the Vibe CLI as it may bypass your standard API spending limits set in the dashboard.

* Use Mistral Small 4 for rapid prototyping and code boilerplate generation.
* Monitor Vibe CLI usage separately to avoid unexpected monthly billing surprises.
* Leverage the 64-token prompt cache block by keeping your system prompts consistent.
* Apply for the Mistralship program if you are a seed-stage startup to get 30,000 credits.

For SMBs and growing teams

Small and medium businesses can leverage Mistral's transparent tier system to scale costs alongside growth. The automatic transition through usage tiers (Tier 1 to Tier 4) ensures that as your spend increases, your rate limits grow automatically. This makes Mistral a predictable partner for businesses that cannot commit to large upfront enterprise contracts.

* Start with the pay-as-you-go Tier 1 to test product-market fit with zero commitment.
* Use the Mistralship program to secure 30,000 credits if your company is less than 7 years old.
* Implement prompt caching for repetitive customer support queries to save 90% on input costs.
* Consider Le Chat Team subscriptions for internal staff to provide AI tools at a fixed monthly cost.

For enterprise buyers

For enterprise buyers, Mistral offers the flexibility of managed API access or private deployments on AWS, Azure, or Google Cloud. The Enterprise Plan, reportedly starting at $20,000 per month, provides the SLAs and administrative controls required for regulated industries. Multi-cloud availability ensures you can deploy Mistral models wherever your data currently resides.

* Negotiate volume discounts via the Enterprise Plan for commitments over $20,000 per month.
* Use AWS Bedrock or Azure AI Foundry to apply existing cloud credits toward Mistral usage.
* Utilize Provisioned Throughput on cloud providers to save up to 70% on predictable production loads.
* Ensure your security team reviews the SAML SSO and ACL permissions available in the Enterprise tier.

Sources verified for this page

Primary: Mistral AI pricing page

View all 28 cited insider sources across 19 domains

Mistralship (Startup Program) (community, verified 2024-12-18)
Mistral AI Ambassador Program (community, verified 2026-02-14)
Mistral AI Enterprise Plan (analyst_report, verified 2025-08-19)
Mistral AI Usage Tiers (La Plateforme) (vendor_official, verified 2025-10-30)
Azure AI Foundry Provisioned Throughput Reservations (vendor_official, verified 2025-05-19)
AWS Bedrock Provisioned Throughput (community, verified 2026-03-10)
Google Vertex AI Committed Use Discounts (CUDs) (vendor_official, verified 2024-12-02)
Prompt Cache 64-Token Minimum Block Size (vendor_docs, verified 2026-05-16)
Vibe API Spending Limit Bypass (reddit, verified 2026-03-04)
Le Chat Pro Subscription vs. Vibe CLI Billing (github_issue, verified 2026-01-26)
Experimental Model Pricing Transitions (reddit, verified 2026-03-11)
Image Token Usage Surprises (reddit, verified 2026-03-11)
Tokenizer V3 Tool Calling Overhead (vendor_docs, verified 2026-05-16)
AWS Bedrock (grounded_research, verified 2026-05-16)
Google Vertex AI (grounded_research, verified 2026-04-30)
Microsoft Azure (grounded_research, verified 2026-05-16)
Together AI (grounded_research, verified 2026-05-14)
Anyscale (grounded_research, verified 2026-05-16)
Snowflake Cortex (grounded_research, verified 2026-04-08)
IBM watsonx (grounded_research, verified 2026-05-14)
Mistralship (Mistral AI Startup Program) (grounded_research, verified 2024-12-18)
Mistral AI Ambassador Program (grounded_research, verified 2026-02-14)
Google for Startups Cloud Program (Scale Tier) (grounded_research, verified 2026-05-16)
AWS Activate (grounded_research, verified 2026-05-16)
Microsoft for Startups Founders Hub (grounded_research, verified 2026-05-16)
Mistral AI 2026 Worldwide Hackathon (grounded_research, verified 2026-03-16)
Mistral AI Academic Partnership (ESSEC) (grounded_research, verified 2025-05-20)
Mistral AI Researcher Access (Aalborg University) (grounded_research, verified 2026-05-16)

Generator: gen-v4.13-2026-05-15 · Last refreshed: Sat May 16 2026 18:11:06 GMT-0400 (Eastern Daylight Time) · Pricing snapshot: Sat May 16 2026 00:00:00 GMT-0400 (Eastern Daylight Time)

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.
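
The derivation rules in the table above reduce to a few multipliers applied to a model's published rates. The following is a minimal sketch of that arithmetic, not the page generator's actual code: the function name `derive_rate`, the two lookup dictionaries, and the example rates are all hypothetical, and it assumes "25% of base" for Grok 4 (legacy) refers to the input rate.

```python
from decimal import Decimal

# Default multipliers, read off the table above. Keys mirror its Field column.
DEFAULT_MULTIPLIERS = {
    "cachedInput": Decimal("0.10"),  # 10% of input (90% cache-hit discount)
    "batchInput": Decimal("0.50"),   # 50% of standard input
    "batchOutput": Decimal("0.50"),  # 50% of standard output
}

# Per-model exceptions listed in the table (both use 25% instead of 10%).
# Assumption: "25% of base" for Grok 4 (legacy) means 25% of the input rate.
OVERRIDES = {
    ("Gemini 2.0 Flash", "cachedInput"): Decimal("0.25"),
    ("Grok 4 (legacy)", "cachedInput"): Decimal("0.25"),
}


def derive_rate(model: str, field: str, input_rate: Decimal, output_rate: Decimal) -> Decimal:
    """Return an inferred per-million-token rate for one field of one model."""
    multiplier = OVERRIDES.get((model, field), DEFAULT_MULTIPLIERS[field])
    # Only batchOutput is derived from the output rate; the rest use input.
    base = output_rate if field == "batchOutput" else input_rate
    return base * multiplier


# Illustrative rates only (USD per million tokens), not taken from any vendor.
example_input, example_output = Decimal("2.00"), Decimal("6.00")
for field in DEFAULT_MULTIPLIERS:
    print(field, derive_rate("Example Model", field, example_input, example_output))
```

Using `Decimal` rather than floats keeps small per-token rates exact when they are later multiplied by large token counts.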
