DeepSeek pricing, complete breakdown

Verified 2026-05-16, cross-checked against DeepSeek pricing page, litellm, openrouter

DeepSeek provides a highly competitive pricing structure across its latest V4 and V3.2 model families. The flagship deepseek-v4-pro offers a massive 1-million token context window at $0.435 per million input tokens and $0.87 per million output tokens. For efficiency-focused workloads, deepseek-v4-flash reduces costs to $0.14 per million input and $0.28 per million output. The V3.2 series, including both chat and reasoner models, maintains a steady rate of $0.28 per million input and $0.42 per million output. This guide helps you navigate these rates and understand the economic trade-offs between different integration paths.

DeepSeek V4 Flash offers the lowest entry point at just $0.14 per million input tokens.

How DeepSeek's pricing universe works

DeepSeek utilizes multiple pricing modes to capture different segments of the AI market, balancing high-margin API revenue with predictable subscription income. API per-token pricing allows developers to scale costs directly with usage, while consumer and business subscriptions provide fixed-cost access for end-users who need a daily AI assistant. This dual-track strategy helps DeepSeek fund massive compute requirements while maintaining a low barrier to entry for builders. Cloud marketplace integrations further simplify procurement for large enterprises that prefer to consolidate billing under existing infrastructure contracts.

API (per-token, metered)

For: Developers, technical teams, startups building products on top of DeepSeek
  • Pay only for tokens consumed
  • Full model lineup including V4 Pro and Flash
  • Programmatic access via OpenAI-compatible SDKs
When to use: When integrating DeepSeek into your own product or running variable batch workloads
Best for: Builders with metered or unpredictable usage

Consumer subscriptions (Pro, Max tiers)

For: Individuals using DeepSeek directly for writing, coding, research, analysis
  • Fixed monthly fee
  • Generous usage caps
  • Web/desktop/mobile apps
  • Priority access during peak traffic
When to use: When using DeepSeek as a daily-driver AI assistant rather than building on it
Best for: Solo professionals, knowledge workers, vibe coders

Business/Team plans

For: Teams of 5-200 needing shared workspaces, admin controls, SSO
  • Per-seat billing
  • Centralized billing
  • Admin & audit controls
  • Shared usage pools
When to use: When deploying DeepSeek across a team that does NOT need API integration
Best for: Mid-size organizations adopting AI for internal productivity

Enterprise (custom contract)

For: Large organizations with procurement requirements, compliance needs, or volume-discount leverage
  • Custom pricing and limits
  • SLAs
  • DPAs and BAAs
  • Dedicated support
When to use: When per-seat or per-token pricing exceeds ~$50K/year, or when compliance/contractual needs require it
Best for: Enterprises with procurement-led adoption

Cloud marketplaces (AWS Bedrock, Google Vertex, Azure)

For: Organizations with existing cloud commits or strict data-residency requirements
  • Same models, slightly different pricing (often parity or small premium)
  • Counts toward existing cloud spend commits
  • Stays within cloud's data-protection boundary
When to use: When you already burn down EDP/MACC/CCC commits and prefer single-bill
Best for: Cloud-committed enterprises
Which one should you pick? If you are building a product or automated workflow, the API is your primary path. For personal use as a chat assistant, a Pro subscription is the most cost-effective. Teams should look toward the Team plan for shared management, while large-scale enterprise adoption is best handled through custom contracts or cloud marketplace deployments to leverage existing infrastructure spend.

Current pricing (all production models)

ModelInput $/MOutput $/MCached $/MContext
DeepSeek V3.2 (chat)
deepseek-v3-2
$0.28 $0.42 $0.028 128,000
DeepSeek V3.2 (reasoner)
deepseek-reasoner
$0.28 $0.42 $0.028 128,000
deepseek-v4-pro
deepseek-v4-pro
$0.43 $0.87 $0.004 1,000,000
deepseek-v4-flash
deepseek-v4-flash
$0.14 $0.28 $0.003 1,000,000

Pricing verified as of 2026-05-16. DeepSeek offers significant discounts for cached input tokens, which are applied automatically when context is reused.

Full rate breakdown (all variants)

Variants beyond standard API: batch (async, 50% off), cached read (0.1x), cache writes (1.25x or 2x base), long-context tier (~2x above threshold).

DeepSeek V3.2 (chat) deepseek-v3-2

Efficient general-purpose chat for standard conversational workflows
Primary useStandard conversational AI, general task automation, and customer-facing chat interfaces.
Who picks itDevelopers building high-speed chat applications and general productivity tools.
Vs other DeepSeek modelsAt $0.28 per million input tokens, it matches the Reasoner's price but is optimized for conversational speed rather than deep logic.
When to useUse this for standard dialogue and low-latency responses; switch to V4-Pro for tasks requiring a 1M token context window.
Equivalents at other vendors
openai
GPT-5.4 Nano Both target high-speed, low-cost chat, though DeepSeek offers a significantly larger 128k context window.
xai
Grok 4.1 Fast (non-reasoning) Directly comparable in the high-speed utility tier with similar sub-$0.50 per million token pricing.

DeepSeek V3.2 (chat) deepseek-v3-2

VariantInput $/MOutput $/MNotes
Standard $0.28 $0.42 Default per-token API rate
Cached read $0.028 $0.42 Cached prompt input (~0.1x base); output rate unchanged

DeepSeek V3.2 (reasoner) deepseek-reasoner

Logic-focused inference for complex math and technical coding
Primary useMulti-step reasoning, mathematical proofs, and complex software engineering tasks.
Who picks itEngineers building technical agents and automated reasoning systems requiring chain-of-thought processing.
Vs other DeepSeek modelsShares the $0.28/$0.42 rate with V3.2 Chat but prioritizes logical depth and step-by-step verification over chat fluidity.
When to useChoose this for debugging and logic-heavy tasks; use V4-Flash for high-volume summarization where reasoning is less critical.
Equivalents at other vendors
xai
Grok 4.1 Fast (reasoning) Both models target the emerging 'fast reasoning' niche, providing logical depth at a fraction of flagship costs.
mistral
Mistral Small 4 Matches the efficiency-first profile for structured logic tasks while maintaining a competitive price point.

DeepSeek V3.2 (reasoner) deepseek-reasoner

VariantInput $/MOutput $/MNotes
Standard $0.28 $0.42 Default per-token API rate
Cached read $0.028 $0.42 Cached prompt input (~0.1x base); output rate unchanged

deepseek-v4-pro deepseek-v4-pro

Massive context window for enterprise-scale document analysis
Primary useLong-context analysis of entire codebases, legal archives, and multi-document synthesis.
Who picks itEnterprise teams managing large-scale RAG systems and complex document processing pipelines.
Vs other DeepSeek modelsPriced at $0.435/$0.87, it is DeepSeek's premium offering, providing a 1M token context window compared to the 128k in V3.2.
When to useBest for massive inputs and high-fidelity outputs; use V4-Flash if you need the 1M context but can sacrifice some output quality for lower cost.
Equivalents at other vendors
openai
GPT-5.4 Mini Similar 'pro-lite' performance tier, though DeepSeek provides a much larger context window at a lower price point.
google
Gemini 3.1 Flash-Lite Competes directly on high-context efficiency and low-cost processing for large datasets.

deepseek-v4-pro deepseek-v4-pro

VariantInput $/MOutput $/MNotes
Standard $0.43 $0.87 Default per-token API rate
Cached read $0.004 $0.87 Cached prompt input (~0.1x base); output rate unchanged

deepseek-v4-flash deepseek-v4-flash

Ultra-low latency processing for high-volume data streams
Primary useHigh-throughput classification, real-time summarization, and massive data extraction tasks.
Who picks itDevelopers scaling high-volume pipelines where cost-per-token and context size are the primary constraints.
Vs other DeepSeek modelsThe most economical option at $0.14/$0.28, offering the same 1M context as V4-Pro but at roughly half the price.
When to useUse for volume and speed; upgrade to V4-Pro if the task requires higher reasoning depth or more nuanced output.
Equivalents at other vendors
cohere
Command R Both models focus on high-efficiency RAG and document processing with aggressive pricing for volume.
mistral
Mistral Small 4 Matches the high-efficiency, low-cost profile required for simple, high-volume automated tasks.

deepseek-v4-flash deepseek-v4-flash

VariantInput $/MOutput $/MNotes
Standard $0.14 $0.28 Default per-token API rate
Cached read $0.003 $0.28 Cached prompt input (~0.1x base); output rate unchanged

How buyers think about DeepSeek pricing

Each scenario below is interactive — tweak the inputs to see how the math changes for your workload.

Bulk processing at DeepSeek ultra-low rates

vibe-codersolopreneurdeveloper

The problem: You need to process millions of documents or logs but cannot justify the high per-token costs of frontier models. High-volume summarization or data extraction often becomes cost-prohibitive at scale.

What to do: DeepSeek V4 Flash is recommended for its aggressive pricing and high performance on routine tasks.

Processing 1 million input tokens and 1 million output tokens costs $0.42 total. Specifically, 1M input tokens at $0.14/M and 1M output tokens at $0.28/M equals $0.42 per batch, meaning 100 million tokens of total throughput costs only $21.00 (as of 2026-05-16).

→ You can process 10 million tokens for less than $5.00 using V4 Flash.

Quick calc — adjust for your workload
Per request:  ·  Monthly:  ·  Annual:
Open full calculator with caching, batch, charts →

DeepSeek reasoner vs OpenAI o-series for math and code

developersmbenterprise

The problem: Complex logic, mathematical proofs, and deep code debugging require chain-of-thought reasoning that standard chat models often fail to execute reliably. You need high-quality reasoning without the premium price tag of other reasoning models.

What to do: DeepSeek V3.2 (reasoner) provides deep thinking capabilities at the same price point as their standard chat model.

A complex request using 5,000 input tokens and generating 5,000 output tokens (including internal reasoning) costs $0.0035. This is calculated as 5,000 tokens × $0.28/M input ($0.0014) plus 5,000 tokens × $0.42/M output ($0.0021) for a total of $0.0035 per request (as of 2026-05-16).

→ DeepSeek Reasoner delivers advanced logic for $0.70 per million balanced token pairs.

Quick calc — adjust for your workload
Per request:  ·  Monthly:  ·  Annual:
Open full calculator with caching, batch, charts →

When V4 Pro is worth the upgrade from V3.2

developersmbenterprise

The problem: You are currently using V3.2 for general chat but find that certain agentic workflows or long-context RAG tasks require higher instruction-following accuracy and a larger context window.

What to do: DeepSeek V4 Pro is the recommended upgrade for workloads requiring a 1,000,000 token context window and higher precision.

Moving a workload of 1 million input tokens and 1 million output tokens from V3.2 to V4 Pro increases the cost from $0.70 to $1.305. The V4 Pro math is 1M input × $0.435/M ($0.435) plus 1M output × $0.87/M ($0.87) for a total of $1.305 (as of 2026-05-16).

→ Upgrading to V4 Pro roughly doubles your cost while expanding the context window by nearly eight times.

Quick calc — adjust for your workload
Per request:  ·  Monthly:  ·  Annual:
Open full calculator with caching, batch, charts →

DeepSeek off-peak discount window

developerit-buyer

The problem: Your organization runs large asynchronous batch jobs, such as nightly data indexing or content moderation, and you want to minimize the impact on your monthly AI budget.

What to do: Schedule non-urgent batch and async jobs during DeepSeek's official off-peak hours to take advantage of substantial rate drops.

While standard rates for V3.2 are $0.28/M input and $0.42/M output, scheduling these during UTC off-peak windows reduces the effective monthly spend. For a workload of 100 million tokens, even a partial discount on these rates provides significant savings over on-demand peak usage (as of 2026-05-16).

→ Shifting async workloads to off-peak hours is the most effective way to lower your blended token rate.

Quick calc — adjust for your workload
Per request:  ·  Monthly:  ·  Annual:
Open full calculator with caching, batch, charts →

Compliance considerations for DeepSeek API

it-buyerenterprise

The problem: Enterprise security policies may restrict the use of direct APIs from certain jurisdictions. You need the performance of DeepSeek models but must maintain data residency within specific cloud boundaries like AWS or Azure.

What to do: Deploy DeepSeek models via Amazon Bedrock or Azure AI Foundry to ensure VPC-native security and compliance.

Using DeepSeek V3.2 on Amazon Bedrock costs $0.62 per 1M input tokens, compared to the direct API price of $0.28/M. For an enterprise consuming 500 million input tokens monthly, the cloud-managed cost is $310.00, providing integrated billing and AWS IAM security (as of 2026-05-16).

→ Cloud-managed DeepSeek deployments carry a significant markup but provide essential enterprise governance.

Quick calc — adjust for your workload
Per request:  ·  Monthly:  ·  Annual:
Open full calculator with caching, batch, charts →

Self-hosting DeepSeek open weights

developersmbenterprise

The problem: At extremely high scales, even low per-token API costs can accumulate into large monthly bills. You also want total control over model quantization and privacy for sensitive data.

What to do: Self-hosting DeepSeek open weights on private GPU clusters can become more cost-effective than API calls at massive scale.

The API cost for V4 Pro is $0.435/M input and $0.87/M output. If your volume exceeds several billion tokens per month, the cost of a dedicated H100 cluster may break even with the API spend. You must factor in engineering time for deployment and maintenance (as of 2026-05-16).

→ Self-hosting is a scale-play that trades operational complexity for long-term cost caps.

Quick calc — adjust for your workload
Per request:  ·  Monthly:  ·  Annual:
Open full calculator with caching, batch, charts →

Volume discounts & partner programs

Heads up — these are community-sourced and analyst-reported terms. Specific credit amounts, discount percentages, and program thresholds change frequently. Always verify current terms directly with DeepSeek before relying on a specific number. Treat reported figures as ballpark, not contract language.

DeepSeek Enterprise Tier

Threshold: reportedly $1M+ annual spend for negotiated terms

Typical discount (reported): varies by volume and contract

Benefits:

How to engage: Contact DeepSeek sales directly via the official website

Source: deepseek.comvendor_official · cited 2026-05-16

Amazon Bedrock Reserved Capacity

Threshold: varies by commitment term (typically 1-month or 6-month)

Typical discount (reported): approximately 60% to 80% versus on-demand rates

Benefits:

How to engage: Purchase through the AWS Management Console under Bedrock Provisioned Throughput

Source: aws.amazon.comvendor_official · cited 2026-05-16

Azure AI Foundry Provisioned Throughput

Threshold: varies by PTU (Provisioned Throughput Unit) commitment

Typical discount (reported): discounts at scale for significant token volumes

Benefits:

How to engage: Contact Microsoft Azure sales or configure via Azure AI Foundry portal

Source: azure.microsoft.comvendor_official · cited 2026-05-16

Together AI Startup Program

Threshold: varies by company stage and profile

Typical discount (reported): up to $50,000 in platform credits

Benefits:

How to engage: Apply via the Together AI startup application page

Source: together.aivendor_official · cited 2026-02-09

Fireworks for Startups

Threshold: AI-native startups (venture-backed preferred)

Typical discount (reported): varies by program track

Benefits:

How to engage: Register through the Fireworks AI startup landing page

Source: fireworks.aivendor_official · cited 2026-05-16

Multi-cloud availability

Cloud-marketplace terms change frequently. Model availability dates, pricing parity, and regional features can drift week to week. Verify with each cloud's pricing page (AWS Bedrock, Google Vertex, Azure AI Foundry) before architecting around specifics.
CloudModel availabilityPrice vs vendor-directReasons to pick
AWS Bedrock DeepSeek-V3.2, DeepSeek-V3.1, and DeepSeek-R1 (fully managed serverless) Significant markup; DeepSeek-V3.2 is $0.62 per 1M input tokens compared to the reported direct price of approximately $0.14-$0.30
  • VPC-native deployment for high-security environments
  • Integration with Amazon Bedrock Guardrails for automated safety filtering
  • Unified billing with existing AWS infrastructure spend
  • Serverless availability eliminates the need for manual GPU provisioning

vertexaisearch.cloud.google.com ↗
Azure AI Foundry DeepSeek-R1, DeepSeek-V3, and DeepSeek-V4 Pro (via Fireworks on Foundry) Higher than direct; DeepSeek-R1 is $1.35 per 1M input tokens vs $0.55 direct, while DeepSeek-V4 Pro is $1.75 per 1M input tokens
  • Access to 'Fireworks on Foundry' for low-latency inference natively within Azure
  • Provisioned Throughput Units (PTU) portability across different models
  • Enterprise-grade content safety via Azure AI Content Safety classification
  • Contractual data isolation for regulated industries

vertexaisearch.cloud.google.com ↗
Google Vertex AI DeepSeek-R1, DeepSeek-V4 Pro, and DeepSeek-OCR Reportedly comparable to other hyperscalers with a standard cloud markup
  • Native integration with BigQuery for direct AI queries on large datasets
  • Available as both managed APIs and self-deployed models in Model Garden
  • Support for streaming responses to reduce perceived end-user latency
  • VPC-native deployment and contractual data isolation

vertexaisearch.cloud.google.com ↗
Together AI DeepSeek-V4 Pro, DeepSeek-V3.1, and DeepSeek-R1 Mixed; DeepSeek-V4 Pro is $1.75 per 1M input tokens, while DeepSeek-R1 is significantly higher at approximately $7.00-$8.00 per 1M tokens
  • Startup Accelerator program offering up to $50,000 in free credits
  • High performance with a recorded raw time to first token of 0.99s for V4 Pro
  • Support for cached input tokens at a discount (e.g., $0.20 per 1M for V4 Pro)
  • OpenAI-compatible API for easy migration

vertexaisearch.cloud.google.com ↗
Fireworks AI DeepSeek-V4 Pro, DeepSeek-V3, and DeepSeek-R1 Competitive with other inference providers; DeepSeek-V4 Pro is $1.74 per 1M input tokens
  • Proprietary Fire Attention CUDA kernels for high throughput (up to 273 tokens/sec)
  • Up to 40% cost reduction compared to closed-source frontier models
  • Day-zero availability for new DeepSeek flagship releases
  • Optimized for agentic workflows and complex multi-step reasoning

vertexaisearch.cloud.google.com ↗

Free credits & startup programs

Program details and credit amounts shift often. Apply directly through each program's official page for current values, eligibility windows, and application requirements.

DeepSeek Developer Platform Sign-up Grant

Reported value: 5 million free tokens (approximately $8-10 in value)

Eligibility: New API accounts; no credit card required for initial registration

How to apply: Register for an account at platform.deepseek.com

Apply / learn more at nxcode.io ↗

Microsoft for Startups Founders Hub

Reported value: up to $150,000 in Azure credits

Eligibility: Startups meeting Microsoft's criteria; DeepSeek R1 and DeepSeek 3.2 are reportedly eligible for sponsorship credits when billed through Microsoft

How to apply: Apply through the Microsoft for Startups portal

Apply / learn more at learn.microsoft.com ↗

Google for Startups Cloud Program

Reported value: up to $350,000 in Google Cloud credits over 2 years

Eligibility: AI-first startups; DeepSeek R1 is available via Vertex AI Model Garden

How to apply: Apply via the Google for Startups website

Apply / learn more at cloud.google.com ↗

AWS Activate

Reported value: up to $100,000 in AWS credits

Eligibility: Early-stage startups; credits can be used for DeepSeek-R1 via Amazon Bedrock Marketplace

How to apply: Apply through the AWS Activate website

Apply / learn more at aws.amazon.com ↗

Together AI Startup Accelerator

Reported value: up to $50,000 in free credits

Eligibility: AI-native startups building with open-source models; supports DeepSeek V3 and R1

How to apply: Apply via Together AI's startup program page

Apply / learn more at getaiperks.com ↗

Fireworks for Startups

Reported value: approximately $5,000–$10,000 in build credits

Eligibility: AI-native startups; includes access to DeepSeek models

How to apply: Apply through the Fireworks AI website

Apply / learn more at guptadeepak.com ↗

DeepSeek Academic/Researcher Access

Reported value: monthly token allowance (reportedly 1 to 3 million tokens)

Eligibility: Students, researchers, and individual developers for non-commercial use

How to apply: Register with a valid email or GitHub account; apply for academic partnership where available

Apply / learn more at datastudios.org ↗

BytePlus AI Startups Accelerator

Reported value: 500,000 free tokens across premium LLMs

Eligibility: Startups using BytePlus ModelArk; supports DeepSeek-V3.1

How to apply: Apply through the BytePlus Partner Central portal

Apply / learn more at byteplus.com ↗

Pricing gotchas to watch

Most gotchas below were surfaced by community reports. Some may have been fixed, changed, or never been the user-facing issue they appeared. Verify against current vendor docs before architecting around a workaround.

Best-Effort Cache Eviction and TTL Variance

DeepSeek's 'Context Caching on Disk' is enabled by default but operates on a best-effort basis without a 100% hit rate guarantee. While official documentation states unused cache entries are cleared within 'a few hours to a few days', community reports suggest caches can expire in as little as 5 minutes to 1 hour during periods of inactivity or sparse traffic.

Workaround: Maintain a consistent request heart-beat or 'warmup' calls using identical stable prefixes to prevent eviction; monitor the 'prompt_cache_hit_tokens' field to detect unexpected misses.

Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16

JSON Mode 'Infinite Whitespace' Billing Trap

When enabling JSON Output mode ({ 'type': 'json_object' }), users must also explicitly instruct the model to produce JSON via a system or user message. Failure to do so can cause the model to generate an 'unending stream of whitespace' until it hits the max_tokens limit, resulting in significant unexpected costs for empty output tokens.

Workaround: Always include 'respond in JSON' in the system prompt when using 'json_object' format and set a strict 'max_tokens' limit to cap potential runaway generation costs.

Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16

Azure Serverless 4K Token Input Constraint

Users deploying DeepSeek models (like R1) via Azure AI Foundry's serverless API have reportedly encountered a hard 4,000-token limit for input context, despite the model supporting significantly larger windows (up to 128k or 1M tokens) on the official DeepSeek platform.

Workaround: Use the official DeepSeek API or alternative providers like Together.ai or Fireworks.ai if your production use case requires long-context RAG or large file analysis.

Source: news.ycombinator.comreddit · cited 2026-05-16

64-Token Chunking and Prefix Alignment

DeepSeek's automatic caching system processes prefixes in 64-token chunks. Any change to the prompt—even a single character—before a chunk boundary will invalidate the cache for all subsequent tokens. Common 'surprises' include dynamic IDs, timestamps, or 'lorebook' triggers placed early in the prompt which break the prefix match.

Workaround: Structure prompts with 'stable' content (system instructions, tool definitions, static documents) at the very beginning and move dynamic session metadata or user-specific questions to the end.

Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16

Beta Endpoint Requirement for Strict Tooling

Advanced features like 'Strict Mode' for tool calls (which ensures model output strictly adheres to a JSON schema) are currently in Beta and require users to point their client to a specific beta-only base URL (https://api.deepseek.com/beta) rather than the standard production endpoint.

Workaround: Update API client configurations to use the /beta base URL when implementing strict schema enforcement for agentic workflows.

Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16

Reasoner Output Token Decoupling

The DeepSeek-Reasoner (R1) model allows for significantly longer outputs (up to 32K or 64K tokens) than its input limit (64K). Notably, the internal reasoning chains (visible in <think> blocks) contribute to the total token count and billing, which can surprise users expecting costs to scale only with the final answer length.

Workaround: Use the 'max_tokens' parameter to limit the total generation (including reasoning) and monitor 'reasoning_content' in the API response to track the cost of the model's 'thinking' process.

Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16

Hidden costs (25-40% beyond per-token rates)

Typical overhead: 25-40% beyond raw per-token rates.

What it costs to leave DeepSeek

DeepSeek uses an OpenAI-compatible API, which makes the technical migration to other providers relatively simple. However, the primary lock-in is the aggressive pricing, as moving to other frontier models can increase your token costs by 5x to 10x for similar performance levels.

Who is this for?

For vibe coders & solo devs

DeepSeek is the primary choice for developers who prioritize raw cost-to-performance ratios. You can start building immediately with a 5 million token sign-up grant that requires no credit card. The V4 Flash model is ideal for building high-frequency tools or personal assistants without worrying about a large bill. Focus on using the standard API for the lowest possible rates.

* Register for the Developer Platform to claim 5 million free tokens.
* Use V4 Flash for all non-reasoning tasks to keep costs at $0.14 per million input tokens.
* Implement context caching to lower input costs to $0.0028 per million tokens for stable prefixes.
* Use the Together AI or Fireworks startup programs if you need additional build credits.

For SMBs and growing teams

For small and mid-sized businesses, DeepSeek offers a way to compete with larger firms by significantly lowering the cost of RAG and automated customer support. You can leverage startup accelerators from providers like Together AI or Fireworks to get up to $50,000 in credits. This allows for extensive prototyping before committing to a paid tier. Be mindful of the 64-token chunking rule to ensure your prompt caching remains efficient.

* Apply for the Together AI Startup Program for up to $50,000 in platform credits.
* Utilize Azure AI Foundry if your team already relies on Microsoft's security and identity stack.
* Structure prompts with static system instructions first to maximize the 64-token cache alignment.
* Monitor for 'infinite whitespace' billing traps by setting strict max_tokens limits in JSON mode.

For enterprise buyers

Enterprises should view DeepSeek as a high-performance engine that requires a robust cloud wrapper for compliance. While direct API rates are low, the $1M+ annual spend threshold for negotiated enterprise terms suggests most will prefer managed services. Using Amazon Bedrock or Azure AI Foundry provides the necessary SLAs and data isolation for regulated industries. Provisioned throughput (PTU) is recommended for guaranteed performance during peak hours.

* Engage DeepSeek sales directly only if your projected annual spend exceeds $1M for custom terms.
* Use Amazon Bedrock Reserved Capacity to secure 60% to 80% discounts versus on-demand cloud rates.
* Deploy via Google Vertex AI to integrate natively with BigQuery for large-scale data analysis.
* Verify regional data residency controls within Azure AI Foundry to meet specific compliance mandates.
Need help deciding which DeepSeek tier or model fits your workload? Book a $19.99 quick call →

Sources verified for this page

Primary: DeepSeek pricing page

View all 24 cited insider sources across 15 domains

Generator: gen-v4.13-2026-05-15 · Last refreshed: Sat May 16 2026 18:13:05 GMT-0400 (Eastern Daylight Time) · Pricing snapshot: Sat May 16 2026 00:00:00 GMT-0400 (Eastern Daylight Time)

📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-05.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when input exceeds model threshold.
  • Embedding prices are input-only (no output tokens generated).

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Anthropic
2026-06-05
https://www.anthropic.com/pricing
Daily snapshot since Sep 2023 · 578 days captured
Anthropic Docs
2026-06-05
https://platform.claude.com/docs/en/about-claude/pricing
Daily snapshot since Sep 2023 · 578 days captured
OpenAI
2026-06-05
https://openai.com/api/pricing/
Daily snapshot since Sep 2023 · 579 days captured
Google AI
2026-06-05
https://ai.google.dev/gemini-api/docs/pricing
Daily snapshot since Dec 2023 · 554 days captured
Google Vertex
2026-06-05
https://cloud.google.com/vertex-ai/generative-ai/pricing
Daily snapshot since Dec 2023 · 554 days captured
DeepSeek
2026-06-05
https://api-docs.deepseek.com/quick_start/pricing
Daily snapshot since May 2024 · 493 days captured
xAI
2026-06-05
https://x.ai/api
Daily snapshot since Nov 2024 · 411 days captured
Mistral
2026-06-05
https://mistral.ai/pricing
Daily snapshot since Dec 2023 · 552 days captured
Cohere
2026-06-05
https://cohere.com/pricing
Daily snapshot since Sep 2023 · 578 days captured

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model Field Why it’s inferred
Anthropic — Claude Sonnet 4.6 cachedInput Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5 cachedInput Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5 batchInput Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5 batchOutput Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5 cachedInput Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini cachedInput Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano cachedInput Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano batchInput Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano batchOutput Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro cachedInput Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro batchInput Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro batchOutput Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2 cachedInput Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2 batchInput Derived at 50% of input.
OpenAI — GPT-5.2 batchOutput Derived at 50% of output.
OpenAI — GPT-5 cachedInput Derived at 10% of input.
OpenAI — GPT-5 batchInput Derived at 50% of input.
OpenAI — GPT-5 batchOutput Derived at 50% of output.
OpenAI — GPT-5.5 Pro cachedInput Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5.5 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5.2 Pro cachedInput Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5.2 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5.1 batchInput Derived at 50% of input.
OpenAI — GPT-5.1 batchOutput Derived at 50% of output.
OpenAI — GPT-5 Pro batchInput Derived at 50% of input.
OpenAI — GPT-5 Pro batchOutput Derived at 50% of output.
OpenAI — GPT-5 Nano cachedInput Derived at 10% of input.
OpenAI — GPT-5 Nano batchInput Derived at 50% of input.
OpenAI — GPT-5 Nano batchOutput Derived at 50% of output.
Google — Gemini 3 Flash cachedInput Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro cachedInput Derived at 10% of input.
Google — Gemini 2.5 Flash cachedInput Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash cachedInput Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite cachedInput Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite batchInput Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite batchOutput Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy) cachedInput Extrapolated at 25% of base.

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →