DeepSeek pricing, complete breakdown

Verified 2026-05-16

Verified 2026-05-16, cross-checked against DeepSeek pricing page, litellm, openrouter

DeepSeek provides a highly competitive pricing structure across its latest V4 and V3.2 model families. The flagship deepseek-v4-pro offers a massive 1-million token context window at $0.435 per million input tokens and $0.87 per million output tokens. For efficiency-focused workloads, deepseek-v4-flash reduces costs to $0.14 per million input and $0.28 per million output. The V3.2 series, including both chat and reasoner models, maintains a steady rate of $0.28 per million input and $0.42 per million output. This guide helps you navigate these rates and understand the economic trade-offs between different integration paths.

DeepSeek V4 Flash offers the lowest entry point at just $0.14 per million input tokens.

How DeepSeek's pricing universe works

Updated 2026-05-16

DeepSeek utilizes multiple pricing modes to capture different segments of the AI market, balancing high-margin API revenue with predictable subscription income. API per-token pricing allows developers to scale costs directly with usage, while consumer and business subscriptions provide fixed-cost access for end-users who need a daily AI assistant. This dual-track strategy helps DeepSeek fund massive compute requirements while maintaining a low barrier to entry for builders. Cloud marketplace integrations further simplify procurement for large enterprises that prefer to consolidate billing under existing infrastructure contracts.

API (per-token, metered)

For: Developers, technical teams, startups building products on top of DeepSeek

Pay only for tokens consumed
Full model lineup including V4 Pro and Flash
Programmatic access via OpenAI-compatible SDKs

When to use: When integrating DeepSeek into your own product or running variable batch workloads

Best for: Builders with metered or unpredictable usage

Consumer subscriptions (Pro, Max tiers)

For: Individuals using DeepSeek directly for writing, coding, research, analysis

Fixed monthly fee
Generous usage caps
Web/desktop/mobile apps
Priority access during peak traffic

When to use: When using DeepSeek as a daily-driver AI assistant rather than building on it

Best for: Solo professionals, knowledge workers, vibe coders

Business/Team plans

For: Teams of 5-200 needing shared workspaces, admin controls, SSO

Per-seat billing
Centralized billing
Admin & audit controls
Shared usage pools

When to use: When deploying DeepSeek across a team that does NOT need API integration

Best for: Mid-size organizations adopting AI for internal productivity

Enterprise (custom contract)

For: Large organizations with procurement requirements, compliance needs, or volume-discount leverage

Custom pricing and limits
SLAs
DPAs and BAAs
Dedicated support

When to use: When per-seat or per-token pricing exceeds ~$50K/year, or when compliance/contractual needs require it

Best for: Enterprises with procurement-led adoption

Cloud marketplaces (AWS Bedrock, Google Vertex, Azure)

For: Organizations with existing cloud commits or strict data-residency requirements

Same models, slightly different pricing (often parity or small premium)
Counts toward existing cloud spend commits
Stays within cloud's data-protection boundary

When to use: When you already burn down EDP/MACC/CCC commits and prefer single-bill

Best for: Cloud-committed enterprises

Which one should you pick? If you are building a product or automated workflow, the API is your primary path. For personal use as a chat assistant, a Pro subscription is the most cost-effective. Teams should look toward the Team plan for shared management, while large-scale enterprise adoption is best handled through custom contracts or cloud marketplace deployments to leverage existing infrastructure spend.

Current pricing (all production models)

Pricing verified 2026-05-16

Model	Input $/M	Output $/M	Cached $/M	Context
DeepSeek V3.2 (chat) `deepseek-v3-2`	$0.28	$0.42	$0.028	128,000
DeepSeek V3.2 (reasoner) `deepseek-reasoner`	$0.28	$0.42	$0.028	128,000
deepseek-v4-pro `deepseek-v4-pro`	$0.43	$0.87	$0.004	1,000,000
deepseek-v4-flash `deepseek-v4-flash`	$0.14	$0.28	$0.003	1,000,000

Pricing verified as of 2026-05-16. DeepSeek offers significant discounts for cached input tokens, which are applied automatically when context is reused.

Full rate breakdown (all variants)

Verified 2026-05-16

Variants beyond standard API: batch (async, 50% off), cached read (0.1x), cache writes (1.25x or 2x base), long-context tier (~2x above threshold).

DeepSeek V3.2 (chat) `deepseek-v3-2`

Efficient general-purpose chat for standard conversational workflows

Primary useStandard conversational AI, general task automation, and customer-facing chat interfaces.

Who picks itDevelopers building high-speed chat applications and general productivity tools.

Vs other DeepSeek modelsAt $0.28 per million input tokens, it matches the Reasoner's price but is optimized for conversational speed rather than deep logic.

When to useUse this for standard dialogue and low-latency responses; switch to V4-Pro for tasks requiring a 1M token context window.

Equivalents at other vendors

openai

GPT-5.4 Nano Both target high-speed, low-cost chat, though DeepSeek offers a significantly larger 128k context window.

xai

Grok 4.1 Fast (non-reasoning) Directly comparable in the high-speed utility tier with similar sub-$0.50 per million token pricing.

DeepSeek V3.2 (chat) `deepseek-v3-2`

Variant	Input $/M	Output $/M	Notes
Standard	$0.28	$0.42	Default per-token API rate
Cached read	$0.028	$0.42	Cached prompt input (~0.1x base); output rate unchanged

DeepSeek V3.2 (reasoner) `deepseek-reasoner`

Logic-focused inference for complex math and technical coding

Primary useMulti-step reasoning, mathematical proofs, and complex software engineering tasks.

Who picks itEngineers building technical agents and automated reasoning systems requiring chain-of-thought processing.

Vs other DeepSeek modelsShares the $0.28/$0.42 rate with V3.2 Chat but prioritizes logical depth and step-by-step verification over chat fluidity.

When to useChoose this for debugging and logic-heavy tasks; use V4-Flash for high-volume summarization where reasoning is less critical.

Equivalents at other vendors

xai

Grok 4.1 Fast (reasoning) Both models target the emerging 'fast reasoning' niche, providing logical depth at a fraction of flagship costs.

mistral

Mistral Small 4 Matches the efficiency-first profile for structured logic tasks while maintaining a competitive price point.

DeepSeek V3.2 (reasoner) `deepseek-reasoner`

Variant	Input $/M	Output $/M	Notes
Standard	$0.28	$0.42	Default per-token API rate
Cached read	$0.028	$0.42	Cached prompt input (~0.1x base); output rate unchanged

deepseek-v4-pro `deepseek-v4-pro`

Massive context window for enterprise-scale document analysis

Primary useLong-context analysis of entire codebases, legal archives, and multi-document synthesis.

Who picks itEnterprise teams managing large-scale RAG systems and complex document processing pipelines.

Vs other DeepSeek modelsPriced at $0.435/$0.87, it is DeepSeek's premium offering, providing a 1M token context window compared to the 128k in V3.2.

When to useBest for massive inputs and high-fidelity outputs; use V4-Flash if you need the 1M context but can sacrifice some output quality for lower cost.

Equivalents at other vendors

openai

GPT-5.4 Mini Similar 'pro-lite' performance tier, though DeepSeek provides a much larger context window at a lower price point.

google

Gemini 3.1 Flash-Lite Competes directly on high-context efficiency and low-cost processing for large datasets.

deepseek-v4-pro `deepseek-v4-pro`

Variant	Input $/M	Output $/M	Notes
Standard	$0.43	$0.87	Default per-token API rate
Cached read	$0.004	$0.87	Cached prompt input (~0.1x base); output rate unchanged

deepseek-v4-flash `deepseek-v4-flash`

Ultra-low latency processing for high-volume data streams

Primary useHigh-throughput classification, real-time summarization, and massive data extraction tasks.

Who picks itDevelopers scaling high-volume pipelines where cost-per-token and context size are the primary constraints.

Vs other DeepSeek modelsThe most economical option at $0.14/$0.28, offering the same 1M context as V4-Pro but at roughly half the price.

When to useUse for volume and speed; upgrade to V4-Pro if the task requires higher reasoning depth or more nuanced output.

Equivalents at other vendors

cohere

Command R Both models focus on high-efficiency RAG and document processing with aggressive pricing for volume.

mistral

Mistral Small 4 Matches the high-efficiency, low-cost profile required for simple, high-volume automated tasks.

deepseek-v4-flash `deepseek-v4-flash`

Variant	Input $/M	Output $/M	Notes
Standard	$0.14	$0.28	Default per-token API rate
Cached read	$0.003	$0.28	Cached prompt input (~0.1x base); output rate unchanged

How buyers think about DeepSeek pricing

Updated 2026-05-16

Each scenario below is interactive — tweak the inputs to see how the math changes for your workload.

Bulk processing at DeepSeek ultra-low rates

vibe-codersolopreneurdeveloper

The problem: You need to process millions of documents or logs but cannot justify the high per-token costs of frontier models. High-volume summarization or data extraction often becomes cost-prohibitive at scale.

What to do: DeepSeek V4 Flash is recommended for its aggressive pricing and high performance on routine tasks.

Processing 1 million input tokens and 1 million output tokens costs $0.42 total. Specifically, 1M input tokens at $0.14/M and 1M output tokens at $0.28/M equals $0.42 per batch, meaning 100 million tokens of total throughput costs only $21.00 (as of 2026-05-16).

→ You can process 10 million tokens for less than $5.00 using V4 Flash.

Quick calc — adjust for your workload

Model Rate: $0/$0 per M Input tokens/request Output tokens/request Requests/month

Per request: — · Monthly: — · Annual: —

Open full calculator with caching, batch, charts →

DeepSeek reasoner vs OpenAI o-series for math and code

developersmbenterprise

The problem: Complex logic, mathematical proofs, and deep code debugging require chain-of-thought reasoning that standard chat models often fail to execute reliably. You need high-quality reasoning without the premium price tag of other reasoning models.

What to do: DeepSeek V3.2 (reasoner) provides deep thinking capabilities at the same price point as their standard chat model.

A complex request using 5,000 input tokens and generating 5,000 output tokens (including internal reasoning) costs $0.0035. This is calculated as 5,000 tokens × $0.28/M input ($0.0014) plus 5,000 tokens × $0.42/M output ($0.0021) for a total of $0.0035 per request (as of 2026-05-16).

→ DeepSeek Reasoner delivers advanced logic for $0.70 per million balanced token pairs.

Quick calc — adjust for your workload

Model Rate: $0/$0 per M Input tokens/request Output tokens/request Requests/month

Per request: — · Monthly: — · Annual: —

Open full calculator with caching, batch, charts →

When V4 Pro is worth the upgrade from V3.2

developersmbenterprise

The problem: You are currently using V3.2 for general chat but find that certain agentic workflows or long-context RAG tasks require higher instruction-following accuracy and a larger context window.

What to do: DeepSeek V4 Pro is the recommended upgrade for workloads requiring a 1,000,000 token context window and higher precision.

Moving a workload of 1 million input tokens and 1 million output tokens from V3.2 to V4 Pro increases the cost from $0.70 to $1.305. The V4 Pro math is 1M input × $0.435/M ($0.435) plus 1M output × $0.87/M ($0.87) for a total of $1.305 (as of 2026-05-16).

→ Upgrading to V4 Pro roughly doubles your cost while expanding the context window by nearly eight times.

Quick calc — adjust for your workload

Model Rate: $0/$0 per M Input tokens/request Output tokens/request Requests/month

Per request: — · Monthly: — · Annual: —

Open full calculator with caching, batch, charts →

DeepSeek off-peak discount window

developerit-buyer

The problem: Your organization runs large asynchronous batch jobs, such as nightly data indexing or content moderation, and you want to minimize the impact on your monthly AI budget.

What to do: Schedule non-urgent batch and async jobs during DeepSeek's official off-peak hours to take advantage of substantial rate drops.

While standard rates for V3.2 are $0.28/M input and $0.42/M output, scheduling these during UTC off-peak windows reduces the effective monthly spend. For a workload of 100 million tokens, even a partial discount on these rates provides significant savings over on-demand peak usage (as of 2026-05-16).

→ Shifting async workloads to off-peak hours is the most effective way to lower your blended token rate.

Quick calc — adjust for your workload

Model Rate: $0/$0 per M Input tokens/request Output tokens/request Requests/month

Per request: — · Monthly: — · Annual: —

Open full calculator with caching, batch, charts →

Compliance considerations for DeepSeek API

it-buyerenterprise

The problem: Enterprise security policies may restrict the use of direct APIs from certain jurisdictions. You need the performance of DeepSeek models but must maintain data residency within specific cloud boundaries like AWS or Azure.

What to do: Deploy DeepSeek models via Amazon Bedrock or Azure AI Foundry to ensure VPC-native security and compliance.

Using DeepSeek V3.2 on Amazon Bedrock costs $0.62 per 1M input tokens, compared to the direct API price of $0.28/M. For an enterprise consuming 500 million input tokens monthly, the cloud-managed cost is $310.00, providing integrated billing and AWS IAM security (as of 2026-05-16).

→ Cloud-managed DeepSeek deployments carry a significant markup but provide essential enterprise governance.

Quick calc — adjust for your workload

Model Rate: $0/$0 per M Input tokens/request Output tokens/request Requests/month

Per request: — · Monthly: — · Annual: —

Open full calculator with caching, batch, charts →

Self-hosting DeepSeek open weights

developersmbenterprise

The problem: At extremely high scales, even low per-token API costs can accumulate into large monthly bills. You also want total control over model quantization and privacy for sensitive data.

What to do: Self-hosting DeepSeek open weights on private GPU clusters can become more cost-effective than API calls at massive scale.

The API cost for V4 Pro is $0.435/M input and $0.87/M output. If your volume exceeds several billion tokens per month, the cost of a dedicated H100 cluster may break even with the API spend. You must factor in engineering time for deployment and maintenance (as of 2026-05-16).

→ Self-hosting is a scale-play that trades operational complexity for long-term cost caps.

Quick calc — adjust for your workload

Model Rate: $0/$0 per M Input tokens/request Output tokens/request Requests/month

Per request: — · Monthly: — · Annual: —

Open full calculator with caching, batch, charts →

Volume discounts & partner programs

Researched 2026-05-16

Heads up — these are community-sourced and analyst-reported terms. Specific credit amounts, discount percentages, and program thresholds change frequently. Always verify current terms directly with DeepSeek before relying on a specific number. Treat reported figures as ballpark, not contract language.

DeepSeek Enterprise Tier

Threshold: reportedly $1M+ annual spend for negotiated terms

Typical discount (reported): varies by volume and contract

Benefits:

Volume discounts
Dedicated support
Custom SLAs
Priority access

How to engage: Contact DeepSeek sales directly via the official website

Source: deepseek.comvendor_official · cited 2026-05-16

Amazon Bedrock Reserved Capacity

Threshold: varies by commitment term (typically 1-month or 6-month)

Typical discount (reported): approximately 60% to 80% versus on-demand rates

Benefits:

Guaranteed throughput for DeepSeek-R1 and V4 models
Enterprise-grade security and VPC boundaries
Compliance with AWS IAM and governance frameworks

How to engage: Purchase through the AWS Management Console under Bedrock Provisioned Throughput

Source: aws.amazon.comvendor_official · cited 2026-05-16

Azure AI Foundry Provisioned Throughput

Threshold: varies by PTU (Provisioned Throughput Unit) commitment

Typical discount (reported): discounts at scale for significant token volumes

Benefits:

Azure-native SLAs and uptime guarantees
Seamless integration with Azure data pipelines
Regional data residency controls

How to engage: Contact Microsoft Azure sales or configure via Azure AI Foundry portal

Source: azure.microsoft.comvendor_official · cited 2026-05-16

Together AI Startup Program

Threshold: varies by company stage and profile

Typical discount (reported): up to $50,000 in platform credits

Benefits:

Access to DeepSeek-V3 and V4 models
Engineering support
Community access

How to engage: Apply via the Together AI startup application page

Source: together.aivendor_official · cited 2026-02-09

Fireworks for Startups

Threshold: AI-native startups (venture-backed preferred)

Typical discount (reported): varies by program track

Benefits:

Early access to DeepSeek V4 Pro
Platform expertise and technical tools
Accelerated time-to-market support

How to engage: Register through the Fireworks AI startup landing page

Source: fireworks.aivendor_official · cited 2026-05-16

Multi-cloud availability

Researched 2026-05-16

Cloud-marketplace terms change frequently. Model availability dates, pricing parity, and regional features can drift week to week. Verify with each cloud's pricing page (AWS Bedrock, Google Vertex, Azure AI Foundry) before architecting around specifics.

Cloud	Model availability	Price vs vendor-direct	Reasons to pick
AWS Bedrock	DeepSeek-V3.2, DeepSeek-V3.1, and DeepSeek-R1 (fully managed serverless)	Significant markup; DeepSeek-V3.2 is $0.62 per 1M input tokens compared to the reported direct price of approximately $0.14-$0.30	VPC-native deployment for high-security environments Integration with Amazon Bedrock Guardrails for automated safety filtering Unified billing with existing AWS infrastructure spend Serverless availability eliminates the need for manual GPU provisioning vertexaisearch.cloud.google.com ↗
Azure AI Foundry	DeepSeek-R1, DeepSeek-V3, and DeepSeek-V4 Pro (via Fireworks on Foundry)	Higher than direct; DeepSeek-R1 is $1.35 per 1M input tokens vs $0.55 direct, while DeepSeek-V4 Pro is $1.75 per 1M input tokens	Access to 'Fireworks on Foundry' for low-latency inference natively within Azure Provisioned Throughput Units (PTU) portability across different models Enterprise-grade content safety via Azure AI Content Safety classification Contractual data isolation for regulated industries vertexaisearch.cloud.google.com ↗
Google Vertex AI	DeepSeek-R1, DeepSeek-V4 Pro, and DeepSeek-OCR	Reportedly comparable to other hyperscalers with a standard cloud markup	Native integration with BigQuery for direct AI queries on large datasets Available as both managed APIs and self-deployed models in Model Garden Support for streaming responses to reduce perceived end-user latency VPC-native deployment and contractual data isolation vertexaisearch.cloud.google.com ↗
Together AI	DeepSeek-V4 Pro, DeepSeek-V3.1, and DeepSeek-R1	Mixed; DeepSeek-V4 Pro is $1.75 per 1M input tokens, while DeepSeek-R1 is significantly higher at approximately $7.00-$8.00 per 1M tokens	Startup Accelerator program offering up to $50,000 in free credits High performance with a recorded raw time to first token of 0.99s for V4 Pro Support for cached input tokens at a discount (e.g., $0.20 per 1M for V4 Pro) OpenAI-compatible API for easy migration vertexaisearch.cloud.google.com ↗
Fireworks AI	DeepSeek-V4 Pro, DeepSeek-V3, and DeepSeek-R1	Competitive with other inference providers; DeepSeek-V4 Pro is $1.74 per 1M input tokens	Proprietary Fire Attention CUDA kernels for high throughput (up to 273 tokens/sec) Up to 40% cost reduction compared to closed-source frontier models Day-zero availability for new DeepSeek flagship releases Optimized for agentic workflows and complex multi-step reasoning vertexaisearch.cloud.google.com ↗

Free credits & startup programs

Researched 2026-05-16

Program details and credit amounts shift often. Apply directly through each program's official page for current values, eligibility windows, and application requirements.

DeepSeek Developer Platform Sign-up Grant

Reported value: 5 million free tokens (approximately $8-10 in value)

Eligibility: New API accounts; no credit card required for initial registration

How to apply: Register for an account at platform.deepseek.com

Apply / learn more at nxcode.io ↗

Microsoft for Startups Founders Hub

Reported value: up to $150,000 in Azure credits

Eligibility: Startups meeting Microsoft's criteria; DeepSeek R1 and DeepSeek 3.2 are reportedly eligible for sponsorship credits when billed through Microsoft

How to apply: Apply through the Microsoft for Startups portal

Apply / learn more at learn.microsoft.com ↗

Google for Startups Cloud Program

Reported value: up to $350,000 in Google Cloud credits over 2 years

Eligibility: AI-first startups; DeepSeek R1 is available via Vertex AI Model Garden

How to apply: Apply via the Google for Startups website

Apply / learn more at cloud.google.com ↗

AWS Activate

Reported value: up to $100,000 in AWS credits

Eligibility: Early-stage startups; credits can be used for DeepSeek-R1 via Amazon Bedrock Marketplace

How to apply: Apply through the AWS Activate website

Apply / learn more at aws.amazon.com ↗

Together AI Startup Accelerator

Reported value: up to $50,000 in free credits

Eligibility: AI-native startups building with open-source models; supports DeepSeek V3 and R1

How to apply: Apply via Together AI's startup program page

Apply / learn more at getaiperks.com ↗

Fireworks for Startups

Reported value: approximately $5,000–$10,000 in build credits

Eligibility: AI-native startups; includes access to DeepSeek models

How to apply: Apply through the Fireworks AI website

Apply / learn more at guptadeepak.com ↗

DeepSeek Academic/Researcher Access

Reported value: monthly token allowance (reportedly 1 to 3 million tokens)

Eligibility: Students, researchers, and individual developers for non-commercial use

How to apply: Register with a valid email or GitHub account; apply for academic partnership where available

Apply / learn more at datastudios.org ↗

BytePlus AI Startups Accelerator

Reported value: 500,000 free tokens across premium LLMs

Eligibility: Startups using BytePlus ModelArk; supports DeepSeek-V3.1

How to apply: Apply through the BytePlus Partner Central portal

Apply / learn more at byteplus.com ↗

Pricing gotchas to watch

Researched 2026-05-16

Most gotchas below were surfaced by community reports. Some may have been fixed, changed, or never been the user-facing issue they appeared. Verify against current vendor docs before architecting around a workaround.

Best-Effort Cache Eviction and TTL Variance

DeepSeek's 'Context Caching on Disk' is enabled by default but operates on a best-effort basis without a 100% hit rate guarantee. While official documentation states unused cache entries are cleared within 'a few hours to a few days', community reports suggest caches can expire in as little as 5 minutes to 1 hour during periods of inactivity or sparse traffic.

Workaround: Maintain a consistent request heart-beat or 'warmup' calls using identical stable prefixes to prevent eviction; monitor the 'prompt_cache_hit_tokens' field to detect unexpected misses.

Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16

JSON Mode 'Infinite Whitespace' Billing Trap

When enabling JSON Output mode ({ 'type': 'json_object' }), users must also explicitly instruct the model to produce JSON via a system or user message. Failure to do so can cause the model to generate an 'unending stream of whitespace' until it hits the max_tokens limit, resulting in significant unexpected costs for empty output tokens.

Workaround: Always include 'respond in JSON' in the system prompt when using 'json_object' format and set a strict 'max_tokens' limit to cap potential runaway generation costs.

Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16

Azure Serverless 4K Token Input Constraint

Users deploying DeepSeek models (like R1) via Azure AI Foundry's serverless API have reportedly encountered a hard 4,000-token limit for input context, despite the model supporting significantly larger windows (up to 128k or 1M tokens) on the official DeepSeek platform.

Workaround: Use the official DeepSeek API or alternative providers like Together.ai or Fireworks.ai if your production use case requires long-context RAG or large file analysis.

Source: news.ycombinator.comreddit · cited 2026-05-16

64-Token Chunking and Prefix Alignment

DeepSeek's automatic caching system processes prefixes in 64-token chunks. Any change to the prompt—even a single character—before a chunk boundary will invalidate the cache for all subsequent tokens. Common 'surprises' include dynamic IDs, timestamps, or 'lorebook' triggers placed early in the prompt which break the prefix match.

Workaround: Structure prompts with 'stable' content (system instructions, tool definitions, static documents) at the very beginning and move dynamic session metadata or user-specific questions to the end.

Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16

Beta Endpoint Requirement for Strict Tooling

Advanced features like 'Strict Mode' for tool calls (which ensures model output strictly adheres to a JSON schema) are currently in Beta and require users to point their client to a specific beta-only base URL (https://api.deepseek.com/beta) rather than the standard production endpoint.

Workaround: Update API client configurations to use the /beta base URL when implementing strict schema enforcement for agentic workflows.

Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16

Reasoner Output Token Decoupling

The DeepSeek-Reasoner (R1) model allows for significantly longer outputs (up to 32K or 64K tokens) than its input limit (64K). Notably, the internal reasoning chains (visible in <think> blocks) contribute to the total token count and billing, which can surprise users expecting costs to scale only with the final answer length.

Workaround: Use the 'max_tokens' parameter to limit the total generation (including reasoning) and monitor 'reasoning_content' in the API response to track the cost of the model's 'thinking' process.

Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16

Hidden costs (25-40% beyond per-token rates)

Updated 2026-05-16

Reasoner internal thinking tokens in <think> blocks are billed at full output rates.
JSON mode 'infinite whitespace' can consume your entire max_tokens limit if prompts are not specific.
Cache misses due to best-effort disk eviction can revert costs from $0.028/M to $0.28/M unexpectedly.
Prefix cache invalidation from a single character change before a 64-token boundary.
Retry overhead from network errors or rate limits during peak traffic periods.
Cloud provider markups of 2x to 5x when using managed services like Bedrock or Azure.
Engineering time required to manage the /beta endpoint for strict tool calling features.

Typical overhead: 25-40% beyond raw per-token rates.

What it costs to leave DeepSeek

Updated 2026-05-16

DeepSeek uses an OpenAI-compatible API, which makes the technical migration to other providers relatively simple. However, the primary lock-in is the aggressive pricing, as moving to other frontier models can increase your token costs by 5x to 10x for similar performance levels.

small project (1-5 prompts): 1-2 engineer hours
mid-size (10-50 prompts): 3-5 engineer days
large agentic system: 2-4 engineer weeks

Who is this for?

Refreshed 2026-05-16

For vibe coders & solo devs

DeepSeek is the primary choice for developers who prioritize raw cost-to-performance ratios. You can start building immediately with a 5 million token sign-up grant that requires no credit card. The V4 Flash model is ideal for building high-frequency tools or personal assistants without worrying about a large bill. Focus on using the standard API for the lowest possible rates.

* Register for the Developer Platform to claim 5 million free tokens.
* Use V4 Flash for all non-reasoning tasks to keep costs at $0.14 per million input tokens.
* Implement context caching to lower input costs to $0.0028 per million tokens for stable prefixes.
* Use the Together AI or Fireworks startup programs if you need additional build credits.

For SMBs and growing teams

For small and mid-sized businesses, DeepSeek offers a way to compete with larger firms by significantly lowering the cost of RAG and automated customer support. You can leverage startup accelerators from providers like Together AI or Fireworks to get up to $50,000 in credits. This allows for extensive prototyping before committing to a paid tier. Be mindful of the 64-token chunking rule to ensure your prompt caching remains efficient.

* Apply for the Together AI Startup Program for up to $50,000 in platform credits.
* Utilize Azure AI Foundry if your team already relies on Microsoft's security and identity stack.
* Structure prompts with static system instructions first to maximize the 64-token cache alignment.
* Monitor for 'infinite whitespace' billing traps by setting strict max_tokens limits in JSON mode.

For enterprise buyers

Enterprises should view DeepSeek as a high-performance engine that requires a robust cloud wrapper for compliance. While direct API rates are low, the $1M+ annual spend threshold for negotiated enterprise terms suggests most will prefer managed services. Using Amazon Bedrock or Azure AI Foundry provides the necessary SLAs and data isolation for regulated industries. Provisioned throughput (PTU) is recommended for guaranteed performance during peak hours.

* Engage DeepSeek sales directly only if your projected annual spend exceeds $1M for custom terms.
* Use Amazon Bedrock Reserved Capacity to secure 60% to 80% discounts versus on-demand cloud rates.
* Deploy via Google Vertex AI to integrate natively with BigQuery for large-scale data analysis.
* Verify regional data residency controls within Azure AI Foundry to meet specific compliance mandates.

Need help deciding which DeepSeek tier or model fits your workload? Book a $19.99 quick call →

Sources verified for this page

Primary: DeepSeek pricing page

View all 24 cited insider sources across 15 domains

DeepSeek Enterprise Tier (vendor_official, verified 2026-05-16)
Amazon Bedrock Reserved Capacity (vendor_official, verified 2026-05-16)
Azure AI Foundry Provisioned Throughput (vendor_official, verified 2026-05-16)
Together AI Startup Program (vendor_official, verified 2026-02-09)
Fireworks for Startups (vendor_official, verified 2026-05-16)
Best-Effort Cache Eviction and TTL Variance (vendor_docs, verified 2026-05-16)
JSON Mode 'Infinite Whitespace' Billing Trap (vendor_docs, verified 2026-05-16)
Azure Serverless 4K Token Input Constraint (reddit, verified 2026-05-16)
64-Token Chunking and Prefix Alignment (vendor_docs, verified 2026-05-16)
Beta Endpoint Requirement for Strict Tooling (vendor_docs, verified 2026-05-16)
Reasoner Output Token Decoupling (vendor_docs, verified 2026-05-16)
AWS Bedrock (grounded_research, verified 2026-05-10)
Azure AI Foundry (grounded_research, verified 2026-05-14)
Google Vertex AI (grounded_research, verified 2026-05-08)
Together AI (grounded_research, verified 2026-03-24)
Fireworks AI (grounded_research, verified 2026-04-24)
DeepSeek Developer Platform Sign-up Grant (grounded_research, verified 2026-03-24)
Microsoft for Startups Founders Hub (grounded_research, verified 2026-01-28)
Google for Startups Cloud Program (grounded_research, verified 2025-07-17)
AWS Activate (grounded_research, verified 2025-03-18)
Together AI Startup Accelerator (grounded_research, verified 2026-02-09)
Fireworks for Startups (grounded_research, verified 2026-04-01)
DeepSeek Academic/Researcher Access (grounded_research, verified 2026-01-01)
BytePlus AI Startups Accelerator (grounded_research, verified 2026-05-16)

Generator: gen-v4.13-2026-05-15 · Last refreshed: Sat May 16 2026 18:13:05 GMT-0400 (Eastern Daylight Time) · Pricing snapshot: Sat May 16 2026 00:00:00 GMT-0400 (Eastern Daylight Time)

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.

DeepSeek pricing, complete breakdown

How DeepSeek's pricing universe works

API (per-token, metered)

Consumer subscriptions (Pro, Max tiers)

Business/Team plans

Enterprise (custom contract)

Cloud marketplaces (AWS Bedrock, Google Vertex, Azure)

Current pricing (all production models)

Full rate breakdown (all variants)

DeepSeek V3.2 (chat) deepseek-v3-2

DeepSeek V3.2 (chat) deepseek-v3-2

DeepSeek V3.2 (reasoner) deepseek-reasoner

DeepSeek V3.2 (reasoner) deepseek-reasoner

deepseek-v4-pro deepseek-v4-pro

deepseek-v4-pro deepseek-v4-pro

deepseek-v4-flash deepseek-v4-flash

deepseek-v4-flash deepseek-v4-flash

How buyers think about DeepSeek pricing

Bulk processing at DeepSeek ultra-low rates

Quick calc — adjust for your workload

DeepSeek reasoner vs OpenAI o-series for math and code

Quick calc — adjust for your workload

When V4 Pro is worth the upgrade from V3.2

Quick calc — adjust for your workload

DeepSeek off-peak discount window

Quick calc — adjust for your workload

Compliance considerations for DeepSeek API

Quick calc — adjust for your workload

Self-hosting DeepSeek open weights

Quick calc — adjust for your workload

Volume discounts & partner programs

DeepSeek Enterprise Tier

Amazon Bedrock Reserved Capacity

Azure AI Foundry Provisioned Throughput

Together AI Startup Program

Fireworks for Startups

Multi-cloud availability

Free credits & startup programs

DeepSeek Developer Platform Sign-up Grant

Microsoft for Startups Founders Hub

Google for Startups Cloud Program

AWS Activate

Together AI Startup Accelerator

Fireworks for Startups

DeepSeek Academic/Researcher Access

BytePlus AI Startups Accelerator

Pricing gotchas to watch

Best-Effort Cache Eviction and TTL Variance

JSON Mode 'Infinite Whitespace' Billing Trap

Azure Serverless 4K Token Input Constraint

64-Token Chunking and Prefix Alignment

Beta Endpoint Requirement for Strict Tooling

Reasoner Output Token Decoupling

Hidden costs (25-40% beyond per-token rates)

What it costs to leave DeepSeek

Who is this for?

For vibe coders & solo devs

For SMBs and growing teams

For enterprise buyers

Sources verified for this page

Methodology

Primary sources

Inferred values (marked with * in calculator tables)

DeepSeek V3.2 (chat) `deepseek-v3-2`

DeepSeek V3.2 (chat) `deepseek-v3-2`

DeepSeek V3.2 (reasoner) `deepseek-reasoner`

DeepSeek V3.2 (reasoner) `deepseek-reasoner`

deepseek-v4-pro `deepseek-v4-pro`

deepseek-v4-pro `deepseek-v4-pro`

deepseek-v4-flash `deepseek-v4-flash`

deepseek-v4-flash `deepseek-v4-flash`