DeepSeek pricing, complete breakdown
Verified 2026-05-16, cross-checked against DeepSeek pricing page, litellm, openrouter
DeepSeek provides a highly competitive pricing structure across its latest V4 and V3.2 model families. The flagship deepseek-v4-pro offers a massive 1-million token context window at $0.435 per million input tokens and $0.87 per million output tokens. For efficiency-focused workloads, deepseek-v4-flash reduces costs to $0.14 per million input and $0.28 per million output. The V3.2 series, including both chat and reasoner models, maintains a steady rate of $0.28 per million input and $0.42 per million output. This guide helps you navigate these rates and understand the economic trade-offs between different integration paths.
How DeepSeek's pricing universe works
DeepSeek utilizes multiple pricing modes to capture different segments of the AI market, balancing high-margin API revenue with predictable subscription income. API per-token pricing allows developers to scale costs directly with usage, while consumer and business subscriptions provide fixed-cost access for end-users who need a daily AI assistant. This dual-track strategy helps DeepSeek fund massive compute requirements while maintaining a low barrier to entry for builders. Cloud marketplace integrations further simplify procurement for large enterprises that prefer to consolidate billing under existing infrastructure contracts.
API (per-token, metered)
- Pay only for tokens consumed
- Full model lineup including V4 Pro and Flash
- Programmatic access via OpenAI-compatible SDKs
Consumer subscriptions (Pro, Max tiers)
- Fixed monthly fee
- Generous usage caps
- Web/desktop/mobile apps
- Priority access during peak traffic
Business/Team plans
- Per-seat billing
- Centralized billing
- Admin & audit controls
- Shared usage pools
Enterprise (custom contract)
- Custom pricing and limits
- SLAs
- DPAs and BAAs
- Dedicated support
Cloud marketplaces (AWS Bedrock, Google Vertex, Azure)
- Same models, slightly different pricing (often parity or small premium)
- Counts toward existing cloud spend commits
- Stays within cloud's data-protection boundary
Current pricing (all production models)
| Model | Input $/M | Output $/M | Cached $/M | Context |
|---|---|---|---|---|
DeepSeek V3.2 (chat)deepseek-v3-2 |
$0.28 | $0.42 | $0.028 | 128,000 |
DeepSeek V3.2 (reasoner)deepseek-reasoner |
$0.28 | $0.42 | $0.028 | 128,000 |
deepseek-v4-prodeepseek-v4-pro |
$0.43 | $0.87 | $0.004 | 1,000,000 |
deepseek-v4-flashdeepseek-v4-flash |
$0.14 | $0.28 | $0.003 | 1,000,000 |
Pricing verified as of 2026-05-16. DeepSeek offers significant discounts for cached input tokens, which are applied automatically when context is reused.
Full rate breakdown (all variants)
Variants beyond standard API: batch (async, 50% off), cached read (0.1x), cache writes (1.25x or 2x base), long-context tier (~2x above threshold).
DeepSeek V3.2 (chat) deepseek-v3-2
DeepSeek V3.2 (chat) deepseek-v3-2
| Variant | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Standard | $0.28 | $0.42 | Default per-token API rate |
| Cached read | $0.028 | $0.42 | Cached prompt input (~0.1x base); output rate unchanged |
DeepSeek V3.2 (reasoner) deepseek-reasoner
DeepSeek V3.2 (reasoner) deepseek-reasoner
| Variant | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Standard | $0.28 | $0.42 | Default per-token API rate |
| Cached read | $0.028 | $0.42 | Cached prompt input (~0.1x base); output rate unchanged |
deepseek-v4-pro deepseek-v4-pro
deepseek-v4-pro deepseek-v4-pro
| Variant | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Standard | $0.43 | $0.87 | Default per-token API rate |
| Cached read | $0.004 | $0.87 | Cached prompt input (~0.1x base); output rate unchanged |
deepseek-v4-flash deepseek-v4-flash
deepseek-v4-flash deepseek-v4-flash
| Variant | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Standard | $0.14 | $0.28 | Default per-token API rate |
| Cached read | $0.003 | $0.28 | Cached prompt input (~0.1x base); output rate unchanged |
How buyers think about DeepSeek pricing
Each scenario below is interactive — tweak the inputs to see how the math changes for your workload.
Bulk processing at DeepSeek ultra-low rates
The problem: You need to process millions of documents or logs but cannot justify the high per-token costs of frontier models. High-volume summarization or data extraction often becomes cost-prohibitive at scale.
What to do: DeepSeek V4 Flash is recommended for its aggressive pricing and high performance on routine tasks.
→ You can process 10 million tokens for less than $5.00 using V4 Flash.
DeepSeek reasoner vs OpenAI o-series for math and code
The problem: Complex logic, mathematical proofs, and deep code debugging require chain-of-thought reasoning that standard chat models often fail to execute reliably. You need high-quality reasoning without the premium price tag of other reasoning models.
What to do: DeepSeek V3.2 (reasoner) provides deep thinking capabilities at the same price point as their standard chat model.
→ DeepSeek Reasoner delivers advanced logic for $0.70 per million balanced token pairs.
When V4 Pro is worth the upgrade from V3.2
The problem: You are currently using V3.2 for general chat but find that certain agentic workflows or long-context RAG tasks require higher instruction-following accuracy and a larger context window.
What to do: DeepSeek V4 Pro is the recommended upgrade for workloads requiring a 1,000,000 token context window and higher precision.
→ Upgrading to V4 Pro roughly doubles your cost while expanding the context window by nearly eight times.
DeepSeek off-peak discount window
The problem: Your organization runs large asynchronous batch jobs, such as nightly data indexing or content moderation, and you want to minimize the impact on your monthly AI budget.
What to do: Schedule non-urgent batch and async jobs during DeepSeek's official off-peak hours to take advantage of substantial rate drops.
→ Shifting async workloads to off-peak hours is the most effective way to lower your blended token rate.
Compliance considerations for DeepSeek API
The problem: Enterprise security policies may restrict the use of direct APIs from certain jurisdictions. You need the performance of DeepSeek models but must maintain data residency within specific cloud boundaries like AWS or Azure.
What to do: Deploy DeepSeek models via Amazon Bedrock or Azure AI Foundry to ensure VPC-native security and compliance.
→ Cloud-managed DeepSeek deployments carry a significant markup but provide essential enterprise governance.
Self-hosting DeepSeek open weights
The problem: At extremely high scales, even low per-token API costs can accumulate into large monthly bills. You also want total control over model quantization and privacy for sensitive data.
What to do: Self-hosting DeepSeek open weights on private GPU clusters can become more cost-effective than API calls at massive scale.
→ Self-hosting is a scale-play that trades operational complexity for long-term cost caps.
Volume discounts & partner programs
DeepSeek Enterprise Tier
Threshold: reportedly $1M+ annual spend for negotiated terms
Typical discount (reported): varies by volume and contract
Benefits:
- Volume discounts
- Dedicated support
- Custom SLAs
- Priority access
How to engage: Contact DeepSeek sales directly via the official website
Source: deepseek.comvendor_official · cited 2026-05-16
Amazon Bedrock Reserved Capacity
Threshold: varies by commitment term (typically 1-month or 6-month)
Typical discount (reported): approximately 60% to 80% versus on-demand rates
Benefits:
- Guaranteed throughput for DeepSeek-R1 and V4 models
- Enterprise-grade security and VPC boundaries
- Compliance with AWS IAM and governance frameworks
How to engage: Purchase through the AWS Management Console under Bedrock Provisioned Throughput
Source: aws.amazon.comvendor_official · cited 2026-05-16
Azure AI Foundry Provisioned Throughput
Threshold: varies by PTU (Provisioned Throughput Unit) commitment
Typical discount (reported): discounts at scale for significant token volumes
Benefits:
- Azure-native SLAs and uptime guarantees
- Seamless integration with Azure data pipelines
- Regional data residency controls
How to engage: Contact Microsoft Azure sales or configure via Azure AI Foundry portal
Source: azure.microsoft.comvendor_official · cited 2026-05-16
Together AI Startup Program
Threshold: varies by company stage and profile
Typical discount (reported): up to $50,000 in platform credits
Benefits:
- Access to DeepSeek-V3 and V4 models
- Engineering support
- Community access
How to engage: Apply via the Together AI startup application page
Source: together.aivendor_official · cited 2026-02-09
Fireworks for Startups
Threshold: AI-native startups (venture-backed preferred)
Typical discount (reported): varies by program track
Benefits:
- Early access to DeepSeek V4 Pro
- Platform expertise and technical tools
- Accelerated time-to-market support
How to engage: Register through the Fireworks AI startup landing page
Source: fireworks.aivendor_official · cited 2026-05-16
Multi-cloud availability
| Cloud | Model availability | Price vs vendor-direct | Reasons to pick |
|---|---|---|---|
| AWS Bedrock | DeepSeek-V3.2, DeepSeek-V3.1, and DeepSeek-R1 (fully managed serverless) | Significant markup; DeepSeek-V3.2 is $0.62 per 1M input tokens compared to the reported direct price of approximately $0.14-$0.30 |
vertexaisearch.cloud.google.com ↗ |
| Azure AI Foundry | DeepSeek-R1, DeepSeek-V3, and DeepSeek-V4 Pro (via Fireworks on Foundry) | Higher than direct; DeepSeek-R1 is $1.35 per 1M input tokens vs $0.55 direct, while DeepSeek-V4 Pro is $1.75 per 1M input tokens |
vertexaisearch.cloud.google.com ↗ |
| Google Vertex AI | DeepSeek-R1, DeepSeek-V4 Pro, and DeepSeek-OCR | Reportedly comparable to other hyperscalers with a standard cloud markup |
vertexaisearch.cloud.google.com ↗ |
| Together AI | DeepSeek-V4 Pro, DeepSeek-V3.1, and DeepSeek-R1 | Mixed; DeepSeek-V4 Pro is $1.75 per 1M input tokens, while DeepSeek-R1 is significantly higher at approximately $7.00-$8.00 per 1M tokens |
vertexaisearch.cloud.google.com ↗ |
| Fireworks AI | DeepSeek-V4 Pro, DeepSeek-V3, and DeepSeek-R1 | Competitive with other inference providers; DeepSeek-V4 Pro is $1.74 per 1M input tokens |
vertexaisearch.cloud.google.com ↗ |
Free credits & startup programs
DeepSeek Developer Platform Sign-up Grant
Reported value: 5 million free tokens (approximately $8-10 in value)
Eligibility: New API accounts; no credit card required for initial registration
How to apply: Register for an account at platform.deepseek.com
Microsoft for Startups Founders Hub
Reported value: up to $150,000 in Azure credits
Eligibility: Startups meeting Microsoft's criteria; DeepSeek R1 and DeepSeek 3.2 are reportedly eligible for sponsorship credits when billed through Microsoft
How to apply: Apply through the Microsoft for Startups portal
Google for Startups Cloud Program
Reported value: up to $350,000 in Google Cloud credits over 2 years
Eligibility: AI-first startups; DeepSeek R1 is available via Vertex AI Model Garden
How to apply: Apply via the Google for Startups website
AWS Activate
Reported value: up to $100,000 in AWS credits
Eligibility: Early-stage startups; credits can be used for DeepSeek-R1 via Amazon Bedrock Marketplace
How to apply: Apply through the AWS Activate website
Together AI Startup Accelerator
Reported value: up to $50,000 in free credits
Eligibility: AI-native startups building with open-source models; supports DeepSeek V3 and R1
How to apply: Apply via Together AI's startup program page
Fireworks for Startups
Reported value: approximately $5,000–$10,000 in build credits
Eligibility: AI-native startups; includes access to DeepSeek models
How to apply: Apply through the Fireworks AI website
DeepSeek Academic/Researcher Access
Reported value: monthly token allowance (reportedly 1 to 3 million tokens)
Eligibility: Students, researchers, and individual developers for non-commercial use
How to apply: Register with a valid email or GitHub account; apply for academic partnership where available
BytePlus AI Startups Accelerator
Reported value: 500,000 free tokens across premium LLMs
Eligibility: Startups using BytePlus ModelArk; supports DeepSeek-V3.1
How to apply: Apply through the BytePlus Partner Central portal
Pricing gotchas to watch
Best-Effort Cache Eviction and TTL Variance
DeepSeek's 'Context Caching on Disk' is enabled by default but operates on a best-effort basis without a 100% hit rate guarantee. While official documentation states unused cache entries are cleared within 'a few hours to a few days', community reports suggest caches can expire in as little as 5 minutes to 1 hour during periods of inactivity or sparse traffic.
Workaround: Maintain a consistent request heart-beat or 'warmup' calls using identical stable prefixes to prevent eviction; monitor the 'prompt_cache_hit_tokens' field to detect unexpected misses.
Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16
JSON Mode 'Infinite Whitespace' Billing Trap
When enabling JSON Output mode ({ 'type': 'json_object' }), users must also explicitly instruct the model to produce JSON via a system or user message. Failure to do so can cause the model to generate an 'unending stream of whitespace' until it hits the max_tokens limit, resulting in significant unexpected costs for empty output tokens.
Workaround: Always include 'respond in JSON' in the system prompt when using 'json_object' format and set a strict 'max_tokens' limit to cap potential runaway generation costs.
Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16
Azure Serverless 4K Token Input Constraint
Users deploying DeepSeek models (like R1) via Azure AI Foundry's serverless API have reportedly encountered a hard 4,000-token limit for input context, despite the model supporting significantly larger windows (up to 128k or 1M tokens) on the official DeepSeek platform.
Workaround: Use the official DeepSeek API or alternative providers like Together.ai or Fireworks.ai if your production use case requires long-context RAG or large file analysis.
Source: news.ycombinator.comreddit · cited 2026-05-16
64-Token Chunking and Prefix Alignment
DeepSeek's automatic caching system processes prefixes in 64-token chunks. Any change to the prompt—even a single character—before a chunk boundary will invalidate the cache for all subsequent tokens. Common 'surprises' include dynamic IDs, timestamps, or 'lorebook' triggers placed early in the prompt which break the prefix match.
Workaround: Structure prompts with 'stable' content (system instructions, tool definitions, static documents) at the very beginning and move dynamic session metadata or user-specific questions to the end.
Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16
Beta Endpoint Requirement for Strict Tooling
Advanced features like 'Strict Mode' for tool calls (which ensures model output strictly adheres to a JSON schema) are currently in Beta and require users to point their client to a specific beta-only base URL (https://api.deepseek.com/beta) rather than the standard production endpoint.
Workaround: Update API client configurations to use the /beta base URL when implementing strict schema enforcement for agentic workflows.
Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16
Reasoner Output Token Decoupling
The DeepSeek-Reasoner (R1) model allows for significantly longer outputs (up to 32K or 64K tokens) than its input limit (64K). Notably, the internal reasoning chains (visible in <think> blocks) contribute to the total token count and billing, which can surprise users expecting costs to scale only with the final answer length.
Workaround: Use the 'max_tokens' parameter to limit the total generation (including reasoning) and monitor 'reasoning_content' in the API response to track the cost of the model's 'thinking' process.
Source: api-docs.deepseek.comvendor_docs · cited 2026-05-16
Hidden costs (25-40% beyond per-token rates)
- Reasoner internal thinking tokens in <think> blocks are billed at full output rates.
- JSON mode 'infinite whitespace' can consume your entire max_tokens limit if prompts are not specific.
- Cache misses due to best-effort disk eviction can revert costs from $0.028/M to $0.28/M unexpectedly.
- Prefix cache invalidation from a single character change before a 64-token boundary.
- Retry overhead from network errors or rate limits during peak traffic periods.
- Cloud provider markups of 2x to 5x when using managed services like Bedrock or Azure.
- Engineering time required to manage the /beta endpoint for strict tool calling features.
Typical overhead: 25-40% beyond raw per-token rates.
What it costs to leave DeepSeek
DeepSeek uses an OpenAI-compatible API, which makes the technical migration to other providers relatively simple. However, the primary lock-in is the aggressive pricing, as moving to other frontier models can increase your token costs by 5x to 10x for similar performance levels.
- small project (1-5 prompts): 1-2 engineer hours
- mid-size (10-50 prompts): 3-5 engineer days
- large agentic system: 2-4 engineer weeks
Who is this for?
For vibe coders & solo devs
DeepSeek is the primary choice for developers who prioritize raw cost-to-performance ratios. You can start building immediately with a 5 million token sign-up grant that requires no credit card. The V4 Flash model is ideal for building high-frequency tools or personal assistants without worrying about a large bill. Focus on using the standard API for the lowest possible rates.* Register for the Developer Platform to claim 5 million free tokens.
* Use V4 Flash for all non-reasoning tasks to keep costs at $0.14 per million input tokens.
* Implement context caching to lower input costs to $0.0028 per million tokens for stable prefixes.
* Use the Together AI or Fireworks startup programs if you need additional build credits.
For SMBs and growing teams
For small and mid-sized businesses, DeepSeek offers a way to compete with larger firms by significantly lowering the cost of RAG and automated customer support. You can leverage startup accelerators from providers like Together AI or Fireworks to get up to $50,000 in credits. This allows for extensive prototyping before committing to a paid tier. Be mindful of the 64-token chunking rule to ensure your prompt caching remains efficient.* Apply for the Together AI Startup Program for up to $50,000 in platform credits.
* Utilize Azure AI Foundry if your team already relies on Microsoft's security and identity stack.
* Structure prompts with static system instructions first to maximize the 64-token cache alignment.
* Monitor for 'infinite whitespace' billing traps by setting strict max_tokens limits in JSON mode.
For enterprise buyers
Enterprises should view DeepSeek as a high-performance engine that requires a robust cloud wrapper for compliance. While direct API rates are low, the $1M+ annual spend threshold for negotiated enterprise terms suggests most will prefer managed services. Using Amazon Bedrock or Azure AI Foundry provides the necessary SLAs and data isolation for regulated industries. Provisioned throughput (PTU) is recommended for guaranteed performance during peak hours.* Engage DeepSeek sales directly only if your projected annual spend exceeds $1M for custom terms.
* Use Amazon Bedrock Reserved Capacity to secure 60% to 80% discounts versus on-demand cloud rates.
* Deploy via Google Vertex AI to integrate natively with BigQuery for large-scale data analysis.
* Verify regional data residency controls within Azure AI Foundry to meet specific compliance mandates.
Sources verified for this page
Primary: DeepSeek pricing page
View all 24 cited insider sources across 15 domains
- DeepSeek Enterprise Tier (vendor_official, verified 2026-05-16)
- Amazon Bedrock Reserved Capacity (vendor_official, verified 2026-05-16)
- Azure AI Foundry Provisioned Throughput (vendor_official, verified 2026-05-16)
- Together AI Startup Program (vendor_official, verified 2026-02-09)
- Fireworks for Startups (vendor_official, verified 2026-05-16)
- Best-Effort Cache Eviction and TTL Variance (vendor_docs, verified 2026-05-16)
- JSON Mode 'Infinite Whitespace' Billing Trap (vendor_docs, verified 2026-05-16)
- Azure Serverless 4K Token Input Constraint (reddit, verified 2026-05-16)
- 64-Token Chunking and Prefix Alignment (vendor_docs, verified 2026-05-16)
- Beta Endpoint Requirement for Strict Tooling (vendor_docs, verified 2026-05-16)
- Reasoner Output Token Decoupling (vendor_docs, verified 2026-05-16)
- AWS Bedrock (grounded_research, verified 2026-05-10)
- Azure AI Foundry (grounded_research, verified 2026-05-14)
- Google Vertex AI (grounded_research, verified 2026-05-08)
- Together AI (grounded_research, verified 2026-03-24)
- Fireworks AI (grounded_research, verified 2026-04-24)
- DeepSeek Developer Platform Sign-up Grant (grounded_research, verified 2026-03-24)
- Microsoft for Startups Founders Hub (grounded_research, verified 2026-01-28)
- Google for Startups Cloud Program (grounded_research, verified 2025-07-17)
- AWS Activate (grounded_research, verified 2025-03-18)
- Together AI Startup Accelerator (grounded_research, verified 2026-02-09)
- Fireworks for Startups (grounded_research, verified 2026-04-01)
- DeepSeek Academic/Researcher Access (grounded_research, verified 2026-01-01)
- BytePlus AI Startups Accelerator (grounded_research, verified 2026-05-16)
Generator: gen-v4.13-2026-05-15 · Last refreshed: Sat May 16 2026 18:13:05 GMT-0400 (Eastern Daylight Time) · Pricing snapshot: Sat May 16 2026 00:00:00 GMT-0400 (Eastern Daylight Time)