
Vector DB Cost - Pinecone vs Weaviate vs Qdrant vs pgvector

Meet Vihaan Reddy, a backend engineer choosing vector storage. "Pinecone is easiest. pgvector is cheapest. What does each actually cost at 5M vectors?"

🔥 Boss wants the cheapest option. Eng team wants Pinecone.

The story

Vector DB pricing has the biggest spread of any RAG component. The same workload (5M vectors, 1M queries/month) costs $1,400/mo on Pinecone Enterprise, $200/mo on self-hosted Qdrant running on a single VM, and $50/mo with pgvector on existing Postgres. The 28× spread between hosted Pinecone and pgvector is real, but the 'right' choice depends on operational maturity, not just unit cost.

Vihaan's workload: 5M vectors, ~1M queries/month, and 30% growth expected per year. Pinecone Serverless ($800/mo) is operationally trivial. Self-hosted Qdrant ($200/mo) saves real money but needs SRE attention. pgvector on existing Postgres ($50/mo extra) is cheapest, but it ties performance to your Postgres setup and limits scale.
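Vihaan's 30%/year growth matters because this page puts pgvector's comfort zone below 10M vectors. A minimal sketch, assuming growth compounds once a year at a constant rate; `project_vectors` and `first_year_over` are hypothetical helpers, not part of the calculator:

```python
from typing import List, Optional


def project_vectors(start: int, annual_growth: float, years: int) -> List[int]:
    """Projected vector count at the end of years 0..years, compounding yearly."""
    return [round(start * (1 + annual_growth) ** y) for y in range(years + 1)]


def first_year_over(start: int, annual_growth: float, threshold: int,
                    max_years: int = 20) -> Optional[int]:
    """First year whose projected count exceeds threshold, or None within max_years."""
    for year, count in enumerate(project_vectors(start, annual_growth, max_years)):
        if count > threshold:
            return year
    return None


if __name__ == "__main__":
    # Vihaan: 5M vectors today, 30% growth per year (assumed constant).
    for year, count in enumerate(project_vectors(5_000_000, 0.30, 3)):
        print(f"year {year}: {count:,} vectors")
    print("exceeds 10M in year", first_year_over(5_000_000, 0.30, 10_000_000))
```

Under these assumptions, 5M vectors pass the 10M mark in year 3 (about 11M).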

Three real choices, not nine. (1) Hosted SaaS (Pinecone, Weaviate Cloud) for fast-ship + ops-free. (2) Self-hosted dedicated (Qdrant, Milvus, Weaviate self-hosted) for cost-conscious + ops-capable. (3) pgvector for small-scale + Postgres-native. Don't overthink the 9 vendors - the architectural pattern matters more than the brand.

📊 CALCULATOR AT A GLANCE


🎛 Inputs you control

Each input shapes the cost. The explanations below follow the calculator field by field.

▸ Total vectors — Total vectors (chunks/embeddings) stored in the index.

How to choose: Multiply docs by chunks per doc, e.g. 10K docs × 3 chunks = 30K vectors.

▸ Vector dimensions — Embedding dimensionality, which drives storage and RAM.

How to choose: Match your embedding model: 1024 for Cohere/Voyage, 1536 for OpenAI text-embedding-3-small, 3072 for text-embedding-3-large.

▸ Queries per second (peak) — Peak queries per second the index must serve.

How to choose: Use peak, not average. For scale, 5 QPS sustained around the clock is about 13M queries per month.

▸ Vector writes per month — New vectors added per month via incremental indexing.

How to choose: Estimate ongoing ingestion; large spikes push managed-tier pricing up.

▸ Deployment preference — Filter results to managed, self-hosted, or all options.

How to choose: Managed means no ops; self-hosted is lowest dollar cost if you have the team.
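The arithmetic behind these inputs, as a minimal sketch. Assumptions (not calculator internals): float32 embeddings at 4 bytes per dimension, a 30-day month, and QPS held constant around the clock, which is how "5 QPS ≈ 13M queries per month" works out. The helper names are hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month
BYTES_PER_FLOAT32 = 4                  # assumed float32 embeddings


def total_vectors(docs: int, chunks_per_doc: float) -> int:
    """Total vectors = documents x chunks per document."""
    return round(docs * chunks_per_doc)


def raw_vector_bytes(vectors: int, dims: int) -> int:
    """Raw embedding payload only; index and metadata overhead are excluded."""
    return vectors * dims * BYTES_PER_FLOAT32


def monthly_queries_from_qps(qps: float) -> int:
    """Queries per month if the given QPS were sustained 24/7."""
    return round(qps * SECONDS_PER_MONTH)


if __name__ == "__main__":
    print(total_vectors(10_000, 3))                       # 30000
    print(raw_vector_bytes(5_000_000, 1536) / 1e9, "GB")  # 30.72 GB
    print(monthly_queries_from_qps(5))                    # 12960000
```

Because sizing uses peak QPS, `monthly_queries_from_qps` gives an upper bound on monthly volume, not an average.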

About this calculator: Vector DB Cost - Pinecone vs Weaviate vs Qdrant vs pgvector

Vector DB pricing varies roughly 10× between hosted SaaS and self-hosted options. Total cost = storage cost + query cost + ops overhead. Real math for production RAG.

Inputs you control

Input	Impact on result	Range	Typical
Total vectors stored	Vihaan: 5M. Mid-scale RAG: 1-10M. Large knowledge base: 100M+.	10K – 1B	5,000,000
Vector dimensions	OpenAI default: 1536. text-embedding-3-large: 3072. Cohere: 1024. Higher dims = more storage cost.	128 – 4,096	1,536
Vector queries per month	Each user search = 1 vector query. Vihaan's 1M/month = ~33K/day. RAG agents do more (3-5 retrievals per turn × turns).	10K – 100M	1,000,000
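Agentic RAG multiplies query volume. A minimal sketch, assuming one vector query per retrieval call; the conversation and turn counts are hypothetical placeholders:

```python
def agent_queries_per_month(conversations: int, turns_per_conversation: float,
                            retrievals_per_turn: float) -> int:
    """Monthly vector queries, assuming one query per retrieval call."""
    return round(conversations * turns_per_conversation * retrievals_per_turn)


def queries_per_day(monthly: int, days: int = 30) -> float:
    """Average daily queries over a month of the given length."""
    return monthly / days


if __name__ == "__main__":
    print(round(queries_per_day(1_000_000)))      # 33333 (Vihaan's ~33K/day)
    # Hypothetical agent: 50K conversations x 5 turns x 4 retrievals per turn.
    print(agent_queries_per_month(50_000, 5, 4))  # 1000000
```

At 4 retrievals per turn, 50K five-turn agent conversations generate as many vector queries as Vihaan's 1M plain searches.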

Outputs computed for you · model: `vector_hosted`

Output	How inputs affect it
Monthly cost ($)	computed from inputs
Annual cost ($)	monthlyUsd × 12
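The outputs reduce to simple arithmetic. A minimal sketch, assuming the monthly figure is the sum of the three components named in "About this calculator" (storage, queries, ops overhead); `VectorDbCost` and its fields are hypothetical, not the calculator's `vector_hosted` model:

```python
from dataclasses import dataclass


@dataclass
class VectorDbCost:
    """Hypothetical monthly breakdown; field names are illustrative."""
    storage_usd: float
    query_usd: float
    ops_usd: float = 0.0  # ops overhead, e.g. engineering time on self-hosted setups

    @property
    def monthly_usd(self) -> float:
        return self.storage_usd + self.query_usd + self.ops_usd

    @property
    def annual_usd(self) -> float:
        # Same rule as the outputs table: annual = monthlyUsd x 12
        return self.monthly_usd * 12


if __name__ == "__main__":
    # Hypothetical split that totals the pgvector use case's $20/month.
    cost = VectorDbCost(storage_usd=12.0, query_usd=3.0, ops_usd=5.0)
    print(cost.monthly_usd, cost.annual_usd)  # 20.0 240.0
```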


Reading your result

Pinecone Serverless ≈ $0.80-1.20/M queries + $0.30/GB-month storage. 5M vectors × 1,536 dims × 4 bytes (float32) ≈ 30GB, so ≈ $9/mo storage. 1M queries ≈ $1. Add the $40 baseline minimum and the total is ~$50-100/mo at this scale.
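The Pinecone Serverless arithmetic above, as a sketch. It uses only the rates quoted on this page (not live vendor prices), assumes raw float32 storage in decimal GB, and treats the $40 baseline as added on top of usage, which is how the ~$50 low end adds up. `pinecone_serverless_estimate` is a hypothetical helper, not a Pinecone API:

```python
BYTES_PER_FLOAT32 = 4
STORAGE_USD_PER_GB_MONTH = 0.30  # rate quoted on this page
BASELINE_USD = 40.0              # baseline quoted on this page, assumed additive


def pinecone_serverless_estimate(vectors: int, dims: int, queries_per_month: int,
                                 usd_per_million_queries: float) -> dict:
    """Monthly estimate from the page's quoted rates (hypothetical helper)."""
    storage_gb = vectors * dims * BYTES_PER_FLOAT32 / 1e9  # raw float32, decimal GB
    storage_usd = storage_gb * STORAGE_USD_PER_GB_MONTH
    query_usd = queries_per_month / 1e6 * usd_per_million_queries
    return {
        "storage_gb": storage_gb,
        "storage_usd": storage_usd,
        "query_usd": query_usd,
        "total_usd": storage_usd + query_usd + BASELINE_USD,
    }


if __name__ == "__main__":
    for rate in (0.80, 1.20):  # low and high ends of the quoted query rate
        est = pinecone_serverless_estimate(5_000_000, 1536, 1_000_000, rate)
        print(f"${rate:.2f}/M queries -> total ${est['total_usd']:.2f}/mo")
```

Both ends of the quoted query rate land near $50/mo, so at 1M queries the baseline dominates. The sketch does not model metadata, writes, or a baseline applied as a spend minimum (`max(baseline, usage)`) instead of an add-on.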

Self-hosted Qdrant ≈ $150-300/mo on a ~$200/mo VM, plus maintenance overhead (SRE time). The savings justify that operational work at scale.
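Self-hosting only wins while the VM savings exceed the SRE time it costs. A minimal sketch of that break-even; the $100/hour loaded SRE rate is a hypothetical placeholder, and the helper names are illustrative:

```python
def self_hosted_monthly_usd(vm_usd: float, sre_hours: float, sre_usd_per_hour: float) -> float:
    """VM bill plus the engineering time spent keeping it running."""
    return vm_usd + sre_hours * sre_usd_per_hour


def break_even_sre_hours(hosted_usd: float, vm_usd: float, sre_usd_per_hour: float) -> float:
    """Monthly SRE hours at which self-hosting costs the same as the hosted option."""
    return (hosted_usd - vm_usd) / sre_usd_per_hour


if __name__ == "__main__":
    # $800 hosted and $200 VM are the figures from Vihaan's story;
    # the $100/hour SRE rate is a hypothetical placeholder.
    print(break_even_sre_hours(800, 200, 100))   # 6.0
    print(self_hosted_monthly_usd(200, 4, 100))  # 600
```

With these placeholders, self-hosted Qdrant stays cheaper than the $800/mo hosted option only while upkeep stays under about 6 hours a month.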

pgvector ≈ marginal cost on existing Postgres. Adds ~30GB to Postgres + a few CPU cycles per query. Often $20-100/mo extra on existing setup. Best for <10M vectors and simple use cases.
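The "~30GB" figure can be sanity-checked against the storage the pgvector README documents for its `vector` type: 4 × dimensions + 8 bytes per value. A minimal sketch; it excludes Postgres row overhead and the HNSW or IVFFlat index, which add more on top:

```python
def pgvector_column_bytes(vectors: int, dims: int) -> int:
    """Vector payload only: 4 bytes per dimension + 8 bytes per value."""
    return vectors * (4 * dims + 8)


if __name__ == "__main__":
    size = pgvector_column_bytes(5_000_000, 1536)
    print(f"{size / 1e9:.2f} GB")  # 30.76 GB
```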

Weaviate Cloud is similar to Pinecone - premium hosted, slightly different pricing model (per-pod vs per-query). Comparable total at this scale.

What "good" looks like:

Hosted SaaS: $50-2K/mo at 5M vectors. Fast ship, ops-free, expensive at scale.
Self-hosted dedicated: $100-400/mo. SRE time required.
pgvector: $20-100/mo. Best for <10M, simple cases.
Switching cost: 2-4 weeks engineering, painful at scale.


Three real scenarios

Same calculator, different team sizes.

$41.10 / month ≈ $493.18 / year

500K vectors, low query volume. Pinecone Serverless ~$25/mo. Setup: 1 hour. Don't optimize for cost yet - ship the feature.

Healthy range: <$30/mo - Pinecone wins on speed

Inputs used:

vectorCount: 500,000
vectorDimensions: 1,536
queriesPerMonth: 200,000
storageProvider: pinecone-serverless

Trade-offs

Cost isn't the only dimension; the best fit changes with the constraint you prioritize.

Best fit for "cost":

pgvector Cheapest for <10M vectors
Qdrant self-hosted Cheapest for >10M vectors
Pinecone Serverless Cheapest hosted

Cost ranking depends on scale. <10M vectors: pgvector wins. 10M-1B: Qdrant or Milvus self-hosted. Hosted SaaS only justifiable for fast-ship or sub-10M with no Postgres.
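The cost ranking above, encoded as one possible decision helper. `suggest_pattern` is hypothetical; the 10M threshold and the three patterns come from this page:

```python
def suggest_pattern(vectors: int, has_postgres: bool, has_ops_team: bool,
                    must_ship_fast: bool) -> str:
    """One reading of the page's cost-ranking rules (hypothetical helper)."""
    if must_ship_fast:
        return "hosted"  # Pinecone / Weaviate Cloud: fast-ship, ops-free
    if vectors < 10_000_000:
        # Under 10M: pgvector if Postgres already exists, otherwise hosted SaaS
        return "pgvector" if has_postgres else "hosted"
    # 10M to 1B: self-hosted Qdrant or Milvus, if someone can run it
    return "self-hosted" if has_ops_team else "hosted"


if __name__ == "__main__":
    # Vihaan today: 5M vectors, existing Postgres, no dedicated SRE capacity.
    print(suggest_pattern(5_000_000, has_postgres=True,
                          has_ops_team=False, must_ship_fast=False))  # pgvector
```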

Use cases

Pre-loaded scenarios for the most common applications, with realistic numbers.

$20.00 / month ≈ $240.00 / year

100K vectors. pgvector on your existing Postgres. Marginal cost. Ship in an afternoon.

Healthy range: $15-30/mo on existing Postgres

Inputs used:

vectorCount: 100,000
vectorDimensions: 768
queriesPerMonth: 50,000
storageProvider: pgvector

What this calculator can't tell you

Honest limitations — every model is wrong; some are useful. Where this one falls short:

Pinecone Serverless pricing changed in 2024 - current pricing may differ from earlier benchmarks.
Self-hosted infra costs are heuristic - actual depends on cloud provider, region, instance type.
Doesn't model index rebuilding cost (memory, CPU during HNSW construction).
Filter performance varies dramatically - test with your real queries.
Quality (recall) differences are workload-specific - benchmark on your data.
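Benchmarking recall on your own data comes down to comparing ANN results against exact brute-force results for the same queries. A minimal sketch of recall@k, assuming unit-normalized embeddings (so dot product ranks like cosine similarity); `approx_ids` stands in for whatever your vector DB returns:

```python
from typing import List, Sequence


def exact_top_k(query: Sequence[float], vectors: Sequence[Sequence[float]],
                k: int) -> List[int]:
    """Brute-force ground truth: indices of the k highest dot-product scores."""
    def score(i: int) -> float:
        return sum(q * v for q, v in zip(query, vectors[i]))
    return sorted(range(len(vectors)), key=score, reverse=True)[:k]


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the true top-k that the approximate (ANN) search also returned."""
    truth = set(exact_ids[:k])
    return len(truth.intersection(approx_ids[:k])) / k


if __name__ == "__main__":
    corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    truth = exact_top_k([1.0, 0.0], corpus, k=2)  # [0, 2]
    ann = [0, 1]  # hypothetical ANN result for the same query
    print(recall_at_k(ann, truth, k=2))           # 0.5
```

Average `recall_at_k` over a sample of real queries; for a large index, compute the exact top-k with a vectorized library rather than this pure-Python loop.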

For adjacent costs, use the Embedding Cost calculator (upstream) and the RAG Pipeline calculator (downstream).

Where to go next

Cost the embedding side →

Indexing + query embedding costs.

Full RAG architecture cost →

Embedding + vector DB + retrieval + LLM.

Self-host break-even math →

Same logic applies to vector DB self-host vs SaaS.

Methodology

Source: https://docs.pinecone.io/docs/pricing
Extraction: Vendor pricing pages monitored quarterly. Self-hosted infra cost from cloud calculators.
Editorial gate: 8-layer defense — see aicost.ai/ai-cost-economics
Last verified: 6/4/2026, 8:00:00 PM

Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.

3 years of pricing history

Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.


📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

All prices are USD per 1 million tokens, current as of 2026-06-05.
Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
Batch API discounts are 50% off standard rates across providers that offer Batch mode.
Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
Long-context pricing tiers apply when input exceeds model threshold.
Embedding prices are input-only (no output tokens generated).

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Anthropic · last verified 2026-06-05 · https://www.anthropic.com/pricing · Daily snapshot since Sep 2023 · 578 days captured

Anthropic Docs · last verified 2026-06-05 · https://platform.claude.com/docs/en/about-claude/pricing · Daily snapshot since Sep 2023 · 578 days captured

OpenAI · last verified 2026-06-05 · https://openai.com/api/pricing/ · Daily snapshot since Sep 2023 · 579 days captured

Google AI · last verified 2026-06-05 · https://ai.google.dev/gemini-api/docs/pricing · Daily snapshot since Dec 2023 · 554 days captured

Google Vertex · last verified 2026-06-05 · https://cloud.google.com/vertex-ai/generative-ai/pricing · Daily snapshot since Dec 2023 · 554 days captured

DeepSeek · last verified 2026-06-05 · https://api-docs.deepseek.com/quick_start/pricing · Daily snapshot since May 2024 · 493 days captured

xAI · last verified 2026-06-05 · https://x.ai/api · Daily snapshot since Nov 2024 · 411 days captured

Mistral · last verified 2026-06-05 · https://mistral.ai/pricing · Daily snapshot since Dec 2023 · 552 days captured

Cohere · last verified 2026-06-05 · https://cohere.com/pricing · Daily snapshot since Sep 2023 · 578 days captured

Voyage AI · last verified 2026-06-05 · https://docs.voyageai.com/docs/pricing

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.
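The inference convention above, as a sketch: starred fields are fixed fractions of the published base rate. The base rates in the example are hypothetical; the 10% and 50% fractions come from the conventions stated above, with 25% for the rows that say so:

```python
CACHED_INPUT_FRACTION = 0.10  # cached input at 10% of base (90% off), default convention
BATCH_FRACTION = 0.50         # Batch API at 50% of base


def infer_rates(input_per_m: float, output_per_m: float,
                cached_fraction: float = CACHED_INPUT_FRACTION) -> dict:
    """Derive the starred fields from published base rates (USD per 1M tokens)."""
    return {
        "cachedInput": input_per_m * cached_fraction,
        "batchInput": input_per_m * BATCH_FRACTION,
        "batchOutput": output_per_m * BATCH_FRACTION,
    }


if __name__ == "__main__":
    # Hypothetical base rates: $1.00 input, $4.00 output per 1M tokens.
    print(infer_rates(1.00, 4.00))
    # Rows such as Gemini 2.0 Flash derive cached input at 25% instead.
    print(infer_rates(1.00, 4.00, cached_fraction=0.25))
```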

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →

