Vector DB Cost - Pinecone vs Weaviate vs Qdrant vs pgvector
Meet Vihaan Reddy. Backend Engineer choosing vector storage. "Pinecone is easiest. pgvector is cheapest. What does each actually cost at 5M vectors?"
🔥 Boss wants the cheapest option. Eng team wants Pinecone.
Vector DB pricing has the biggest spread of any RAG component. The same workload (5M vectors, 1M queries/month) costs $1,400/mo on Pinecone Enterprise, $200/mo on self-hosted Qdrant on a single VM, and $50/mo on pgvector on existing Postgres. The 28× spread between hosted Pinecone and pgvector is real - and the 'right' choice depends on operational maturity, not just unit cost.
Vihaan's choice: 5M vectors, ~1M queries/month, expecting 30% growth/year. Pinecone Serverless ($800/mo) is operationally trivial. Self-hosted Qdrant ($200/mo) saves real money but needs SRE attention. pgvector on existing Postgres ($50/mo extra) is cheapest - but locks performance to your Postgres setup and limits scale.
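The 30% yearly growth matters because this guide treats roughly 10M vectors as the upper end of pgvector's comfortable range. A minimal sketch of the compounding, assuming growth is applied once per year to the starting count (the function names and the yearly-compounding model are illustrative, not part of the calculator):

```python
def project_vectors(start_vectors: float, annual_growth: float, years: int) -> list[float]:
    """Vector count at the end of each year, index 0 being today."""
    return [start_vectors * (1 + annual_growth) ** y for y in range(years + 1)]


def years_until_threshold(start_vectors: float, annual_growth: float, threshold: float) -> int:
    """Whole years of compound growth until the count reaches the threshold."""
    if start_vectors <= 0 or annual_growth <= 0:
        raise ValueError("start_vectors and annual_growth must be positive")
    years = 0
    count = start_vectors
    while count < threshold:
        count *= 1 + annual_growth
        years += 1
    return years


# Vihaan's workload: 5M vectors growing 30% per year, checked against a 10M ceiling.
for year, count in enumerate(project_vectors(5_000_000, 0.30, 3)):
    print(f"year {year}: {count / 1e6:.2f}M vectors")
print("years to reach 10M:", years_until_threshold(5_000_000, 0.30, 10_000_000))
```

Under these assumptions the count passes 10M during the third year, so a pgvector decision made today is worth revisiting well before then.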
Three real choices, not nine. (1) Hosted SaaS (Pinecone, Weaviate Cloud) for fast-ship + ops-free. (2) Self-hosted dedicated (Qdrant, Milvus, Weaviate self-hosted) for cost-conscious + ops-capable. (3) pgvector for small-scale + Postgres-native. Don't overthink the 9 vendors - the architectural pattern matters more than the brand.
Vector DB pricing varies roughly 10× between hosted SaaS and self-hosted. The total is storage cost + query cost + ops overhead. Below is the real math for a production RAG workload.
Pinecone Serverless ≈ $0.80-1.20/M queries + $0.30/GB-month storage. 5M vectors at 1536 dims = ~30GB = $9 storage. 1M queries = ~$1. Plus a $40 baseline minimum. Total ~$50-100/mo at this scale.
Self-hosted Qdrant ≈ $150-300/mo all-in, typically on a ~$200/mo VM. Add maintenance overhead (SRE time), which becomes operationally meaningful at scale.
pgvector ≈ marginal cost on existing Postgres. Adds ~30GB to Postgres + a few CPU cycles per query. Often $20-100/mo extra on existing setup. Best for <10M vectors and simple use cases.
Weaviate Cloud is similar to Pinecone - premium hosted, slightly different pricing model (per-pod vs per-query). Comparable total at this scale.
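The Pinecone Serverless arithmetic above can be reproduced in a few lines. This is a sketch under stated assumptions: vectors are stored as 4-byte floats, index and metadata overhead are ignored, and the $40 baseline is added on top of usage (which is how the quoted total sums). The function names and default rates are illustrative placeholders taken from the figures above, not a vendor API or a current price list.

```python
def vector_storage_gb(num_vectors: int, dims: int, bytes_per_dim: int = 4) -> float:
    """Raw vector payload in decimal GB, ignoring index and metadata overhead."""
    return num_vectors * dims * bytes_per_dim / 1e9


def serverless_monthly_cost(
    num_vectors: int,
    dims: int,
    queries_per_month: int,
    storage_per_gb: float = 0.30,       # $/GB-month, figure quoted above
    price_per_m_queries: float = 1.00,  # $/million queries, midpoint of $0.80-1.20
    baseline: float = 40.0,             # assumed additive monthly minimum
) -> dict[str, float]:
    """Break a monthly bill into storage, query, and baseline components."""
    storage = vector_storage_gb(num_vectors, dims) * storage_per_gb
    queries = queries_per_month / 1e6 * price_per_m_queries
    return {
        "storage": storage,
        "queries": queries,
        "baseline": baseline,
        "total": storage + queries + baseline,
    }


bill = serverless_monthly_cost(5_000_000, 1536, 1_000_000)
for component, dollars in bill.items():
    print(f"{component:>8}: ${dollars:,.2f}")
```

With these inputs storage comes to about 30.7 GB and the total lands near the low end of the $50-100/mo range. Swapping the defaults for a vendor's current rates is the intended use.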
Same cost model, three different team sizes. The numbers shift as follows.
500K vectors, low query volume. Pinecone Serverless ~$25/mo. Setup: 1 hour. Don't optimize for cost yet - ship the feature.
Healthy range: <$30/mo - Pinecone wins on speed
Vihaan's 5M vectors. Qdrant on a $200 VM saves ~$500/mo over Pinecone. Setup: 1-2 days. Maintenance: ~2 hr/month at this scale. ROI: $5-6K/year.
Healthy range: $200-400/mo self-hosted vs $700+ hosted
100M vectors, 10M queries. Pinecone bill would be $20K+/mo. Qdrant cluster on 4 VMs (~$3K/mo infra). Saves $200K+/year. Mandatory self-host at this scale.
Healthy range: $2-5K/mo self-hosted (vs $20K+ hosted)
Cost isn't the only dimension. Each constraint below changes the recommendation.
Cost ranking depends on scale. <10M vectors: pgvector wins. 10M-1B: Qdrant or Milvus self-hosted. Hosted SaaS is only justifiable when you need to ship fast, or below 10M vectors when you have no Postgres.
Vector DBs differ in retrieval quality (recall@k). Pinecone, Weaviate, Qdrant are all competitive. pgvector with HNSW is also strong. Test recall on your actual queries.
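Testing recall on your own queries needs only the IDs each store returns and a ground-truth set per query (for example, the exact nearest neighbors from a brute-force scan). A minimal sketch, with hypothetical function names and toy IDs:

```python
from typing import Hashable, Sequence


def recall_at_k(retrieved: Sequence[Hashable], relevant: set, k: int) -> float:
    """Fraction of the ground-truth items that appear in the top-k retrieved IDs."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(
    results: Sequence[Sequence[Hashable]],
    ground_truth: Sequence[set],
    k: int,
) -> float:
    """Average recall@k over a batch of queries."""
    if len(results) != len(ground_truth):
        raise ValueError("results and ground_truth must have the same length")
    scores = [recall_at_k(r, g, k) for r, g in zip(results, ground_truth)]
    return sum(scores) / len(scores)


# Toy data: two queries, IDs as returned by the store under test.
store_results = [["a", "b", "c", "d"], ["x", "y", "z", "w"]]
exact_neighbors = [{"a", "c", "e"}, {"x", "y", "z"}]
print(f"mean recall@3: {mean_recall_at_k(store_results, exact_neighbors, 3):.3f}")
```

Running the same query set against each candidate store, with the same ground truth, gives a like-for-like quality number to set beside the cost figures.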
Hosted vector DBs vary in compliance certifications. For HIPAA, FedRAMP - verify per-provider. Self-host is the safe default for highly regulated workloads.
Stored embeddings are partially invertible: roughly 30-50% of the source text can be reconstructed from them. Privacy posture matters. Self-host, or use a hosted provider's strong enterprise tier - not a consumer/free tier.
p50 is similar across vendors. p99 differs significantly - hosted SaaS often has noisy-neighbor issues, self-hosted is more predictable. Matters for latency-critical UX.
Vectors themselves migrate easily (they are just data). Application code is the lock-in: Pinecone's metadata filter syntax differs from Qdrant's, which differs from pgvector's. A multi-vendor abstraction is the hedge.
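One way to build that hedge is a small store interface with a vendor-neutral filter, so only the adapter knows a vendor's filter syntax. The sketch below is an assumption-laden illustration: `VectorStore`, `InMemoryStore`, and the equality-only `where` dict are hypothetical names, and no real vendor client is called. A production adapter would translate `where` into the vendor's own filter format.

```python
from __future__ import annotations

import math
from typing import Optional, Protocol, Sequence


class VectorStore(Protocol):
    """Hypothetical vendor-neutral interface the application codes against."""

    def upsert(self, item_id: str, vector: Sequence[float], metadata: dict) -> None: ...

    def query(self, vector: Sequence[float], top_k: int,
              where: Optional[dict] = None) -> list[str]: ...


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Brute-force reference adapter, useful for tests and local development."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[list[float], dict]] = {}

    def upsert(self, item_id: str, vector: Sequence[float], metadata: dict) -> None:
        self._items[item_id] = (list(vector), dict(metadata))

    def query(self, vector: Sequence[float], top_k: int,
              where: Optional[dict] = None) -> list[str]:
        where = where or {}
        scored = [
            (_cosine(vector, vec), item_id)
            for item_id, (vec, meta) in self._items.items()
            # Neutral filter: every key must match exactly.
            if all(meta.get(key) == value for key, value in where.items())
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item_id for _, item_id in scored[:top_k]]


store: VectorStore = InMemoryStore()
store.upsert("a", [1.0, 0.0], {"lang": "en"})
store.upsert("b", [0.9, 0.1], {"lang": "de"})
store.upsert("c", [0.0, 1.0], {"lang": "en"})
print(store.query([1.0, 0.0], top_k=2))
print(store.query([1.0, 0.0], top_k=2, where={"lang": "en"}))
```

Keeping the filter vocabulary deliberately small (equality only here) is the design choice that makes a later vendor switch cheap; richer filters tend to leak vendor-specific semantics into application code.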
Self-host saves money but spends SRE time. At <$500/mo savings, self-host probably isn't worth it. At $5K+/mo savings, it almost always is.
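The rule of thumb above becomes concrete once SRE time is priced in. A minimal sketch, assuming maintenance hours are the only ops cost and using a placeholder hourly rate of $100 (the rate and the function name are assumptions, not figures from this guide):

```python
def self_host_net_savings(
    hosted_monthly: float,
    self_hosted_monthly: float,
    sre_hours_per_month: float,
    sre_hourly_rate: float,
) -> float:
    """Monthly savings from self-hosting after subtracting the cost of SRE time."""
    gross_savings = hosted_monthly - self_hosted_monthly
    ops_cost = sre_hours_per_month * sre_hourly_rate
    return gross_savings - ops_cost


# Mid-size case from this guide: ~$700 hosted vs ~$200 self-hosted, ~2 hr/month upkeep.
mid = self_host_net_savings(700, 200, sre_hours_per_month=2, sre_hourly_rate=100)
print(f"mid-size net savings: ${mid:,.0f}/mo (${mid * 12:,.0f}/yr)")

# Large case: ~$20K hosted vs ~$3K self-hosted infra, before any SRE time.
large = self_host_net_savings(20_000, 3_000, sre_hours_per_month=0, sre_hourly_rate=100)
print(f"large gross savings: ${large:,.0f}/mo (${large * 12:,.0f}/yr)")
```

At the mid-size scale, a couple of extra maintenance hours per month can erase a large share of the savings, while at the large scale the gross gap is big enough that SRE time barely moves the result.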
Scenarios for the most common applications, with realistic numbers for each.
100K vectors. pgvector on your existing Postgres. Marginal cost. Ship in an afternoon.
Healthy range: $15-30/mo on existing Postgres
10M products, 30M queries/month (high search volume). Self-hosted Qdrant cluster justified. Hosted would be $5K+/mo.
Healthy range: $1-2K/mo self-hosted
50M code embeddings, very high query volume (every keystroke is a query in some implementations). Self-host mandatory. Optimize for low-latency reads.
Healthy range: $3-6K/mo self-hosted
30M papers, low query volume (academic tools have small user bases). Hosted Weaviate ~$1.5K/mo. Self-host saves marginal $ given the low query rate. Stay hosted.
Healthy range: $1-2K/mo hosted (low query volume justifies)
Honest limitations — every model is wrong; some are useful. Where this one falls short:
For these, use: Embedding Cost for upstream. RAG Pipeline for downstream.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs).
10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
The fields below are derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic - Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate - Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic - Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic - Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input - Anthropic documents uniform 50% Batch discount. |
| Anthropic - Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output - Anthropic documents uniform 50% Batch discount. |
| Anthropic - Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate - Anthropic 90% cache-hit discount convention. |
| OpenAI - GPT-5.4 Mini | cachedInput | Derived at 10% of input - OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI - GPT-5.4 Nano | cachedInput | Derived at 10% of input - OpenAI 90% cache-hit convention. |
| OpenAI - GPT-5.4 Nano | batchInput | Derived at 50% of input - OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.4 Nano | batchOutput | Derived at 50% of output - OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.4 Pro | cachedInput | Derived at 10% of input - OpenAI 90% cache-hit convention. |
| OpenAI - GPT-5.4 Pro | batchInput | Derived at 50% of input - OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.4 Pro | batchOutput | Derived at 50% of output - OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI - GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI - GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5.5 Pro | cachedInput | Derived at 10% of input - OpenAI does not publish a cached rate for *-pro models; using the family convention. |
| OpenAI - GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5.2 Pro | cachedInput | Derived at 10% of input - pro-tier convention. |
| OpenAI - GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI - GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google - Gemini 3 Flash | cachedInput | Derived at 10% of input - Google caching discount convention ~90%. |
| Google - Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input - Google caching convention. |
| Google - Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google - Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| Google - Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google - Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google - Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input - Google caching convention. |
| Google - Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google - Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates. |
| Google - Gemini 2.0 Flash | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input - Google caching convention. |
| Google - Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| xAI - Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base. |
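The derivation rule behind the table is simple enough to sketch. The field names (`cachedInput`, `batchInput`, `batchOutput`) come from the table; the function name is hypothetical, and the base rates in the example are placeholders, not any vendor's actual prices.

```python
def derive_inferred_rates(
    input_rate: float,
    output_rate: float,
    cached_fraction: float = 0.10,  # typical convention: cached input = 10% of base
    batch_fraction: float = 0.50,   # typical convention: Batch API = 50% of base
) -> dict[str, float]:
    """Fill in unpublished rates from the base input/output rates."""
    return {
        "cachedInput": input_rate * cached_fraction,
        "batchInput": input_rate * batch_fraction,
        "batchOutput": output_rate * batch_fraction,
    }


# Placeholder base rates in $ per million tokens (illustrative only).
default_rates = derive_inferred_rates(input_rate=2.00, output_rate=8.00)
print(default_rates)

# Rows that use a 25% cached rate instead of 10% just override the fraction.
legacy_rates = derive_inferred_rates(input_rate=2.00, output_rate=8.00, cached_fraction=0.25)
print(legacy_rates["cachedInput"])
```

Exposing the fractions as parameters keeps the exceptions in the table (the 25% rows) visible at the call site rather than hidden in special cases.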
Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for a full audit trail.