Google pricing, complete breakdown

Verified 2026-05-26; cross-checked against Google's pricing page, LiteLLM, and OpenRouter.

Google's model lineup spans from the flagship Gemini 3.1 Pro at $2.00 per million input tokens to the highly efficient Gemini 2.5 Flash-Lite at just $0.10 per million. Developers can leverage the high-speed Gemini 3 Flash for $0.50/M input or the ultra-low-cost Gemini 3.1 Flash-Lite at $0.25/M input. These models support massive context windows up to 2 million tokens, with significant discounts for cached inputs. This guide helps you navigate the trade-offs between per-token API costs and fixed-rate consumer subscriptions.

Gemini 2.5 Flash-Lite offers the lowest entry point at just $0.10 per million input tokens.

How Google's pricing universe works

Google runs a dual-track ecosystem to capture both high-volume developer traffic and steady recurring consumer revenue. The API track is designed for builders who need granular, metered control over token usage and programmatic access to frontier models. The subscription track, by contrast, targets end users who value predictable monthly costs and integrated features such as Google Flow credits and Gmail AI tools. This multi-channel approach lets Google monetize the same underlying models at different margins and customer-acquisition costs, depending on the user's technical needs.

API (per-token, metered)

For: Developers, technical teams, startups building products on top of Gemini
  • Pay only for tokens consumed
  • Full model lineup including batch, caching, long context
  • Programmatic via SDKs
When to use: When integrating Gemini into your own product or running variable batch workloads
Best for: Builders with metered or unpredictable usage

Consumer subscriptions (Plus, Pro, Ultra tiers)

For: Individuals using Gemini directly for writing, coding, research, analysis
  • Fixed monthly fee
  • Generous usage caps
  • Web/desktop/mobile apps
  • Often includes newer models first
When to use: When using Gemini as a daily-driver AI assistant rather than building on it
Best for: Solo professionals, knowledge workers, vibe coders

Business/Team plans

For: Teams of 5-200 needing shared workspaces, admin controls, SSO
  • Per-seat billing
  • Centralized billing
  • Admin & audit controls
  • Sometimes shared usage pools
When to use: When deploying Gemini across a team that does NOT need API integration
Best for: Mid-size organizations adopting AI for internal productivity

Enterprise (custom contract)

For: Large organizations with procurement requirements, compliance needs, or volume-discount leverage
  • Custom pricing and limits
  • SLAs
  • DPAs and BAAs
  • Dedicated support
  • Sometimes private cloud / VPC
When to use: When per-seat or per-token pricing exceeds ~$50K/year, or when compliance/contractual needs require it
Best for: Enterprises with procurement-led adoption

Cloud marketplaces (Google Vertex AI)

For: Organizations with existing cloud commits or strict data-residency requirements
  • Same models, slightly different pricing (often parity or small premium)
  • Counts toward existing cloud spend commits
  • Stays within cloud's data-protection boundary
When to use: When you already burn down Google Cloud commits and prefer single-bill
Best for: Cloud-committed enterprises

Which one should you pick? If you are building a custom application, use the API for metered billing. For personal research and daily assistance, choose Google AI Pro ($19.99/mo) or the high-usage Ultra 20x ($199.99/mo) for maximum Gemini Pro access. Teams should look toward Enterprise contracts or Vertex AI on Google Cloud for consolidated billing and compliance.

🎁 Current promos and time-sensitive deals

Vertex AI Provisioned Throughput (PT)
Fixed-cost subscription for guaranteed throughput and predictable latency; it typically breaks even once you sustain 12-15% utilization of the purchased capacity.
Google Cloud Enterprise Discount Program (EDP)
35-40% initial offer for custom pricing; Vertex AI consumption counts toward Customer Annual Spend Commitment (CASC).
Google Cloud offers stackable discounts where Committed Use Discounts (CUDs) can be applied alongside broader Enterprise Discount Programs (EDP).

📅 What changed in the last 30 days

· gemini-2-5-flash-lite-preview-09-2025 added to catalog: Gemini 2.5 Flash-Lite Preview
· google-gemini-3-1-flash-tts-preview added to catalog: Gemini 3.1 Flash TTS Preview
· google-gemini-3-1-flash-live-preview added to catalog: Gemini 3.1 Flash Live Preview
· gemini-3-1-flash-lite-preview added to catalog: Gemini 3.1 Flash-Lite Preview
· gemini-3-flash-preview added to catalog: Gemini 3 Flash Preview
· gemini-3-5-flash added to catalog: Gemini 3.5 Flash
· gemini-3-1-pro-preview added to catalog: Gemini 3.1 Pro Preview
· gemini-3-1-pro-preview-customtools added to catalog: Gemini 3.1 Pro Preview (CustomTools)
· gemini-3-1-flash-live-preview price updated: input $3.00 → $0.75, output $12.00 → $4.50
· gemini-2-5-flash-preview-tts price updated: input $0.50 → $1.00, output $10.00 → $20.00, batch input $0.25 → $0.50, batch output $5.00 → $10.00
· gemini-3-1-flash-lite long-context input rate now $0.50
· gemini-2-5-flash-native-audio-preview-12-2025 price updated: input $3.00 → $0.50, output $12.00 → $2.00
· gemini-2-5-pro-preview-tts cost per 1M characters: $20.00 → $10.00
· gemini-2-5-flash-lite long-context input rate now $0.20; long-context threshold now 200,000 tokens
· veo-3-1-lite-generate-preview cost per second: $0.10 → $0.05
· veo-3-1-fast-generate-preview cost per second: $0.25 → $0.10
· gemini-2-5-flash-lite-preview-09-2025 long-context input rate now $0.20; long-context threshold now 200,000 tokens

⏳ Major upcoming changes

Known pricing model changes within the next 90 days
Google Cloud Partner Network 2026 Tier Transition
What changes: Full implementation of the three-tier system (Select, Premier, Diamond) with updated ACV thresholds ($250k to $20M).
What stays: Existing reseller margins for Select-tier partners remain at 8-12%.
Who's affected: Google Cloud Partners and resellers.
Action required: Partners must verify certification counts (up to 200 for Diamond) to maintain tier status.

Every Google product, profiled

For each product, what it's for, who picks it, what to watch out for, pros and cons, and what we tell our consulting clients.

consumer entry paid

Google AI Plus

$7.99/mo · $95.88/yr
Monthly billing only; includes basic Workspace integration
Target users
students, casual users, solo professionals
Typical uses
  • Drafting emails in Gmail
  • Summarizing long Google Docs
  • Basic creative writing and brainstorming
  • Simple data organization in Sheets
Why pick it
Designed for users who want AI integrated into their daily Google Workspace apps without the high cost of flagship models.
Key features
  • Gemini integration in Gmail and Docs
  • Access to Gemini Flash models
  • Standard response speeds
  • Basic image generation capabilities
⚠ Marketing gimmicks to watch
Model Gating
This tier often excludes the highest-performing 'Ultra' or 'Pro' models, limiting users to faster but less capable 'Flash' versions.
Impact: Expect lower reasoning quality for complex coding or logic tasks compared to the Pro tier.
Workspace Lock-in
The value is heavily tied to using Google's ecosystem; if you use Outlook or Word, the core features are inaccessible.
Impact: Evaluate if you actually use Google Docs/Gmail enough to justify the monthly fee over free alternatives.
Pros
  • Affordable entry point for integrated AI
  • Seamless workflow within Google Workspace
  • No complex setup required
Cons
  • No annual discount available
  • Limited to lower-tier models
  • Only 200 GB of Google One storage, versus 2TB on Pro
Insider view
This is essentially a 'convenience tax' for people who live in Google Docs. If you don't need the sidebar integration, the free version of Gemini often provides similar model performance.
Max bang for buck
Use this if you spend more than 2 hours a day drafting emails or reports in Google Workspace; the time saved on formatting and drafting pays for the $8 quickly.
🔒 Training-on-your-data policy
Consumer data in Workspace may be used to improve services unless specifically opted out in settings. Refer to Google's Privacy Policy.
🔄 Migration path
Upgrade when:
You need higher reasoning capabilities for complex tasks or require more Google One storage.
Downgrade when:
You find yourself using the standalone Gemini web interface more than the Workspace sidebar.
Switch vendor when:
You migrate your primary workflow to Microsoft 365 or require the coding-specific features of Claude.
Scenario | Monthly | Annual | Notes
Standard individual use | $7.99 | $95.88 | 12 months at the standard monthly rate
consumer flagship

Google AI Pro

$19.99/mo · $199.99/yr
Includes 2TB Google One storage bundle
Target users
power users, creative professionals, developers
Typical uses
  • Advanced coding and debugging
  • Complex multi-step reasoning tasks
  • High-resolution AI image generation
  • Analyzing large datasets in Sheets
Why pick it
The best balance of model power and ecosystem benefits, including significant cloud storage and flagship model access.
Key features
  • Access to Gemini 3 Pro and Ultra models
  • 2TB Google One storage included
  • Priority access during peak times
  • Advanced Gemini in Workspace features
  • 1M+ token context window support
⚠ Marketing gimmicks to watch
The Storage Bundle Trap
Google bundles 2TB of storage to justify the $20 price point, making it harder to compare directly with ChatGPT Plus.
Impact: If you already pay for storage, this is a deal; if you don't need storage, you're overpaying for the AI component.
Sticker Price vs. Monthly
The $199.99 annual price is marketed as a discount, but requires a full year commitment upfront.
Impact: Saves approximately $40 per year compared to monthly billing, but reduces flexibility to switch vendors.
Pros
  • Best-in-class context window for long documents
  • Excellent value if you already need cloud storage
  • Fastest response times for flagship models
Cons
  • Expensive if you don't use the 2TB storage
  • Workspace integration can sometimes be buggy
  • Rate limits still exist despite 'Pro' branding
Insider view
This is Google's direct answer to ChatGPT Plus. It wins on context window size (handling massive PDFs) but sometimes lags in pure creative writing nuance compared to competitors.
Max bang for buck
Buy the annual plan if you are committed to the Google ecosystem; the $16.67 effective monthly rate is one of the cheapest flagship AI subscriptions available.
🔒 Training-on-your-data policy
Data from Google One AI Premium subscribers is generally not used to train models for other users, but check the specific 'Gemini Advanced' privacy toggle.
🔄 Migration path
Upgrade when:
You are a professional developer or researcher hitting daily usage caps on the Pro model.
Downgrade when:
You realize you aren't using the 2TB storage and only use AI for basic tasks.
Switch vendor when:
You need the specific artifact-handling capabilities of Claude or the custom GPT ecosystem of OpenAI.
Scenario | Monthly | Annual | Notes
Annual commitment | $16.67 | $199.99 | Paid as a single upfront payment
Month-to-month flexibility | $19.99 | $239.88 | Total cost if paid monthly for a full year
consumer power low

Google AI Ultra 5x

$99.99/mo · $1,199.88/yr
High-usage tier for individuals and small teams
Target users
heavy researchers, independent developers, content agencies
Typical uses
  • Continuous coding assistance
  • Batch processing of large document sets
  • High-frequency image/video generation
  • Extensive RAG-based research
Why pick it
Designed for users who find the standard Pro limits restrictive and need 5x more capacity for flagship models.
Key features
  • 5x higher rate limits for Gemini Ultra
  • Priority technical support
  • Early access to experimental 2M+ context windows
  • All features of the Pro tier included
⚠ Marketing gimmicks to watch
Soft Limits
Even at 5x, limits are often 'soft' and can result in throttling rather than hard cutoffs.
Impact: Performance may degrade during peak hours even if you haven't hit your nominal limit.
The 'Ultra' Naming Confusion
Google uses 'Ultra' for both the model name and the tier name, making it unclear if you're paying for a better model or just more access.
Impact: You are primarily paying for *volume*, not a fundamentally different model than what's in the Pro tier.
Pros
  • Significantly reduces 'rate limit' interruptions
  • Ideal for power users who 'live' in the chat interface
  • Fastest access to new model iterations
Cons
  • Very high price jump from Pro tier
  • No annual discount offered
  • Lacks true enterprise-grade team management
Insider view
This tier is for the top 1% of users. If you aren't hitting limits on Pro at least twice a week, this is a waste of money. It's a bridge for those who aren't ready for a full Vertex AI enterprise contract.
Max bang for buck
Only subscribe during months of heavy project work; since there is no annual discount, you lose nothing by toggling this on and off.
🔒 Training-on-your-data policy
Standard consumer power-user terms apply; data is generally protected but refer to the specific Gemini Ultra privacy addendum.
🔄 Migration path
Upgrade when:
Your workflow is consistently interrupted by 'try again later' messages on the Pro tier.
Downgrade when:
Your project ends and your usage drops back to standard levels.
Switch vendor when:
You need to manage multiple seats and require centralized billing (move to Vertex AI or ChatGPT Team).
Scenario | Monthly | Annual | Notes
Year-round heavy usage | $99.99 | $1,199.88 | No annual discount available for this tier
consumer power high

Google AI Ultra 20x

$199.99/mo · $2,399.88/yr
Maximum capacity consumer tier
Target users
enterprise power users, high-volume creators, AI-first startups
Typical uses
  • Near-constant AI interaction
  • Large-scale content production pipelines
  • Extensive testing of long-context prompts
  • Replacing multiple specialized AI tools
Why pick it
The ultimate consumer-facing tier for those who require near-unlimited access to Google's most powerful models without moving to the API.
Key features
  • 20x higher rate limits than Pro
  • Highest priority in the model queue
  • Direct feedback channel to product teams
  • Full Workspace integration with max capacity
⚠ Marketing gimmicks to watch
Diminishing Returns
Most human users cannot physically prompt fast enough to utilize a 20x limit compared to a 5x limit.
Impact: You may be paying for capacity you literally cannot use unless you are automating the UI (which may violate TOS).
Feature Parity
Despite the 10x price increase over Pro, the feature set is nearly identical; you are paying almost exclusively for the quota.
Impact: Check your usage logs before committing to this high monthly spend.
Pros
  • Virtually eliminates rate-limiting for human users
  • Best possible latency during global traffic spikes
  • Access to the absolute cutting edge of Google AI
Cons
  • Extremely expensive for a consumer account
  • No team/seat management features
  • No annual billing option
Insider view
This is a 'whale' tier. It's designed for people whose time is worth so much that a 30-second rate limit lockout costs more than the $200 subscription. For everyone else, the API is more cost-effective.
Max bang for buck
If you are using this much, you should likely be using the API with Context Caching, which cuts the cost of repeated input tokens by up to 90%.
🔒 Training-on-your-data policy
Refer to Google's high-tier consumer privacy policy; generally more restrictive on data usage than entry tiers.
🔄 Migration path
Upgrade when:
You are a 'super-user' whose business relies on the Gemini UI and you hit 5x limits daily.
Downgrade when:
You realize the API is a more efficient way to handle your high volume.
Switch vendor when:
You need enterprise-grade security, SSO, and administrative controls (move to Vertex AI).
Scenario | Monthly | Annual | Notes
Maximum consumer spend | $199.99 | $2,399.88 | The highest possible cost for a non-enterprise Google AI account
developer api

Gemini API / Vertex AI

Per-token (see the current pricing table for rates)
Pay-as-you-go based on token usage; context caching and batch discounts available
Target users
developers, enterprise architects, data scientists
Typical uses
  • Building custom AI applications
  • Large-scale data classification
  • Automated customer support bots
  • Long-context document analysis (up to 2M tokens)
Why pick it
Provides the most granular control over model behavior, costs, and data privacy for developers and businesses.
Key features
  • Access to Gemini 3.1 Pro and Flash models
  • Context Caching (90% discount on cached input tokens)
  • Batch API (50% discount for non-real-time tasks)
  • Provisioned Throughput for guaranteed capacity
  • Vertex AI enterprise security and IAM
⚠ Marketing gimmicks to watch
Explicit Cache Deletion Trap
Users are billed for the full TTL (Time to Live) of a cache even if it is deleted early.
Impact: To save money, shorten the TTL rather than deleting the cache manually if you want to stop billing early.
The 'Free Tier' Rate Limits
The free API tier often uses your data for training and has extremely low rate limits.
Impact: Always use the 'Pay-as-you-go' tier for production to ensure data privacy and reliable uptime.
Pros
  • Most cost-effective way to handle massive contexts
  • Enterprise-grade data privacy (Vertex AI)
  • Highly flexible pricing with Batch and Caching
Cons
  • Requires technical expertise to implement
  • Billing can be unpredictable without caps
  • Provisioned Throughput requires significant commitments
Insider view
Google's API is currently the 'value leader' for long-context tasks. Its Context Caching is a game-changer for RAG applications: cached input tokens bill at roughly 0.1x the base rate, so repetitive large-scale queries that reuse the same context cost up to 90% less on input.
Max bang for buck
Use the Batch API for any task that doesn't need an immediate response to save 50% instantly. Enable Context Caching for any prompt that reuses the same 32k+ tokens.
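The sketch below compares the two levers in this tip on a single workload. It prices a repeated-prefix job on Gemini 3.1 Pro three ways, using this guide's standard rates ($2 input, $12 output), its cached-read rate ($0.20), and a ~50% batch discount. It is a minimal sketch, not a billing model, and it rests on three assumptions:

- The shared prefix is already cached.
- Cache storage fees and the cache-creation request are not modeled.
- Batch and caching discounts are not stacked.

The workload figures in the example are hypothetical.

```python
# Gemini 3.1 Pro rates in USD per 1M tokens, taken from this guide's tables.
INPUT_PER_M = 2.00
CACHED_INPUT_PER_M = 0.20  # ~0.1x the base input rate
OUTPUT_PER_M = 12.00
BATCH_DISCOUNT = 0.50      # Batch API bills ~50% of standard rates


def standard_cost(requests: int, prefix_tokens: int, unique_tokens: int, output_tokens: int) -> float:
    """Every request re-sends the full prompt at the standard input rate."""
    input_total = requests * (prefix_tokens + unique_tokens)
    return (input_total * INPUT_PER_M + requests * output_tokens * OUTPUT_PER_M) / 1_000_000


def cached_cost(requests: int, prefix_tokens: int, unique_tokens: int, output_tokens: int) -> float:
    """Shared prefix billed at the cached-read rate.

    Assumption: the cache already exists; storage fees and the initial
    cache-creation request are NOT included.
    """
    cached = requests * prefix_tokens * CACHED_INPUT_PER_M
    fresh = requests * unique_tokens * INPUT_PER_M
    output = requests * output_tokens * OUTPUT_PER_M
    return (cached + fresh + output) / 1_000_000


def batch_cost(requests: int, prefix_tokens: int, unique_tokens: int, output_tokens: int) -> float:
    """Standard pricing with the batch discount applied (no caching)."""
    return standard_cost(requests, prefix_tokens, unique_tokens, output_tokens) * (1 - BATCH_DISCOUNT)


if __name__ == "__main__":
    # Hypothetical job: 1,000 requests sharing a 40,000-token prefix.
    job = (1_000, 40_000, 500, 300)
    print(f"Standard: ${standard_cost(*job):.2f}")
    print(f"Cached:   ${cached_cost(*job):.2f}")
    print(f"Batch:    ${batch_cost(*job):.2f}")
```

For this hypothetical job, caching cuts the bill from $84.60 to $12.60, while batch halves it to $42.30. Whether cache storage charges narrow that gap depends on how long the cache lives.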
🔒 Training-on-your-data policy
By default, data processed via Vertex AI or the paid Gemini API tier is NOT used to train Google's foundation models. Policy: https://cloud.google.com/vertex-ai/docs/generative-ai/data-governance
🔄 Migration path
Upgrade when:
You need guaranteed latency and throughput for a high-traffic production app (move to Provisioned Throughput).
Downgrade when:
Your token bill regularly exceeds the $19.99/mo Pro subscription and your usage is mostly interactive chat that the subscription's caps can absorb.
Switch vendor when:
You require specific model behaviors found only in Claude (coding) or GPT-4o (multimodal latency).
Scenario | Monthly | Annual | Notes
Small startup (10M input + 10M output tokens/mo) | $140 | $1,680 | Gemini 3.1 Pro: 10M input ($20) + 10M output ($120)
High-volume Flash user (100M input + 100M output tokens/mo) | $350 | $4,200 | Gemini 3 Flash: 100M input ($50) + 100M output ($300)
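Both scenario rows above are plain token-times-rate arithmetic. The minimal Python sketch below reproduces them from the standard rates in this guide's pricing table. It ignores batch, caching, and long-context pricing. `ModelRates` and `monthly_cost` are illustrative names, not part of any Google SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRates:
    """Standard API rates in USD per 1M tokens (pricing table, 2026-05-26)."""
    name: str
    input_per_m: float
    output_per_m: float


GEMINI_3_1_PRO = ModelRates("Gemini 3.1 Pro", 2.00, 12.00)
GEMINI_3_FLASH = ModelRates("Gemini 3 Flash", 0.50, 3.00)


def monthly_cost(rates: ModelRates, input_tokens: int, output_tokens: int) -> float:
    """Standard-tier monthly cost; ignores batch, caching, and long-context tiers."""
    return (input_tokens * rates.input_per_m + output_tokens * rates.output_per_m) / 1_000_000


if __name__ == "__main__":
    scenarios = [
        ("Small startup", GEMINI_3_1_PRO, 10_000_000, 10_000_000),
        ("High-volume Flash user", GEMINI_3_FLASH, 100_000_000, 100_000_000),
    ]
    for label, rates, input_tokens, output_tokens in scenarios:
        month = monthly_cost(rates, input_tokens, output_tokens)
        print(f"{label} ({rates.name}): ${month:,.2f}/mo, ${month * 12:,.2f}/yr")
```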

All Google products at a glance

Scroll up to the product profile for full detail

Product | Price | Best for | Headline feature | Yearly estimate
Google AI Plus | $7.99/mo | Basic Workspace AI | Docs/Gmail integration | $95.88
Google AI Pro | $19.99/mo | Individual flagship use | 2TB Google One storage | $199.99 (annual plan) or $239.88 (monthly)
Google AI Ultra 5x | $99.99/mo | High-volume individuals | 5x usage limits | $1,199.88
Google AI Ultra 20x | $199.99/mo | Extreme usage/small teams | 20x usage limits | $2,399.88
Gemini API | Usage-based | App development | Up to 2M-token context window | Variable

Google vs the field

Same-tier comparison across Anthropic, OpenAI, Google, and xAI

Comparison tier | Anthropic | OpenAI | Google | xAI | Verdict
Consumer flagship | Claude Pro, $20/mo | ChatGPT Plus, $20/mo | Google AI Pro, $19.99/mo | Grok Premium+, $16/mo | Google is the only vendor bundling 2TB of cloud storage at this price point.
Developer API, high-IQ (input price) | Claude Opus 4.7, $5.00 / 1M tokens | GPT-5.4, $2.50 / 1M tokens | Gemini 3.1 Pro, $2.00 / 1M tokens | Grok 4.20, $2.00 / 1M tokens | Gemini 3.1 Pro undercuts Opus 4.7 and GPT-5.4 on input price and ties Grok 4.20, with a 2M-token context window.
Entry-level paid | N/A | N/A | Google AI Plus, $7.99/mo | Grok Basic, $7/mo | Google and xAI are the only vendors here offering a sub-$20 tier for individual users.

🌳 Which Google product fits you?

Are you building an application or using a chat interface?
  • Gemini API: Best for developers who need pay-as-you-go flexibility and a context window of up to 2M tokens.
  • Google AI Plus: The most affordable way to get Gemini integrated directly into your Google Workspace apps.
  • Google AI Pro: The standard flagship experience for individuals, including the 2TB Google One storage benefit.
  • Google AI Ultra 5x or Ultra 20x: For power users who find themselves throttled by standard flagship usage limits.
  • Google AI Ultra 20x: The highest-capacity consumer tier, for users with extreme throughput requirements.


API or subscription: which is cheaper for you?

Cross-over math at current rates

google-ai-plus ($7.99/mo) vs gemini-3-1-flash-lite API ($0.25/$1.5 per MTok)
Break-even: ~23,674 messages/month (avg ~600 tokens each)

At an average of 450 input and 150 output tokens per message, the API costs $0.0003375 per interaction. You must send over 23,000 messages to justify the $7.99 subscription on cost alone.

👉 The API is significantly cheaper for casual users; the subscription only makes sense if you need the integrated UI or Workspace features.
google-ai-pro ($19.99/mo) vs gemini-3-1-pro API ($2/$12 per MTok)
Break-even: ~7,404 messages/month (avg ~600 tokens each)

With Gemini 3.1 Pro API rates, each 600-token message costs approximately $0.0027. The $19.99 subscription breaks even at roughly 247 messages per day.

👉 Power users who exceed ~247 messages a day should choose the subscription; developers and light users save 80%+ via the API (1,000 messages a month costs about $2.70, roughly 86% less than $19.99).
google-ai-ultra-20x ($199.99/mo) vs gemini-3-1-pro API ($2/$12 per MTok)
Break-even: ~74,070 messages/month (avg ~600 tokens each)

This high-tier subscription targets extreme volume. At $0.0027 per API message, you would need to process 74,000+ messages monthly to make the $199.99 flat fee cheaper than the API.

👉 Only viable for heavy research teams or automated workflows that require the consumer-facing interface over API integration.
Rule of thumb
Google's API pricing for Flash models is aggressively low, making the API the default choice for cost-efficiency unless you require the Gemini UI, 2TB of Drive storage, or Workspace integration.
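Every break-even figure in this section comes from one formula: the monthly fee divided by the API cost of an average message. The minimal Python sketch below uses this section's message profile (450 input + 150 output tokens) and assumes a 30-day month for the per-day figure. The function names are hypothetical helpers.

```python
def api_cost_per_message(input_per_m: float, output_per_m: float,
                         input_tokens: int = 450, output_tokens: int = 150) -> float:
    """API cost in USD of one message at the given per-1M-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def break_even_messages(monthly_fee: float, input_per_m: float, output_per_m: float) -> float:
    """Messages per month at which the subscription fee equals the API bill."""
    return monthly_fee / api_cost_per_message(input_per_m, output_per_m)


if __name__ == "__main__":
    comparisons = [
        ("Google AI Plus vs Gemini 3.1 Flash-Lite", 7.99, 0.25, 1.50),
        ("Google AI Pro vs Gemini 3.1 Pro", 19.99, 2.00, 12.00),
        ("Google AI Ultra 20x vs Gemini 3.1 Pro", 199.99, 2.00, 12.00),
    ]
    for label, fee, input_rate, output_rate in comparisons:
        per_month = break_even_messages(fee, input_rate, output_rate)
        # Assumes a 30-day month for the per-day figure.
        print(f"{label}: ~{per_month:,.0f} messages/month (~{per_month / 30:,.0f}/day)")
```

Changing the default token counts shows how sensitive the break-even point is to message length. Longer outputs shift it sharply toward the subscription, because output tokens cost several times more than input.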

🧮 Estimate your annual Google cost


All estimates use 2026-05-26 rates. API rates verified against LiteLLM.

Current pricing (all production models)

Model (ID) | Input $/M | Output $/M | Cached input $/M | Context (tokens)
Gemini 3.1 Pro (gemini-3-1-pro) | $2 | $12 | $0.20 | 2,000,000
Gemini 3 Flash (gemini-3-flash) | $0.50 | $3 | $0.050 | 1,000,000
Gemini 3.1 Flash-Lite (gemini-3-1-flash-lite) | $0.25 | $1.5 | $0.025 | 1,000,000
Gemini 2.5 Pro (gemini-2-5-pro) | $1.25 | $10 | $0.13 | 2,000,000
Gemini 2.5 Flash (gemini-2-5-flash) | $0.30 | $2.5 | $0.030 | 1,000,000
Gemini 2.5 Flash-Lite (gemini-2-5-flash-lite) | $0.10 | $0.40 | $0.010 | 1,000,000

Pricing verified as of 2026-05-26. Caching discounts apply to repeated input tokens. Batch pricing typically offers a 50% discount on standard rates.

Full rate breakdown (all variants)

Variants beyond the standard API rate: batch (async, 50% off), cached read (~0.1x base input), cache writes (1.25x or 2x base), and a long-context tier above the threshold (~2x input and ~1.5x output on the Pro models).

Gemini 3.1 Pro gemini-3-1-pro

High-reasoning flagship for complex multi-modal agentic workflows
Primary use: Complex reasoning, multi-step coding agents, and massive context analysis.
Who picks it: Teams building production-grade agents requiring deep logic and 2M context.
Vs other Google models: Priced at $2/$12, it is the premium tier compared to Flash's $0.5/$3, offering superior reasoning depth.
When to use: Use for logic-heavy tasks where 2M context is required; step down to Flash for latency-sensitive pipelines.
Equivalents at other vendors
  • OpenAI: GPT-5.4 matches the high-reasoning tier at a slightly higher $2.5/$15 price point.
  • Anthropic: Claude Opus 4.7 competes in reasoning depth but at a significantly higher $5/$25 cost.
  • xAI: Grok 4.20 (non-reasoning) offers identical $2 input pricing with a more aggressive $6 output rate.

Variant | Input $/M | Output $/M | Notes
Standard | $2 | $12 | Default per-token API rate
Batch API | $1 | $6 | Async batch processing, results within 24 hours, typically 50% off
Cached read | $0.20 | $12 | Cached prompt input (~0.1x base); output rate unchanged
Long context (>200,000 tokens) | $4 | $18 | Higher rate applies above 200,000 tokens
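The long-context row creates a price cliff at the 200,000-token threshold. The minimal Python sketch below shows that cliff. It assumes that once the prompt exceeds the threshold, the entire request (input and output) is billed at the long-context rates; check the current pricing page before relying on that. `request_cost` is a hypothetical helper.

```python
# Gemini 3.1 Pro rates in USD per 1M tokens, from the variant table above.
STANDARD_RATES = (2.00, 12.00)      # (input, output) at or below the threshold
LONG_CONTEXT_RATES = (4.00, 18.00)  # (input, output) above the threshold
LONG_CONTEXT_THRESHOLD = 200_000    # prompt tokens


def request_cost(prompt_tokens: int, output_tokens: int) -> float:
    """USD cost of one standard (non-batch, uncached) request.

    Assumption: when the prompt exceeds the threshold, the whole request,
    input and output, is billed at the long-context rates.
    """
    if prompt_tokens > LONG_CONTEXT_THRESHOLD:
        input_rate, output_rate = LONG_CONTEXT_RATES
    else:
        input_rate, output_rate = STANDARD_RATES
    return (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    for prompt in (150_000, 200_000, 200_001, 400_000):
        print(f"{prompt:>7,} prompt tokens + 2,000 output: ${request_cost(prompt, 2_000):.4f}")
```

Under this assumption, crossing the threshold by a single token roughly doubles the request cost (from about $0.42 to $0.84). Trimming prompts that sit just above 200,000 tokens can therefore pay off.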

Gemini 3 Flash gemini-3-flash

Balanced speed and intelligence for high-throughput applications
Primary use: Real-time chat, summarization, and document processing at scale.
Who picks it: Developers scaling applications that need better-than-Lite reasoning without Pro costs.
Vs other Google models: At $0.5/$3, it offers a 4x price reduction from Pro while maintaining a 1M context window.
When to use: Best for high-volume tasks needing moderate reasoning; use Pro if logic fails or Lite if cost is the only factor.
Equivalents at other vendors
  • OpenAI: GPT-5.4 Mini has similar mid-tier positioning but higher $0.75/$4.5 pricing.
  • Cohere: Command R 08-2024 matches the $0.5 input rate while offering a cheaper $1.5 output rate.
  • xAI: Grok 4.3 has a higher $1.25 input cost but a comparable $2.5 output rate for similar speed tiers.

Variant | Input $/M | Output $/M | Notes
Standard | $0.50 | $3 | Default per-token API rate
Batch API | $0.25 | $1.5 | Async batch processing, results within 24 hours, typically 50% off
Cached read | $0.050 | $3 | Cached prompt input (~0.1x base); output rate unchanged

Gemini 3.1 Flash-Lite gemini-3-1-flash-lite

Ultra-low-cost efficiency for high-frequency simple tasks
Primary use: High-volume classification, basic extraction, and simple chat interactions.
Who picks it: Startups and enterprises optimizing for unit economics in simple automation.
Vs other Google models: At $0.25/$1.5, it is 50% cheaper than standard Flash for high-frequency, low-complexity calls.
When to use: Use for high-velocity, low-logic tasks; switch to Flash if accuracy on nuanced instructions drops.
Equivalents at other vendors
  • OpenAI: GPT-5.4 Nano is a direct price competitor for lightweight tasks at $0.2/$1.25.
  • DeepSeek: DeepSeek V3.2 (chat) has a similar $0.28 input cost but a significantly lower $0.42 output rate.
  • xAI: Grok 4.1 Fast (non-reasoning) uses aggressive $0.2/$0.5 pricing targeting the same high-speed, low-cost segment.

Variant | Input $/M | Output $/M | Notes
Standard | $0.25 | $1.5 | Default per-token API rate
Batch API | $0.13 | $0.75 | Async batch processing, results within 24 hours, typically 50% off
Cached read | $0.025 | $1.5 | Cached prompt input (~0.1x base); output rate unchanged

Gemini 2.5 Pro gemini-2-5-pro

Legacy high-capacity reasoning with massive 2M context window
Primary use: Long-document analysis and stable reasoning for existing production pipelines.
Who picks it: Teams maintaining 2.5-series integrations who require deep context windows.
Vs other Google models: Priced at $1.25/$10, it is cheaper than 3.1 Pro ($2/$12) but lacks the latest architectural improvements.
When to use: Use for cost-effective long-context reasoning; upgrade to 3.1 Pro for better instruction following.
Equivalents at other vendors
  • OpenAI: GPT-5 has identical $1.25/$10 pricing for similar reasoning capability.
  • Anthropic: Claude Opus 4.5 is a higher reasoning tier but at a much higher $5/$25 price point.
  • Cohere: Command R+ matches the $10 output price but with a higher $2.5 input cost.

Variant | Input $/M | Output $/M | Notes
Standard | $1.25 | $10 | Default per-token API rate
Batch API | $0.63 | $5 | Async batch processing, results within 24 hours, typically 50% off
Cached read | $0.13 | $10 | Cached prompt input (~0.1x base); output rate unchanged
Long context (>200,000 tokens) | $2.5 | $15 | Higher rate applies above 200,000 tokens

Gemini 2.5 Flash gemini-2-5-flash

Cost-effective speed for legacy 2.5-series deployments
Primary use: Fast response generation and medium-complexity data extraction.
Who picks it: Developers optimizing 2.5-series apps for speed and moderate cost.
Vs other Google models: At $0.3/$2.5, it sits between 2.5 Pro and 2.5 Flash-Lite in both price and capability.
When to use: Use for low-latency tasks in 2.5-series environments; move to 3 Flash for better performance.
Equivalents at other vendors
  • DeepSeek: deepseek-v4-pro has a similar $0.435 input price with a much cheaper $0.87 output rate.
  • Meta: llama-3-3-70b has a higher $0.88 input cost but a cheaper $0.88 output rate for similar throughput needs.
  • OpenAI: GPT-5.4 Mini has higher $0.75/$4.5 pricing but serves the same speed-focused use cases.

Variant | Input $/M | Output $/M | Notes
Standard | $0.30 | $2.5 | Default per-token API rate
Batch API | $0.15 | $1.25 | Async batch processing, results within 24 hours, typically 50% off
Cached read | $0.030 | $2.5 | Cached prompt input (~0.1x base); output rate unchanged

Gemini 2.5 Flash-Lite gemini-2-5-flash-lite

Absolute lowest cost for basic 1M context tasks
Primary use: Massive-scale simple data processing and basic routing.
Who picks it: Cost-sensitive developers running millions of simple, non-reasoning requests.
Vs other Google models: The cheapest in the lineup at $0.1/$0.4, offering 1M context at a fraction of Pro's cost.
When to use: Use when cost is the primary constraint for simple tasks; move to 3.1 Flash-Lite for better quality.
Equivalents at other vendors
  • Mistral: Mistral Small 4 has an identical $0.1 input price with a slightly cheaper $0.3 output rate.
  • Cohere: Command R sits in a similar price tier at $0.15/$0.6 for lightweight, high-volume tasks.
  • Meta: llama-3-8b-instruct-lite has an identical $0.1 input price with a cheaper $0.1 output rate.

Variant | Input $/M | Output $/M | Notes
Standard | $0.10 | $0.40 | Default per-token API rate
Batch API | $0.050 | $0.20 | Async batch processing, results within 24 hours, typically 50% off
Cached read | $0.010 | $0.40 | Cached prompt input (~0.1x base); output rate unchanged

Subscription plans (consumer + business)

Plan | Audience | Monthly | Annual | Per seat | What's included
Google AI Plus | consumer | $7.99 | n/a | n/a | Priority new features
  Limits: 200 GB storage · NotebookLM size: large · 5 family-sharing seats · 2x Gemini usage multiplier · 200 Google Flow credits/month
  Source: one.google.com
Google AI Pro | consumer | $19.99 | $199.99/yr (≈ $16.67/mo) | n/a | YouTube Premium Lite · Priority new features
  Limits: 5,000 GB storage · NotebookLM size: larger · 5 family-sharing seats · 1,000,000-token context window · 4x Gemini usage multiplier · 1,000 Google Flow credits/month
  Source: one.google.com
Google AI Ultra 5x | consumer | $99.99 | n/a | n/a | Early access · YouTube Premium Individual · Priority new features
  Limits: 20,000 GB storage · NotebookLM size: largest · 5 family-sharing seats · 1,000,000-token context window · 5x Pro usage multiplier · 10,000 Google Flow credits/month
  Source: one.google.com
Google AI Ultra 20x | consumer | $199.99 | n/a | n/a | Early access · YouTube Premium Individual · Priority new features
  Limits: 30,000 GB storage · NotebookLM size: largest · 5 family-sharing seats · 1,000,000-token context window · 20x Pro usage multiplier · 25,000 Google Flow credits/month
  Source: one.google.com

Subscription pricing is separate from per-token API rates above.

How buyers think about Google pricing

Cheapest Gemini tier for high-volume tasks

For: vibe-coder · solopreneur · developer

The problem: You need to process millions of simple classification or extraction tasks without the bill scaling faster than your revenue. High-performance models are overkill for basic data cleaning or routing.

What to do: Use Gemini 2.5 Flash-Lite for the lowest possible entry point, or Gemini 3.1 Flash-Lite for the newer architecture at roughly three times the cost for this kind of workload.

Processing 10 million input tokens and 2 million output tokens on Gemini 2.5 Flash-Lite costs $1.00 for input (10M x $0.1/M) and $0.80 for output (2M x $0.4/M), totaling $1.80. The same volume on Gemini 3.1 Flash-Lite costs $2.50 for input (10M x $0.25/M) and $3.00 for output (2M x $1.5/M), totaling $5.50 (as of 2026-05-26).

→ Gemini 2.5 Flash-Lite provides a baseline cost of $0.50 per 1M input plus 1M output tokens (as of 2026-05-26).
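The comparison above is simple multiplication, so it is easy to script when you want to try other volumes. This is a minimal sketch. It assumes the standard per-1M-token rates listed on this page (as of 2026-05-26) and ignores batch, caching, and long-context tiers. `PRICES` and `workload_cost` are hypothetical names for illustration and are not part of any Google SDK.

```python
# Hypothetical rate table (USD per 1M tokens) copied from this page, as of 2026-05-26.
# Verify against Google's pricing page before relying on these numbers.
PRICES = {
    "gemini-2.5-flash-lite": {"input": 0.10, "output": 0.40},
    "gemini-3.1-flash-lite": {"input": 0.25, "output": 1.50},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Standard-rate cost in USD, rounded to cents (no batch or caching discounts)."""
    rates = PRICES[model]
    input_cost = input_tokens / 1_000_000 * rates["input"]
    output_cost = output_tokens / 1_000_000 * rates["output"]
    return round(input_cost + output_cost, 2)


if __name__ == "__main__":
    # Same workload as the scenario: 10M input tokens, 2M output tokens.
    for model in PRICES:
        print(f"{model}: ${workload_cost(model, 10_000_000, 2_000_000):.2f}")
```

For the scenario above, this prints $1.80 for Gemini 2.5 Flash-Lite and $5.50 for Gemini 3.1 Flash-Lite.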

How far the free AI Studio tier actually goes

For: vibe-coder · solopreneur · developer

The problem: You want to prototype without entering credit card details or committing to a Google Cloud project. You need to know when the rate limits will force a migration to a paid plan.

What to do: Leverage the AI Studio free tier for development and testing before switching to Vertex AI for production scaling.

While the free tier offers no-cost access, production workloads that exceed its rate limits must move to pay-as-you-go. For example, moving a small app that uses 500K input tokens and 100K output tokens daily to Gemini 3 Flash would cost $0.25 for input (0.5M x $0.5/M) and $0.30 for output (0.1M x $3/M), totaling $0.55 per day or roughly $16.50 per month (as of 2026-05-26).

→ Prototyping is free; in production, 500K input and 100K output tokens per day on Gemini 3 Flash cost approximately $0.55 (as of 2026-05-26).
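To turn the free-tier exit point into a budget, you can project the daily figure forward. This is a minimal sketch. It assumes constant daily traffic, 30-day months, and the Gemini 3 Flash standard rates quoted above ($0.50 input, $3.00 output per 1M tokens). It does not model free-tier allowances. `project_spend` is a hypothetical helper.

```python
def project_spend(
    daily_input_tokens: int,
    daily_output_tokens: int,
    input_rate: float,
    output_rate: float,
    days_per_month: int = 30,
) -> tuple[float, float, float]:
    """Return (daily, monthly, annual) USD spend at flat per-1M-token rates.

    Assumes constant daily traffic and 30-day months, so "annual" is 12 x monthly.
    """
    daily = (
        daily_input_tokens / 1_000_000 * input_rate
        + daily_output_tokens / 1_000_000 * output_rate
    )
    monthly = daily * days_per_month
    annual = monthly * 12
    return round(daily, 2), round(monthly, 2), round(annual, 2)


if __name__ == "__main__":
    # 500K input + 100K output tokens per day on Gemini 3 Flash ($0.50 / $3.00 per 1M)
    daily, monthly, annual = project_spend(500_000, 100_000, 0.50, 3.00)
    print(f"daily ${daily:.2f}, monthly ${monthly:.2f}, annual ${annual:.2f}")
```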

Using the 1M context window without breaking budget

For: developer · smb · enterprise

The problem: RAG-heavy workflows and analysis of very large documents can drive input costs up quickly if you resend the same 1-million-token context with every query.

What to do: Utilize Vertex AI Context Caching to reduce the cost of repetitive input tokens by 90%.

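Sending a 1-million-token document to Gemini 3.1 Pro costs $2.00 per request at the base rate (1M x $2/M). That rate is listed for prompts of up to 200K tokens, so a prompt this long may be billed at a higher long-context rate. With context caching, subsequent requests drop to the cached rate of $0.20 per million tokens (as of 2026-05-26). Over 100 queries, this reduces the input cost from $200 to $21.80: one full-price request fills the cache, then 99 cached reads cost $0.20 each. Cache storage is billed separately per hour and is not included in that figure.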

→ Context caching on Gemini 3.1 Pro reduces repetitive input costs from $2.00 to $0.20 per million tokens (as of 2026-05-26).
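The savings in this example come from paying full price once and the cached rate afterwards. This is a minimal sketch using the $2.00 and $0.20 per-1M rates from the example. The storage parameters default to zero because this page does not list a storage rate. Before relying on the result, fill in the storage rate from Google's pricing page, and use the long-context rate instead if your prompt exceeds the base tier. `caching_comparison` is a hypothetical helper.

```python
def caching_comparison(
    context_tokens: int,
    queries: int,
    input_rate: float,
    cached_rate: float,
    storage_rate_per_mtok_hour: float = 0.0,
    hours_cached: float = 0.0,
) -> tuple[float, float]:
    """Return (uncached_cost, cached_cost) in USD for re-sending one shared context.

    The cached path pays the full input rate once to fill the cache, the cached
    rate for the remaining queries, plus optional storage (rate x tokens x hours).
    """
    if queries < 1:
        raise ValueError("queries must be at least 1")
    mtok = context_tokens / 1_000_000
    uncached = queries * mtok * input_rate
    cached = (
        mtok * input_rate  # first request fills the cache at full price
        + (queries - 1) * mtok * cached_rate  # later requests read from the cache
        + mtok * storage_rate_per_mtok_hour * hours_cached  # hypothetical storage term
    )
    return uncached, cached


if __name__ == "__main__":
    # Rates from the example: $2.00/M input, $0.20/M cached (storage left at 0).
    full, with_cache = caching_comparison(1_000_000, 100, 2.00, 0.20)
    print(f"uncached ${full:.2f} vs cached ${with_cache:.2f}")
```

This prints $200.00 uncached versus $21.80 cached, matching the figures above before storage.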

Vertex AI vs AI Studio: when does each make sense?

For: it-buyer · enterprise · developer

The problem: You are choosing between the developer-friendly AI Studio and the enterprise-grade Vertex AI platform. You need to know if the extra features justify the potential complexity.

What to do: Choose Vertex AI for workloads that require Google Cloud Enterprise Discount Programs (EDP) or Provisioned Throughput.

Vertex AI allows you to count Gemini usage toward a Customer Annual Spend Commitment (CASC), which can trigger discounts of 35-40% for standard tiers (as of 2026-05-26). For an enterprise spending $10,000 monthly on Gemini 3.1 Pro tokens, an EDP could potentially reduce that bill to $6,000-$6,500 depending on the negotiated sleeve.

→ Vertex AI is the route to commitment-based (EDP/CASC) discounts, which reportedly reduce token costs by 35-40% at standard tiers (as of 2026-05-26).
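The EDP estimate applies a single negotiated percentage to the whole token bill. This is a minimal sketch using the reported 35-40% range. It assumes the discount applies uniformly to Gemini spend, although real contracts may carve out products or tier the discount. `edp_bill_range` is a hypothetical helper.

```python
def edp_bill_range(
    monthly_spend: float, min_discount_pct: float, max_discount_pct: float
) -> tuple[float, float]:
    """Return (lowest, highest) monthly bill after a negotiated percentage discount.

    A larger discount yields the lower bill, so the max discount gives the low end.
    """
    low = monthly_spend * (1 - max_discount_pct / 100)
    high = monthly_spend * (1 - min_discount_pct / 100)
    return low, high


if __name__ == "__main__":
    # $10,000/month on Gemini 3.1 Pro with a reported 35-40% EDP discount
    low, high = edp_bill_range(10_000, 35, 40)
    print(f"${low:,.0f} to ${high:,.0f} per month")
```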

When Gemini 3.1 Pro is worth the price over 2.5 Pro

For: developer · it-buyer · enterprise

The problem: You need to decide if the intelligence gains in the 3.1 series justify the higher price point compared to the 2.5 series for complex reasoning tasks.

What to do: Use Gemini 2.5 Pro for standard high-intelligence tasks and reserve 3.1 Pro for multi-step reasoning that fails on older models.

Processing 1 million input and 1 million output tokens on Gemini 2.5 Pro costs $11.25 ($1.25 + $10). The same volume on Gemini 3.1 Pro costs $14.00 ($2 + $12) (as of 2026-05-26), a roughly 24% price increase for the newer architecture.

→ Gemini 3.1 Pro carries a $2.75 premium over Gemini 2.5 Pro per 1M input plus 1M output tokens (as of 2026-05-26).

Gemini Advanced inside Google Workspace

For: smb · enterprise

The problem: You want to provide AI tools to your employees but are unsure whether to buy individual Gemini Advanced subscriptions or use the Workspace AI add-on.

What to do: Compare the cost of the Workspace AI add-on against standalone API usage for internal productivity tools.

While API pricing for Gemini 3.1 Pro is $2 per million input tokens, Workspace users can access Gemini features through a per-seat monthly add-on. For a team of 50 where each user processes the equivalent of 5 million input tokens monthly, the API cost would be $500 for input alone (250M x $2/M), or $10 per user. Check with your Google account manager whether the Workspace seat cost is lower than your projected API consumption (as of 2026-05-26).

→ High-volume internal users may find Workspace seat pricing more predictable than variable API token billing (as of 2026-05-26).
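The break-even point is the API cost per user: if the Workspace seat price is below it, seats win on cost. This is a minimal sketch that, like the example, counts input tokens only at $2.00 per 1M. Adding output tokens would raise the API side. The seat price is a hypothetical input you would get from Google, and both function names are illustrative.

```python
def api_cost_per_seat(
    users: int, input_tokens_per_user: int, input_rate: float
) -> tuple[float, float]:
    """Return (total_monthly_usd, per_seat_usd) for input tokens only."""
    total = users * input_tokens_per_user / 1_000_000 * input_rate
    return total, total / users


def seats_are_cheaper(
    seat_price: float, users: int, input_tokens_per_user: int, input_rate: float
) -> bool:
    """True if a flat per-seat price undercuts the equivalent input-token API bill."""
    _, per_seat_api = api_cost_per_seat(users, input_tokens_per_user, input_rate)
    return seat_price < per_seat_api


if __name__ == "__main__":
    total, per_seat = api_cost_per_seat(50, 5_000_000, 2.00)
    print(f"API input cost: ${total:.2f}/month, ${per_seat:.2f} per user")
```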

Volume discounts & partner programs

Heads up — these are community-sourced and analyst-reported terms. Specific credit amounts, discount percentages, and program thresholds change frequently. Always verify current terms directly with Google before relying on a specific number. Treat reported figures as ballpark, not contract language.

Google Cloud Partner Network (2026 Tier Structure)

Threshold: Select ($250k ACV); Premier ($2M ACV); Diamond ($20M ACV)

Typical discount (reported): 8–12% typical reseller margin; up to 15-20% for $5M/3yr commits

How to engage: Apply via Google Cloud Partner Advantage portal; transition window for 2026 program began in Q1 2026

Source: crn.com.au (community) · cited 2026-01-22

Vertex AI Provisioned Throughput (PT)

Threshold: Measured in Generative AI Scale Units (GSUs); minimums vary by model

Typical discount (reported): Fixed-cost subscription; break-even typically 12-15% of capacity sustained

How to engage: Purchase via Provisioned Throughput dashboard in Vertex AI console

Source: cloud.google.com (vendor official) · cited 2026-02-19

Google Cloud Enterprise Discount Program (EDP) / CASC

Threshold: Typically $150k+ for custom pricing; $1M-$3M for standard EDP tiers

Typical discount (reported): 35-40% initial offer; reportedly up to the high-80% range for $1B+ contracts

How to engage: Direct negotiation with Google Cloud sales account teams

Source: magicmag.ai (analyst report) · cited 2026-02-18

Azure AI Foundry Provisioned Throughput Reservations

Threshold: Purchased in Provisioned Throughput Units (PTUs)

Typical discount (reported): Up to 70% compared to hourly pay-as-you-go

How to engage: Navigate to Reservations section in Azure Portal

Source: techcommunity.microsoft.com (vendor official) · cited 2025-05-19

Amazon Bedrock Provisioned Throughput

Threshold: Minimum 1 model unit (MU) commitment

Typical discount (reported): Varies by term; 6-month commitments offer deepest discounts

How to engage: Purchase through AWS Bedrock console under 'Provisioned Throughput'

Source: cloudforecast.io (analyst report) · cited 2024-10-31

Vertex AI Batch Prediction & Caching Discounts

Threshold: Workload-based (non-real-time)

Typical discount (reported): 50% off for Batch API; 90% off for Context Caching

How to engage: Enable through Vertex AI by creating an explicit context cache for repeated prompts, or by submitting non-real-time work as Batch Prediction jobs

Source: cloudzero.com (analyst report) · cited 2026-05-04

Multi-cloud availability

Cloud-marketplace terms change frequently. Model availability dates, pricing parity, and regional features can drift week to week. Verify with each cloud's pricing page (AWS Bedrock, Google Vertex, Azure AI Foundry) before architecting around specifics.
AWS Bedrock
Model availability: Gemma 3 (4B, 12B, 27B) and Gemma 4 (via Hugging Face collection)
Price vs vendor-direct: varies by deployment (serverless pay-as-you-go or marketplace subscription)
Reasons to pick:
  • Multi-model flexibility through a unified API
  • Deep integration with AWS infrastructure (S3, IAM, CloudTrail)
  • Enterprise-grade security and compliance (HIPAA, FedRAMP)
  • Access to 100+ serverless models from multiple providers

aws.amazon.com ↗
Azure AI Foundry
Model availability: Gemma 4 (variants including E2B, E4B, 26B A4B, 31B)
Price vs vendor-direct: reportedly based on managed compute (VM/GPU hourly rates) or serverless pay-as-you-go
Reasons to pick:
  • Seamless integration with Microsoft 365 and Azure ecosystem
  • Unified platform for building, testing, and managing AI applications
  • Support for multi-model strategies within a single control plane
  • Enterprise-segment offerings with robust governance and safety tools

techcommunity.microsoft.com ↗
Together.ai
Model availability: Gemma 3 27B
Price vs vendor-direct: reportedly $0.06 per 1M input tokens and $0.12 per 1M output tokens
Reasons to pick:
  • High-performance optimizations (reportedly 3.5x faster inference)
  • Competitive pricing with a 50% discount for Batch API workloads
  • Pure consumption model with no setup fees or minimum commitments
  • Wide selection of 100+ open-source models

computeprices.com ↗
Anyscale
Model availability: Gemma 7B (via LiteLLM/Anyscale endpoints)
Price vs vendor-direct: approximately $0.15 per 1M input and $0.15 per 1M output tokens
Reasons to pick:
  • Multi-cloud execution across AWS, GCP, and Azure
  • Distributed computing powered by the Ray framework
  • Reported cloud cost reductions of up to 50% through optimized GPU utilization
  • Unified Python programming model for the entire ML lifecycle

litellm.ai ↗
Google Vertex AI (vendor-direct)
Model availability: Gemini 3.1 Pro, Gemini 3.5 Flash, Gemini 2.5 Pro/Flash, Gemma 4
Price vs vendor-direct: baseline (e.g., Gemini 3.1 Pro at $2.00/$12.00 per 1M tokens for context <= 200K)
Reasons to pick:
  • Native access to the full Gemini model family
  • Industry-leading context windows (up to 2 million tokens)
  • Native grounding with Google Search and BigQuery integration
  • Comprehensive MLOps tooling (Pipelines, Experiments, Model Monitoring)

cloud.google.com ↗

Free credits & startup programs

Program details and credit amounts shift often. Apply directly through each program's official page for current values, eligibility windows, and application requirements.

Google for Startups Cloud Program - AI Track

Reported value: up to $350,000 in credits over 2 years

Eligibility: AI-first startups from Seed to Series A (Series A must be raised within the last 12 months); founded within the last 10 years; not received more than $5,000 in previous Google Cloud credits.

How to apply: Apply through the Google for Startups website or via an approved partner (VC, accelerator, or incubator).

Apply / learn more at cloud.google.com ↗

Google for Startups Cloud Program - Scale Tier

Reported value: up to $200,000 in credits over 2 years

Eligibility: Startups with verified equity funding from pre-seed to Series A (Series A raised within last 12 months); founded within the last 10 years; not received more than $5,000 in previous Google Cloud credits.

How to apply: Submit application on the Google for Startups Cloud Program page; requires verification of institutional funding.

Apply / learn more at cloud.google.com ↗

Google for Startups Cloud Program - Start Tier

Reported value: $2,000 in credits for 12 months

Eligibility: Technology startups not yet funded by an institutional investor; founded within the last 5 years; not received previous credits beyond the free trial.

How to apply: Apply directly on the Google for Startups website.

Apply / learn more at cloud.google.com ↗

Google Cloud Research Credits (Faculty & Postdocs)

Reported value: up to $5,000 in credits

Eligibility: Faculty and postdoctoral researchers at higher education institutions in eligible countries; requires a research proposal and cost estimate.

How to apply: Submit an online application form including a research proposal and Google Cloud billing account details.

Apply / learn more at edu.google.com ↗

Google Cloud Research Credits (PhD Students)

Reported value: up to $1,000 in credits per year

Eligibility: PhD students conducting research at educational institutions; must be used for the described project and not personal use.

How to apply: Apply via the Google for Education research credits application form; can apply once per year.

Apply / learn more at edu.google.com ↗

NVIDIA Inception & Google Cloud Collaboration

Reported value: up to $350,000 in Google Cloud credits

Eligibility: Qualified members of the NVIDIA Inception program focused on AI.

How to apply: Members of NVIDIA Inception can access an accelerated path to the Google for Startups Cloud Program through the NVIDIA member portal.

Apply / learn more at nvidianews.nvidia.com ↗

Y Combinator Summer Grants 2026

Reported value: $90,000 in compute credits (shared across AWS, Azure, and GCP)

Eligibility: Technical college students building AI or technical projects full-time in San Francisco during Summer 2026.

How to apply: Apply via the Y Combinator Summer Grants application page; rolling admissions.

Apply / learn more at ycombinator.com ↗

Google for Startups Accelerator: AI First

Reported value: equity-free support and free Cloud TPU access

Eligibility: Seed to Series A AI-first startups based in North America; must commit CTO or technical leads to program sessions.

How to apply: Apply for specific cohorts via the Google for Startups Accelerator program page.

Apply / learn more at startup.google.com ↗

Pricing gotchas to watch

Most gotchas below were surfaced by community reports. Some may have been fixed, changed, or never been the user-facing issue they appeared. Verify against current vendor docs before architecting around a workaround.

Explicit Cache Deletion Billing Trap

When using explicit context caching, users are reportedly billed for the originally specified Time to Live (TTL) duration even if the cache is manually deleted before it expires. For example, a cache created with a 1-hour TTL that is deleted after 15 minutes still incurs the full 1-hour storage charge.

Workaround: Instead of deleting the cache, update the TTL to a shorter duration (e.g., 30 minutes) to adjust the billing period before the cache naturally expires.
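The trap and its workaround both depend on when the billed period ends. This is a minimal sketch of the reported behavior, built on assumptions that Google has not confirmed:

  • Deleting a cache does not move its billed expiry.
  • A TTL update moves the expiry to the update time plus the new TTL.
  • Storage is charged per 1M cached tokens per hour at a rate you supply. The `RATE` value below is hypothetical.

None of this comes from Google's billing logic, and both function names are illustrative.

```python
from typing import Optional


def billed_cache_hours(
    original_ttl_h: float, ttl_update: Optional[tuple[float, float]] = None
) -> float:
    """Hours of explicit-cache storage billed under the *reported* behaviour.

    Assumptions (not confirmed by Google): deleting a cache early does not shorten
    the billed period, so deletion time is deliberately not a parameter; a TTL
    update given as (update_time_h, new_ttl_h) moves expiry to update_time_h + new_ttl_h.
    """
    if ttl_update is None:
        return original_ttl_h
    update_time_h, new_ttl_h = ttl_update
    return update_time_h + new_ttl_h


def cache_storage_cost(cached_tokens: int, rate_per_mtok_hour: float, hours: float) -> float:
    """Storage cost in USD: (tokens / 1M) x hourly rate x billed hours."""
    return cached_tokens / 1_000_000 * rate_per_mtok_hour * hours


if __name__ == "__main__":
    RATE = 1.00  # hypothetical USD per 1M tokens per hour, for illustration only
    scenarios = {
        # 1 h TTL, deleted after 15 min: reportedly still billed for the full hour
        "delete at 15 min": billed_cache_hours(1.0),
        # 1 h TTL, TTL updated to 30 min at the 15-minute mark instead of deleting
        "update TTL at 15 min": billed_cache_hours(1.0, ttl_update=(0.25, 0.5)),
    }
    for label, hours in scenarios.items():
        print(f"{label}: {hours} h billed, ${cache_storage_cost(1_000_000, RATE, hours):.2f}")
```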

Source: developers.google.com (vendor docs) · cited 2026-05-26

Implicit Caching Sparse-Traffic Loss

Implicit caching (automatically enabled for Gemini 2.5+) offers no cost-saving guarantee. It is described as an ephemeral optimization layer where data is retained only for a 'defined short retention period' or session lifetime. Sparse traffic patterns often result in cache misses, forcing production users to pay full price for repeated input tokens.

Workaround: For production workloads requiring guaranteed savings, use explicit caching which allows manual TTL management, provided the prompt meets the minimum threshold (typically 1,024 to 4,096 tokens depending on the model).

Source: blog.google (blog post) · cited 2026-05-26

Surprise 'Ghost' Charges from Idle Endpoints

Vertex AI online prediction endpoints charge hourly fees (starting at approximately $0.75 per node-hour) even when idle. Production users have reported 'ghost' charges reaching hundreds of dollars because undeploying a model is reportedly insufficient; the endpoint resource itself must be deleted to stop GPU/compute allocation billing.

Workaround: Implement automated scripts to delete idle endpoints rather than just undeploying models, and use Vertex AI Batch API for non-real-time tasks to avoid 24/7 endpoint costs.
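The "ghost" charge is rate x nodes x hours, which is why a forgotten endpoint can reach hundreds of dollars. This is a minimal sketch assuming the approximate $0.75 per node-hour starting rate quoted above and a 730-hour average month. Actual rates depend on machine type and accelerators. `idle_endpoint_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12 months


def idle_endpoint_cost(
    node_hour_rate: float, nodes: int = 1, idle_hours: float = HOURS_PER_MONTH
) -> float:
    """USD billed for an endpoint that keeps `nodes` allocated but serves no traffic."""
    return node_hour_rate * nodes * idle_hours


if __name__ == "__main__":
    # ~$0.75 per node-hour is the approximate starting rate quoted above.
    print(f"1 idle node for a month: ${idle_endpoint_cost(0.75):.2f}")
    print(f"2 idle nodes for a week: ${idle_endpoint_cost(0.75, nodes=2, idle_hours=168):.2f}")
```

At $0.75 per node-hour, a single forgotten node costs about $547.50 over a 730-hour month.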

Source: cloud.google.com (vendor docs) · cited 2026-05-26

Multimodal Token Counting Discrepancies

Gemini 2.0 image tokenization uses a tiling logic where images larger than 384px are scaled into 768x768 tiles, each costing 258 tokens. However, production users report surprises where a single high-resolution image (e.g., 1920x1080) can result in over 1,800 tokens, significantly higher than a simple 4-tile calculation would suggest.

Workaround: Pre-process and downscale images so that both dimensions are at most 384 pixels before sending them to the API. This keeps each image at the minimum 258-token charge, at the cost of visual detail.
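Before sending images, it can help to estimate the tile count and compute a downscaled size. This is a minimal sketch that applies the tiling rule as described above: an image with both sides at most 384 px costs 258 tokens, and a larger image costs 258 tokens per 768x768 tile, counted here with simple ceiling division. It is a naive estimate, and real counts for large images have been reported to exceed it. Do the actual resize with your image library of choice. `naive_image_tokens` and `fit_within` are hypothetical helpers.

```python
import math

SMALL_IMAGE_EDGE = 384  # both sides <= 384 px -> single 258-token charge (per the text)
TILE_EDGE = 768         # larger images are split into 768x768 tiles
TOKENS_PER_TILE = 258


def naive_image_tokens(width: int, height: int) -> int:
    """Rough token estimate from the described tiling rule; real counts may be higher."""
    if width <= SMALL_IMAGE_EDGE and height <= SMALL_IMAGE_EDGE:
        return TOKENS_PER_TILE
    tiles = math.ceil(width / TILE_EDGE) * math.ceil(height / TILE_EDGE)
    return tiles * TOKENS_PER_TILE


def fit_within(width: int, height: int, max_edge: int = SMALL_IMAGE_EDGE) -> tuple[int, int]:
    """Target size that keeps the aspect ratio and puts both sides <= max_edge."""
    scale = min(1.0, max_edge / width, max_edge / height)
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


if __name__ == "__main__":
    w, h = 1920, 1080
    small = fit_within(w, h)
    print((w, h), naive_image_tokens(w, h), "->", small, naive_image_tokens(*small))
```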

Source: developers.google.com (vendor docs) · cited 2026-05-26

Regional Pricing and Language Variance

Vertex AI pricing for Gemini models reportedly varies by region, with non-US regions typically costing 2% to 5% more than us-central1. Vertex AI has also reportedly billed some Gemini models per character rather than per token. Under character billing, Japanese-language deployments can cost approximately 3x less than English for the same content, because each Japanese character carries more information.

Workaround: For character-billed models, estimate costs from character counts in the target language rather than from token estimates, to avoid budget overruns in Latin-script languages.

Source: cloud.google.com (vendor docs) · cited 2026-05-26

Legacy Key Auto-Upgrade Billing Risk

A major security-related pricing gotcha involves legacy Google Maps API keys. If these keys reside in a project where Gemini is enabled, they are reportedly 'silently' upgraded to allow Gemini API access. Attackers exploiting these publicly exposed keys have generated unauthorized bills ranging from $10,000 to over $180,000 in a few days.

Workaround: Explicitly restrict all API keys to specific services (e.g., only Maps) and set project-level spend caps, as budget alerts do not stop usage.

Source: trufflesecurity.com (blog post) · cited 2026-05-26

Hidden costs (25-40% beyond per-token rates)

Typical overhead: 25-40% beyond raw per-token rates.

What it costs to leave Google

Migrating away from Google involves moving out of the Vertex AI ecosystem and potentially losing access to the 2-million-token context window. While Gemma models offer an open-weight path for portability, the native grounding integrations with Google Search and BigQuery create significant functional lock-in.

Who is this for?

For vibe coders & solo devs

For rapid prototyping, AI Studio is your best friend because it bypasses the complexity of Google Cloud project setup. You should start with Gemini 3.1 Flash-Lite to keep costs extremely low while benefiting from the latest architecture. If you hit rate limits, moving to Vertex AI is straightforward but requires managing service accounts. Focus on using the free tier for dev and only pay for production traffic.

* Use AI Studio for zero-config API keys.
* Start with Gemini 3.1 Flash-Lite for $0.25/M input tokens.
* Restrict your API keys to specific services to avoid the 'Legacy Key' billing trap.
* Switch to Vertex AI only when you need enterprise scaling or logging.

For SMBs and growing teams

Small businesses should use the 50% discount offered by the Vertex AI Batch API for non-real-time tasks like content generation or report summarization. If you have repetitive tasks, context caching is essential to avoid paying full price for the same data on every request. If you are eligible, consider the Google for Startups Cloud Program: its Scale Tier can provide up to $200,000 in credits over two years, which can cover a large share of your AI spend during that period.

* Apply for the Google for Startups Scale Tier for $200,000 in credits.
* Use Batch API to save 50% on high-volume summarization.
* Enable context caching for repeated prompts that meet the model's minimum size (typically 1,024 to 4,096 tokens).
* Set strict project-level spend caps to prevent budget overruns.

For enterprise buyers

Enterprises should leverage the Enterprise Discount Program (EDP) to ensure Vertex AI spend counts toward broader cloud commitments. If you require guaranteed latency, Provisioned Throughput (PT) is the preferred route, though it requires a commitment of at least one week. For global deployments, be aware that regional pricing can increase costs by 2-5% outside of the US. Use Diamond tier partner benefits if your ACV exceeds $20M for maximum support.

* Negotiate 'Vertex-specific discount sleeves' within your EDP.
* Use Provisioned Throughput for mission-critical, high-traffic apps.
* Deploy in us-central1 to avoid the 2-5% regional price premium.
* Delete idle endpoints entirely to avoid the $0.75 per node-hour 'ghost' charge.

Sources verified for this page

Primary: Google pricing page

View all 25 cited insider sources across 16 domains

Generator: gen-v5.0.8-2026-05-25 · Last refreshed: Mon May 25 2026 21:00:26 GMT-0400 (Eastern Daylight Time) · Pricing snapshot: Mon May 25 2026 00:00:00 GMT-0400 (Eastern Daylight Time)

📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-05.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when input exceeds model threshold.
  • Embedding prices are input-only (no output tokens generated).

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

  • Anthropic: https://www.anthropic.com/pricing (verified 2026-06-05; daily snapshot since Sep 2023, 578 days captured)
  • Anthropic Docs: https://platform.claude.com/docs/en/about-claude/pricing (verified 2026-06-05; daily snapshot since Sep 2023, 578 days captured)
  • OpenAI: https://openai.com/api/pricing/ (verified 2026-06-05; daily snapshot since Sep 2023, 579 days captured)
  • Google AI: https://ai.google.dev/gemini-api/docs/pricing (verified 2026-06-05; daily snapshot since Dec 2023, 554 days captured)
  • Google Vertex: https://cloud.google.com/vertex-ai/generative-ai/pricing (verified 2026-06-05; daily snapshot since Dec 2023, 554 days captured)
  • DeepSeek: https://api-docs.deepseek.com/quick_start/pricing (verified 2026-06-05; daily snapshot since May 2024, 493 days captured)
  • xAI: https://x.ai/api (verified 2026-06-05; daily snapshot since Nov 2024, 411 days captured)
  • Mistral: https://mistral.ai/pricing (verified 2026-06-05; daily snapshot since Dec 2023, 552 days captured)
  • Cohere: https://cohere.com/pricing (verified 2026-06-05; daily snapshot since Sep 2023, 578 days captured)

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model | Field | Why it’s inferred
Anthropic / Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes a 90% cache-hit discount on this tier.
Anthropic / Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic / Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents a uniform 50% Batch discount.
Anthropic / Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents a uniform 50% Batch discount.
Anthropic / Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention.
OpenAI / GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents an automatic 90% discount on cache hits across the GPT-5.x tier.
OpenAI / GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift.
OpenAI / GPT-5.2 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 | batchInput | Derived at 50% of input.
OpenAI / GPT-5 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.5 Pro | cachedInput | Derived at 10% of input; OpenAI does not publish a cached rate for *-pro models, so the family convention is used.
OpenAI / GPT-5.5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention.
OpenAI / GPT-5.2 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.1 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.1 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Nano | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 Nano | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Nano | batchOutput | Derived at 50% of output.
Google / Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention (~90%).
Google / Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Pro | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates.
Google / Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
xAI / Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base.
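The two conventions in the methodology reduce to fixed multipliers on the base rates. This is a minimal sketch of that derivation, assuming the 10% cached-input and 50% batch conventions stated above. It shows how the * values are produced, not how the site's pipeline actually computes them. `BaseRates` and `inferred_rates` are hypothetical names.

```python
from dataclasses import dataclass

CACHED_INPUT_FRACTION = 0.10  # cached input = 10% of base input (90% off)
BATCH_FRACTION = 0.50         # Batch API = 50% of base (50% off)


@dataclass(frozen=True)
class BaseRates:
    input: float   # USD per 1M input tokens
    output: float  # USD per 1M output tokens


def inferred_rates(
    base: BaseRates,
    cached_fraction: float = CACHED_INPUT_FRACTION,
    batch_fraction: float = BATCH_FRACTION,
) -> dict[str, float]:
    """Derive cachedInput / batchInput / batchOutput from base rates by convention."""
    return {
        "cachedInput": round(base.input * cached_fraction, 4),
        "batchInput": round(base.input * batch_fraction, 4),
        "batchOutput": round(base.output * batch_fraction, 4),
    }


if __name__ == "__main__":
    # Gemini 2.5 Flash-Lite standard rates listed on this page: $0.10 / $0.40.
    print(inferred_rates(BaseRates(input=0.10, output=0.40)))
```

The results match the Gemini 2.5 Flash-Lite cached and batch rows listed on this page. For Gemini 2.0 Flash, whose cached rate follows the 25% convention instead, pass `cached_fraction=0.25`.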

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →