Google pricing, complete breakdown
Verified 2026-05-26, cross-checked against Google pricing page, litellm, openrouter
Google's model lineup spans from the flagship Gemini 3.1 Pro at $2.00 per million input tokens to the highly efficient Gemini 2.5 Flash-Lite at just $0.10 per million. Developers can leverage the high-speed Gemini 3 Flash for $0.50/M input or the ultra-low-cost Gemini 3.1 Flash-Lite at $0.25/M input. These models support massive context windows up to 2 million tokens, with significant discounts for cached inputs. This guide helps you navigate the trade-offs between per-token API costs and fixed-rate consumer subscriptions.
How Google's pricing universe works
Google manages a dual-track ecosystem to capture both high-volume developer traffic and steady consumer recurring revenue. The API track is designed for builders who need granular, metered control over token usage and programmatic access to frontier models. Conversely, the subscription track targets end-users who value predictable monthly costs and integrated features like Google Flow credits and Gmail AI tools. This two-track approach allows Google to monetize the same underlying models at different margins and customer acquisition costs depending on the user's technical needs.
API (per-token, metered)
- Pay only for tokens consumed
- Full model lineup including batch, caching, long context
- Programmatic via SDKs
Consumer subscriptions (Plus, Pro, Ultra tiers)
- Fixed monthly fee
- Generous usage caps
- Web/desktop/mobile apps
- Often includes newer models first
Business/Team plans
- Per-seat billing
- Centralized billing
- Admin & audit controls
- Sometimes shared usage pools
Enterprise (custom contract)
- Custom pricing and limits
- SLAs
- DPAs and BAAs
- Dedicated support
- Sometimes private cloud / VPC
Cloud marketplaces (Google Vertex AI)
- Same models, slightly different pricing (often parity or small premium)
- Counts toward existing cloud spend commits
- Stays within cloud's data-protection boundary
⭐ Most popular Google products by user type
🎁 Current promos and time-sensitive deals
📅 What changed in the last 30 days
- New model added: Gemini 2.5 Flash-Lite Preview
- New model added: Gemini 3.1 Flash TTS Preview
- New model added: Gemini 3.1 Flash Live Preview
- New model added: Gemini 3.1 Flash-Lite Preview
- New model added: Gemini 3 Flash Preview
- New model added: Gemini 3.5 Flash
- New model added: Gemini 3.1 Pro Preview
- New model added: Gemini 3.1 Pro Preview (CustomTools)
⏳ Major upcoming changes
Every Google product, profiled
For each product, what it's for, who picks it, what to watch out for, pros and cons, and what we tell our consulting clients.
Google AI Plus
What it's for:
- Drafting emails in Gmail
- Summarizing long Google Docs
- Basic creative writing and brainstorming
- Simple data organization in Sheets
What you get:
- Gemini integration in Gmail and Docs
- Access to Gemini Flash models
- Standard response speeds
- Basic image generation capabilities
Pros:
- Affordable entry point for integrated AI
- Seamless workflow within Google Workspace
- No complex setup required
Cons:
- No annual discount available
- Limited to lower-tier models
- No extra cloud storage included
| Scenario | Monthly | Annual | Notes |
|---|---|---|---|
| Standard individual use | $7.99 | $95.88 | Calculated as 12 months at the standard rate. |
Google AI Pro
What it's for:
- Advanced coding and debugging
- Complex multi-step reasoning tasks
- High-resolution AI image generation
- Analyzing large datasets in Sheets
What you get:
- Access to Gemini 3 Pro and Ultra models
- 2TB Google One storage included
- Priority access during peak times
- Advanced Gemini in Workspace features
- 1M+ token context window support
Pros:
- Best-in-class context window for long documents
- Excellent value if you already need cloud storage
- Fastest response times for flagship models
Cons:
- Expensive if you don't use the 2TB storage
- Workspace integration can sometimes be buggy
- Rate limits still exist despite 'Pro' branding
| Scenario | Monthly | Annual | Notes |
|---|---|---|---|
| Annual commitment | $16.67 | $199.99 | Paid as a single upfront payment. |
| Month-to-month flexibility | $19.99 | $239.88 | Total cost if paid monthly for a full year. |
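The gap between the two billing options in the table above can be checked with a few lines of arithmetic. This is a minimal sketch using only the two prices listed on this page; the function name is illustrative.

```python
# Google AI Pro prices as listed on this page (USD).
MONTHLY_PRICE = 19.99
ANNUAL_PRICE = 199.99


def annual_savings(monthly_price: float, annual_price: float) -> tuple[float, float]:
    """Return (dollars saved per year, fraction saved) for annual billing."""
    paid_monthly = monthly_price * 12
    saved = paid_monthly - annual_price
    return saved, saved / paid_monthly


saved, fraction = annual_savings(MONTHLY_PRICE, ANNUAL_PRICE)
print(f"Annual billing saves ${saved:.2f} per year ({fraction:.1%})")
```

At the listed prices the annual plan works out to a little under 17% off the month-to-month total.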
Google AI Ultra 5x
What it's for:
- Continuous coding assistance
- Batch processing of large document sets
- High-frequency image/video generation
- Extensive RAG-based research
What you get:
- 5x higher rate limits for Gemini Ultra
- Priority technical support
- Early access to experimental 2M+ context windows
- All features of the Pro tier included
Pros:
- Significantly reduces 'rate limit' interruptions
- Ideal for power users who 'live' in the chat interface
- Fastest access to new model iterations
Cons:
- Very high price jump from Pro tier
- No annual discount offered
- Lacks true enterprise-grade team management
| Scenario | Monthly | Annual | Notes |
|---|---|---|---|
| Year-round heavy usage | $99.99 | $1,199.88 | No annual discount available for this tier. |
Google AI Ultra 20x
What it's for:
- Near-constant AI interaction
- Large-scale content production pipelines
- Extensive testing of long-context prompts
- Replacing multiple specialized AI tools
What you get:
- 20x higher rate limits than Pro
- Highest priority in the model queue
- Direct feedback channel to product teams
- Full Workspace integration with max capacity
Pros:
- Virtually eliminates rate-limiting for human users
- Best possible latency during global traffic spikes
- Access to the absolute cutting edge of Google AI
Cons:
- Extremely expensive for a consumer account
- No team/seat management features
- No annual billing option
| Scenario | Monthly | Annual | Notes |
|---|---|---|---|
| Maximum consumer spend | $199.99 | $2,399.88 | The highest possible cost for a non-enterprise Google AI account. |
Gemini API / Vertex AI
What it's for:
- Building custom AI applications
- Large-scale data classification
- Automated customer support bots
- Long-context document analysis (up to 2M tokens)
What you get:
- Access to Gemini 3.1 Pro and Flash models
- Context Caching (90% discount on input tokens)
- Batch API (50% discount for non-real-time tasks)
- Provisioned Throughput for guaranteed capacity
- Vertex AI enterprise security and IAM
Pros:
- Most cost-effective way to handle massive contexts
- Enterprise-grade data privacy (Vertex AI)
- Highly flexible pricing with Batch and Caching
Cons:
- Requires technical expertise to implement
- Billing can be unpredictable without caps
- Provisioned Throughput requires significant commitments
| Scenario | Monthly | Annual | Notes |
|---|---|---|---|
| Small startup (10M Pro tokens/mo) | $140 | $1,680 | Assumes 10M input ($20) and 10M output ($120) on Gemini 3.1 Pro. |
| High-volume Flash user (100M tokens/mo) | $350 | $4,200 | Assumes 100M input ($50) and 100M output ($300) on Gemini 3 Flash. |
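The two scenarios above follow directly from the per-million-token rates. The sketch below reproduces them; the dictionary keys are this page's model slugs and may not match the identifiers the API expects, and `monthly_cost` is an illustrative helper rather than part of any SDK.

```python
# Standard per-million-token rates in USD, copied from this page's pricing table.
# Keys are this page's slugs (an assumption: real API model IDs may differ).
RATES = {
    "gemini-3-1-pro": {"input": 2.00, "output": 12.00},
    "gemini-3-flash": {"input": 0.50, "output": 3.00},
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the metered cost in USD at standard (non-batch, uncached) rates."""
    rate = RATES[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


startup = monthly_cost("gemini-3-1-pro", 10_000_000, 10_000_000)
flash = monthly_cost("gemini-3-flash", 100_000_000, 100_000_000)
print(f"Small startup: ${startup:,.2f}/mo, ${startup * 12:,.2f}/yr")
print(f"High-volume Flash user: ${flash:,.2f}/mo, ${flash * 12:,.2f}/yr")
```

Swapping in your own token counts gives a first-order estimate before batch, caching, or long-context adjustments.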
All Google products at a glance
Scroll up to the product profile for full detail
| Product | Price | Best for | Headline feature | Yearly estimate |
|---|---|---|---|---|
| Google AI Plus | $7.99/mo | Basic Workspace AI | Docs/Gmail Integration | $95.88 |
| Google AI Pro | $19.99/mo | Individual Flagship Use | 2TB Google One Storage | $199.99 (annual) to $239.88 (monthly) |
| Google AI Ultra 5x | $99.99/mo | High-volume individuals | 5x Usage Limits | $1,199.88 |
| Google AI Ultra 20x | $199.99/mo | Extreme usage/Small teams | 20x Usage Limits | $2,399.88 |
| Gemini API | Usage-based | App Development | 2M+ Context Window | Variable |
Google vs the field
Same-tier comparison across top 5 vendors
| Comparison tier | Anthropic | OpenAI | Google | xAI | Verdict |
|---|---|---|---|---|---|
| Consumer Flagship | Claude Pro $20/mo | ChatGPT Plus $20/mo | Google AI Pro $20/mo | Grok Premium+ $16/mo | Google is the only vendor bundling 2TB of cloud storage at this price point. |
| Developer API (High-IQ) | Claude Opus 4.7 $5.00 / 1M tokens | GPT-5.4 $2.50 / 1M tokens | Gemini 1.5 Pro $1.25 / 1M tokens | Grok 4.20 $2.00 / 1M tokens | Gemini 1.5 Pro offers significantly lower input costs and a larger context window than Opus or GPT-5.4. |
| Entry-Level Paid | N/A | N/A | Google AI Plus $7.99/mo | Grok Basic $7/mo | Google and xAI are the only major vendors offering a sub-$20 tier for individual users. |
🌳 Which Google product fits you?
How Google pricing has moved
Analysis of price trajectory based on recent changelog events and live rates.
API or subscription: which is cheaper for you?
Cross-over math at current rates
Google AI Plus ($7.99/mo): At an average of 450 input and 150 output tokens per message, the API costs $0.0003375 per interaction at Gemini 3.1 Flash-Lite rates ($0.25/M input, $1.50/M output). You must send over 23,000 messages per month to justify the $7.99 subscription on cost alone.
Google AI Pro ($19.99/mo): With Gemini 3.1 Pro API rates ($2/M input, $12/M output), each 600-token message (450 input, 150 output) costs approximately $0.0027. The $19.99 subscription breaks even at roughly 7,400 messages per month, or about 247 messages per day.
Google AI Ultra 20x ($199.99/mo): This high-tier subscription targets extreme volume. At $0.0027 per API message, you would need to process 74,000+ messages monthly to make the $199.99 flat fee cheaper than the API.
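The cross-over figures above can be reproduced with a short calculation. This is a sketch under stated assumptions: the 450/150 token split comes from the text, the Flash-Lite rates for the first case are the ones that reproduce the $0.0003375 figure, and a 30-day month is assumed for the per-day number.

```python
def cost_per_message(input_rate: float, output_rate: float,
                     input_tokens: int = 450, output_tokens: int = 150) -> float:
    """API cost in USD for one message, given per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def break_even_messages(monthly_fee: float, per_message_cost: float) -> float:
    """Messages per month at which the flat fee equals metered API spend."""
    return monthly_fee / per_message_cost


flash_lite_msg = cost_per_message(0.25, 1.50)   # Gemini 3.1 Flash-Lite rates
pro_msg = cost_per_message(2.00, 12.00)         # Gemini 3.1 Pro rates

plus_break_even = break_even_messages(7.99, flash_lite_msg)
pro_break_even = break_even_messages(19.99, pro_msg)
ultra_break_even = break_even_messages(199.99, pro_msg)

print(f"$7.99 plan:   {plus_break_even:,.0f} messages/month")
print(f"$19.99 plan:  {pro_break_even:,.0f} messages/month "
      f"({pro_break_even / 30:,.0f} per day)")
print(f"$199.99 plan: {ultra_break_even:,.0f} messages/month")
```

The comparison covers token cost only; storage, app features, and rate limits bundled with the subscriptions are not priced in.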
Current pricing (all production models)
| Model | Input $/M | Output $/M | Cached $/M | Context |
|---|---|---|---|---|
| Gemini 3.1 Pro (`gemini-3-1-pro`) | $2 | $12 | $0.20 | 2,000,000 |
| Gemini 3 Flash (`gemini-3-flash`) | $0.50 | $3 | $0.050 | 1,000,000 |
| Gemini 3.1 Flash-Lite (`gemini-3-1-flash-lite`) | $0.25 | $1.50 | $0.025 | 1,000,000 |
| Gemini 2.5 Pro (`gemini-2-5-pro`) | $1.25 | $10 | $0.13 | 2,000,000 |
| Gemini 2.5 Flash (`gemini-2-5-flash`) | $0.30 | $2.50 | $0.030 | 1,000,000 |
| Gemini 2.5 Flash-Lite (`gemini-2-5-flash-lite`) | $0.10 | $0.40 | $0.010 | 1,000,000 |
Pricing verified as of 2026-05-26. Caching discounts apply to repeated input tokens. Batch pricing typically offers a 50% discount on standard rates.
Full rate breakdown (all variants)
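Variants beyond the standard API rate: batch (asynchronous, 50% off), cached read (about 0.1x the base input rate), and a long-context tier (for the Pro models, 2x input and 1.5x output above 200,000 tokens). Explicit cache storage is reportedly billed separately for the configured TTL.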
Gemini 3.1 Pro gemini-3-1-pro
| Variant | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Standard | $2 | $12 | Default per-token API rate |
| Batch API | $1 | $6 | Async batch processing, results within 24 hours, typically 50% off |
| Cached read | $0.20 | $12 | Cached prompt input (~0.1x base); output rate unchanged |
| Long context (>200,000 tokens) | $4 | $18 | Higher rate applies above 200,000 tokens |
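As a rough illustration of how the variant rows above change a bill, the sketch below prices a single request under each variant. It assumes the long-context rate applies to the whole request once the prompt exceeds 200,000 tokens; the table does not say whether the higher rate is applied to the whole request or only to the tokens beyond the threshold, so treat that as an assumption. The helper names are illustrative.

```python
# Gemini 3.1 Pro rates from the table above: (input $/M, output $/M).
GEMINI_3_1_PRO = {
    "standard": (2.00, 12.00),
    "batch": (1.00, 6.00),
    "cached_read": (0.20, 12.00),
    "long_context": (4.00, 18.00),
}
LONG_CONTEXT_THRESHOLD = 200_000  # tokens


def variant_cost(variant: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request billed entirely at the given variant's rates."""
    in_rate, out_rate = GEMINI_3_1_PRO[variant]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def on_demand_variant(prompt_tokens: int) -> str:
    """Pick standard vs long-context pricing (assumed: whole request re-rated)."""
    return "long_context" if prompt_tokens > LONG_CONTEXT_THRESHOLD else "standard"


short_prompt = variant_cost(on_demand_variant(150_000), 150_000, 2_000)
long_prompt = variant_cost(on_demand_variant(250_000), 250_000, 2_000)
batch_prompt = variant_cost("batch", 150_000, 2_000)

print(f"150K-token prompt, standard: ${short_prompt:.3f}")
print(f"250K-token prompt, long context: ${long_prompt:.3f}")
print(f"150K-token prompt, batch: ${batch_prompt:.3f}")
```

Under these assumptions, crossing the threshold costs more than the extra tokens alone would suggest, which is a reason to trim prompts that sit just above 200,000 tokens.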
Gemini 3 Flash gemini-3-flash
| Variant | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Standard | $0.50 | $3 | Default per-token API rate |
| Batch API | $0.25 | $1.5 | Async batch processing, results within 24 hours, typically 50% off |
| Cached read | $0.050 | $3 | Cached prompt input (~0.1x base); output rate unchanged |
Gemini 3.1 Flash-Lite gemini-3-1-flash-lite
| Variant | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Standard | $0.25 | $1.5 | Default per-token API rate |
| Batch API | $0.13 | $0.75 | Async batch processing, results within 24 hours, typically 50% off |
| Cached read | $0.025 | $1.5 | Cached prompt input (~0.1x base); output rate unchanged |
Gemini 2.5 Pro gemini-2-5-pro
| Variant | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Standard | $1.25 | $10 | Default per-token API rate |
| Batch API | $0.63 | $5 | Async batch processing, results within 24 hours, typically 50% off |
| Cached read | $0.13 | $10 | Cached prompt input (~0.1x base); output rate unchanged |
| Long context (>200,000 tokens) | $2.5 | $15 | Higher rate applies above 200,000 tokens |
Gemini 2.5 Flash gemini-2-5-flash
| Variant | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Standard | $0.30 | $2.5 | Default per-token API rate |
| Batch API | $0.15 | $1.25 | Async batch processing, results within 24 hours, typically 50% off |
| Cached read | $0.030 | $2.5 | Cached prompt input (~0.1x base); output rate unchanged |
Gemini 2.5 Flash-Lite gemini-2-5-flash-lite
| Variant | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Standard | $0.10 | $0.40 | Default per-token API rate |
| Batch API | $0.050 | $0.20 | Async batch processing, results within 24 hours, typically 50% off |
| Cached read | $0.010 | $0.40 | Cached prompt input (~0.1x base); output rate unchanged |
Subscription plans (consumer + business)
| Plan | Audience | Monthly | Annual | Per seat | What's included |
|---|---|---|---|---|---|
| Google AI Plus (Google AI) | consumer | $7.99 | N/A | N/A | Priority new features. Limits: storage: 200 GB · NotebookLM size: large · family sharing seats: 5 · Gemini usage multiplier: 2x · Google Flow credits monthly: 200. Source: one.google.com |
| Google AI Pro (Google AI) | consumer | $19.99 | $199.99/yr (≈ $16.67/mo) | N/A | YouTube Premium: Lite · priority new features. Limits: storage: 5,000 GB · NotebookLM size: larger · family sharing seats: 5 · context window: 1,000,000 tokens · Gemini usage multiplier: 4x · Google Flow credits monthly: 1,000. Source: one.google.com |
| Google AI Ultra 5x (Google AI) | consumer | $99.99 | N/A | N/A | Early access · YouTube Premium: individual · priority new features. Limits: storage: 20,000 GB · NotebookLM size: largest · family sharing seats: 5 · context window: 1,000,000 tokens · Gemini usage multiplier: 5x Pro · Google Flow credits monthly: 10,000. Source: one.google.com |
| Google AI Ultra 20x (Google AI) | consumer | $199.99 | N/A | N/A | Early access · YouTube Premium: individual · priority new features. Limits: storage: 30,000 GB · NotebookLM size: largest · family sharing seats: 5 · context window: 1,000,000 tokens · Gemini usage multiplier: 20x Pro · Google Flow credits monthly: 25,000. Source: one.google.com |
Subscription pricing is separate from per-token API rates above.
What changed in the last 30-90 days
- 2026-05-25: Massive expansion of Gemini 3.1 and 3.5 preview models, including Gemini 3.1 Pro, Gemini 3.5 Flash, and specialized Flash-Lite variants. — Developers now have access to a broader range of performance-to-cost ratios for testing next-generation applications.
- 2026-05-25: Gemini 3.1 Flash Live Preview prices slashed: input dropped 75% to $0.75/M and output dropped 62.5% to $4.50/M. — Real-time audio and text interactions are now significantly more affordable for high-frequency live applications.
- 2026-05-24: Gemini 2.5 Flash Preview TTS rates doubled across input, output, and batch fields. — Users of the text-to-speech preview will see a 100% increase in operational costs for audio generation.
How buyers think about Google pricing
Cheapest Gemini tier for high-volume tasks
The problem: You need to process millions of simple classification or extraction tasks without the bill scaling faster than your revenue. High-performance models are overkill for basic data cleaning or routing.
What to do: Use Gemini 2.5 Flash-Lite for the lowest possible entry point or Gemini 3.1 Flash-Lite for newer architecture at a slight premium.
→ Gemini 2.5 Flash-Lite provides a baseline cost of $0.50 per million balanced tokens, that is, 1M input ($0.10) plus 1M output ($0.40) (as of 2026-05-26).
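To compare tiers on the same footing, the "balanced" figure (1M input plus 1M output tokens) can be computed for every model in the pricing table. This is a minimal sketch; the keys are this page's slugs and `balanced_cost` is an illustrative helper.

```python
# Standard rates in USD per million tokens, from this page's pricing table.
RATES = {
    "gemini-3-1-pro": (2.00, 12.00),
    "gemini-3-flash": (0.50, 3.00),
    "gemini-3-1-flash-lite": (0.25, 1.50),
    "gemini-2-5-pro": (1.25, 10.00),
    "gemini-2-5-flash": (0.30, 2.50),
    "gemini-2-5-flash-lite": (0.10, 0.40),
}


def balanced_cost(model: str) -> float:
    """Cost in USD of 1M input tokens plus 1M output tokens."""
    input_rate, output_rate = RATES[model]
    return input_rate + output_rate


# Cheapest first.
ranked = sorted(RATES, key=balanced_cost)
for model in ranked:
    print(f"{model:24s} ${balanced_cost(model):.2f}")
```

The same helper gives $14.00 for Gemini 3.1 Pro and $11.25 for Gemini 2.5 Pro, a difference of $2.75. Real workloads are rarely balanced, so weighting the two rates by your own input-to-output ratio will give a more representative ranking.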
How far the free AI Studio tier actually goes
The problem: You want to prototype without entering credit card details or committing to a Google Cloud project. You need to know when the rate limits will force a migration to a paid plan.
What to do: Leverage the AI Studio free tier for development and testing before switching to Vertex AI for production scaling.
→ Prototyping is free, but production-ready stability on Gemini 3 Flash starts at approximately $0.55 per 600K tokens, assuming a split of 500K input ($0.25) and 100K output ($0.30) (as of 2026-05-26).
Using the 1M context window without breaking budget
The problem: Running RAG-heavy workflows or analyzing massive documents can lead to massive input costs if you send the same 1-million-token context with every query.
What to do: Utilize Vertex AI Context Caching to reduce the cost of repetitive input tokens by 90%.
→ Context caching on Gemini 3.1 Pro reduces repetitive input costs from $2.00 to $0.20 per million tokens (as of 2026-05-26).
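The effect of the 90% cached-input discount on a repeated context can be sketched as follows. Several simplifications are assumed and are not taken from Google's documentation: the first query is billed at the full input rate, every later query hits the cache, cache storage fees are ignored, and the context is kept below the 200,000-token long-context threshold so the $2.00 base rate applies.

```python
def repeated_context_cost(context_tokens: int, queries: int,
                          input_rate: float = 2.00, cached_rate: float = 0.20,
                          use_cache: bool = False) -> float:
    """Input-side cost in USD of sending the same context with every query.

    Rates are USD per million tokens (Gemini 3.1 Pro defaults from this page).
    Storage charges for explicit caches are NOT modeled.
    """
    if queries < 1:
        return 0.0
    if not use_cache:
        return queries * context_tokens * input_rate / 1_000_000
    first = context_tokens * input_rate / 1_000_000            # assumed: full price once
    rest = (queries - 1) * context_tokens * cached_rate / 1_000_000
    return first + rest


uncached = repeated_context_cost(150_000, 100)
cached = repeated_context_cost(150_000, 100, use_cache=True)
print(f"Without caching: ${uncached:.2f}")
print(f"With caching:    ${cached:.2f}")
print(f"Saved:           {1 - cached / uncached:.0%}")
```

The realized saving approaches 90% only as the number of cache hits grows; with sparse traffic or short TTLs the figure will be lower.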
Vertex AI vs AI Studio: when does each make sense
The problem: You are choosing between the developer-friendly AI Studio and the enterprise-grade Vertex AI platform. You need to know if the extra features justify the potential complexity.
What to do: Choose Vertex AI for workloads that require Google Cloud Enterprise Discount Programs (EDP) or Provisioned Throughput.
→ Vertex AI is the only path to stackable discounts that can reduce token costs by up to 40% (as of 2026-05-26).
When Gemini 3.1 Pro is worth the price over 2.5 Pro
The problem: You need to decide if the intelligence gains in the 3.1 series justify the higher price point compared to the 2.5 series for complex reasoning tasks.
What to do: Use Gemini 2.5 Pro for standard high-intelligence tasks and reserve 3.1 Pro for multi-step reasoning that fails on older models.
→ Gemini 3.1 Pro carries a $2.75 premium per million balanced tokens over Gemini 2.5 Pro: $14.00 versus $11.25 for 1M input plus 1M output (as of 2026-05-26).
Gemini Advanced inside Google Workspace
The problem: You want to provide AI tools to your employees but are unsure whether to buy individual Gemini Advanced subscriptions or use the Workspace AI add-on.
What to do: Compare the cost of the Workspace AI add-on against standalone API usage for internal productivity tools.
→ High-volume internal users may find Workspace seat pricing more predictable than variable API token billing (as of 2026-05-26).
Volume discounts & partner programs
Google Cloud Partner Network (2026 Tier Structure)
Threshold: Select ($250k ACV); Premier ($2M ACV); Diamond ($20M ACV)
Typical discount (reported): 8–12% typical reseller margin; up to 15-20% for $5M/3yr commits
Benefits:
- Diamond tier requires 200 professional certifications and 20 implemented workloads
- Access to Partner Marketing Studio and co-marketing funds
- Dedicated Google Account Manager for larger partners
- Technical training, workshops, and proof-of-concept (PoC) support
- AI-driven automated tracking for tier and competency achievements
How to engage: Apply via Google Cloud Partner Advantage portal; transition window for 2026 program began in Q1 2026
Source: crn.com.au (community) · cited 2026-01-22
Vertex AI Provisioned Throughput (PT)
Threshold: Measured in Generative AI Scale Units (GSUs); minimums vary by model
Typical discount (reported): Fixed-cost subscription; break-even typically 12-15% of capacity sustained
Benefits:
- Guaranteed throughput and predictable latency for Gemini 3 and Gemini 2.0 models
- Commitment terms: 1 week, 1 month, 3 months, or 1 year
- Ability to schedule PT change orders up to two weeks in advance
- Integration with context caching for further cost reductions
- Supports first-party (Gemini, Veo 3) and third-party models (Anthropic in private preview)
How to engage: Purchase via Provisioned Throughput dashboard in Vertex AI console
Source: cloud.google.com (vendor_official) · cited 2026-02-19
Google Cloud Enterprise Discount Program (EDP) / CASC
Threshold: Typically $150k+ for custom pricing; $1M-$3M for standard EDP tiers
Typical discount (reported): 35-40% initial offer; reportedly up to high 80% for $1B+ contracts
Benefits:
- Stackable with Committed Use Discounts (CUDs)
- Vertex AI consumption counts toward Customer Annual Spend Commitment (CASC)
- Negotiable 'Vertex-specific discount sleeves' within broader cloud contracts
- Price protection for 3-5 years subject to specific commitments
How to engage: Direct negotiation with Google Cloud sales account teams
Source: magicmag.ai (analyst_report) · cited 2026-02-18
Azure AI Foundry Provisioned Throughput Reservations
Threshold: Purchased in Provisioned Throughput Units (PTUs)
Typical discount (reported): Up to 70% compared to hourly pay-as-you-go
Benefits:
- 1-month or 1-year reservation terms
- Guaranteed capacity for high-throughput applications
- Model-independent quota requests (apply PTUs across diverse model portfolio)
- Self-service provisioning via Azure Portal
How to engage: Navigate to Reservations section in Azure Portal
Source: techcommunity.microsoft.com (vendor_official) · cited 2025-05-19
Amazon Bedrock Provisioned Throughput
Threshold: Minimum 1 model unit (MU) commitment
Typical discount (reported): Varies by term; 6-month commitments offer deepest discounts
Benefits:
- 1-month or 6-month commitment options
- Guaranteed throughput for foundation models (Titan, Anthropic, Meta Llama)
- Required for running custom fine-tuned models
- No-commitment hourly option available for maximum flexibility
How to engage: Purchase through AWS Bedrock console under 'Provisioned Throughput'
Source: cloudforecast.io (analyst_report) · cited 2024-10-31
Vertex AI Batch Prediction & Caching Discounts
Threshold: Workload-based (non-real-time)
Typical discount (reported): 50% off for Batch API; 90% off for Context Caching
Benefits:
- Batch Prediction: 50% discount on on-demand rates for supported models
- Context Caching: 90% discount on input tokens for repetitive RAG contexts
- Reduces 'bill-shock' for high-volume summarization or classification tasks
How to engage: Enable via Vertex AI API parameters (e.g., setting 'caching' or using Batch Prediction jobs)
Source: cloudzero.com (analyst_report) · cited 2026-05-04
Multi-cloud availability
| Cloud | Model availability | Price vs vendor-direct | Source |
|---|---|---|---|
| AWS Bedrock | Gemma 3 (4B, 12B, 27B) and Gemma 4 (via Hugging Face collection) | varies by deployment (serverless pay-as-you-go or marketplace subscription) | aws.amazon.com |
| Azure AI Foundry | Gemma 4 (variants including E2B, E4B, 26B A4B, 31B) | reportedly based on managed compute (VM/GPU hourly rates) or serverless pay-as-you-go | techcommunity.microsoft.com |
| Together.ai | Gemma 3 27B | reportedly $0.06 per 1M input tokens and $0.12 per 1M output tokens | computeprices.com |
| Anyscale | Gemma 7B (via LiteLLM/Anyscale endpoints) | approximately $0.15 per 1M input and $0.15 per 1M output tokens | litellm.ai |
| Google Vertex AI (Vendor-Direct) | Gemini 3.1 Pro, Gemini 3.5 Flash, Gemini 2.5 Pro/Flash, Gemma 4 | baseline (e.g., Gemini 3.1 Pro at $2.00/$12.00 per 1M tokens for context <= 200K) | cloud.google.com |
Free credits & startup programs
Google for Startups Cloud Program - AI Track
Reported value: up to $350,000 in credits over 2 years
Eligibility: AI-first startups from Seed to Series A (Series A must be raised within the last 12 months); founded within the last 10 years; not received more than $5,000 in previous Google Cloud credits.
How to apply: Apply through the Google for Startups website or via an approved partner (VC, accelerator, or incubator).
Google for Startups Cloud Program - Scale Tier
Reported value: up to $200,000 in credits over 2 years
Eligibility: Startups with verified equity funding from pre-seed to Series A (Series A raised within last 12 months); founded within the last 10 years; not received more than $5,000 in previous Google Cloud credits.
How to apply: Submit application on the Google for Startups Cloud Program page; requires verification of institutional funding.
Google for Startups Cloud Program - Start Tier
Reported value: $2,000 in credits for 12 months
Eligibility: Technology startups not yet funded by an institutional investor; founded within the last 5 years; not received previous credits beyond the free trial.
How to apply: Apply directly on the Google for Startups website.
Google Cloud Research Credits (Faculty & Postdocs)
Reported value: up to $5,000 in credits
Eligibility: Faculty and postdoctoral researchers at higher education institutions in eligible countries; requires a research proposal and cost estimate.
How to apply: Submit an online application form including a research proposal and Google Cloud billing account details.
Google Cloud Research Credits (PhD Students)
Reported value: up to $1,000 in credits per year
Eligibility: PhD students conducting research at educational institutions; must be used for the described project and not personal use.
How to apply: Apply via the Google for Education research credits application form; can apply once per year.
NVIDIA Inception & Google Cloud Collaboration
Reported value: up to $350,000 in Google Cloud credits
Eligibility: Qualified members of the NVIDIA Inception program focused on AI.
How to apply: Members of NVIDIA Inception can access an accelerated path to the Google for Startups Cloud Program through the NVIDIA member portal.
Y Combinator Summer Grants 2026
Reported value: $90,000 in compute credits (shared across AWS, Azure, and GCP)
Eligibility: Technical college students building AI or technical projects full-time in San Francisco during Summer 2026.
How to apply: Apply via the Y Combinator Summer Grants application page; rolling admissions.
Google for Startups Accelerator: AI First
Reported value: equity-free support and free Cloud TPU access
Eligibility: Seed to Series A AI-first startups based in North America; must commit CTO or technical leads to program sessions.
How to apply: Apply for specific cohorts via the Google for Startups Accelerator program page.
Pricing gotchas to watch
Explicit Cache Deletion Billing Trap
When using explicit context caching, users are reportedly billed for the originally specified Time to Live (TTL) duration even if the cache is manually deleted before it expires. For example, a cache created with a 1-hour TTL that is deleted after 15 minutes still incurs the full 1-hour storage charge.
Workaround: Instead of deleting the cache, update the TTL to a shorter duration (e.g., 30 minutes) to adjust the billing period before the cache naturally expires.
Source: developers.google.com (vendor_docs) · cited 2026-05-26
Implicit Caching Sparse-Traffic Loss
Implicit caching (automatically enabled for Gemini 2.5+) offers no cost-saving guarantee. It is described as an ephemeral optimization layer where data is retained only for a 'defined short retention period' or session lifetime. Sparse traffic patterns often result in cache misses, forcing production users to pay full price for repeated input tokens.
Workaround: For production workloads requiring guaranteed savings, use explicit caching which allows manual TTL management, provided the prompt meets the minimum threshold (typically 1,024 to 4,096 tokens depending on the model).
Source: blog.google (blog_post) · cited 2026-05-26
Surprise 'Ghost' Charges from Idle Endpoints
Vertex AI online prediction endpoints charge hourly fees (starting at approximately $0.75 per node-hour) even when idle. Production users have reported 'ghost' charges reaching hundreds of dollars because undeploying a model is reportedly insufficient; the endpoint resource itself must be deleted to stop GPU/compute allocation billing.
Workaround: Implement automated scripts to delete idle endpoints rather than just undeploying models, and use Vertex AI Batch API for non-real-time tasks to avoid 24/7 endpoint costs.
Source: cloud.google.com (vendor_docs) · cited 2026-05-26
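To see how quickly an idle endpoint adds up, the quoted node-hour rate can be projected over a month. This is a back-of-the-envelope sketch: the $0.75 figure is the approximate starting rate cited above, actual rates depend on machine type, and a 30-day month is assumed.

```python
def idle_endpoint_cost(nodes: int, hours: float,
                       rate_per_node_hour: float = 0.75) -> float:
    """USD billed for endpoint nodes that stay allocated, busy or not."""
    return nodes * hours * rate_per_node_hour


HOURS_PER_MONTH = 24 * 30  # assumed 30-day month

one_node = idle_endpoint_cost(1, HOURS_PER_MONTH)
three_nodes = idle_endpoint_cost(3, HOURS_PER_MONTH)
print(f"1 idle node for a month:  ${one_node:,.2f}")
print(f"3 idle nodes for a month: ${three_nodes:,.2f}")
```

A single forgotten node at this rate lands in the "hundreds of dollars" range described above.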
Multimodal Token Counting Discrepancies
Gemini 2.0 image tokenization uses a tiling logic where images larger than 384px are scaled into 768x768 tiles, each costing 258 tokens. However, production users report surprises where a single high-resolution image (e.g., 1920x1080) can result in over 1,800 tokens, significantly higher than a simple 4-tile calculation would suggest.
Workaround: Pre-process and downscale images to 384x384 pixels before sending them to the API to ensure they stay within the minimum 258-token billing tier.
Source: developers.google.com (vendor_docs) · cited 2026-05-26
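The tiling rule described above can be turned into a naive estimator for planning purposes. This sketch is deliberately simplified: it counts 768x768 tiles by rounding up each dimension, which is an assumption and not Google's exact algorithm, so the result should be compared against the token usage the API actually reports.

```python
import math

TOKENS_PER_TILE = 258     # per-tile cost quoted above
TILE_SIZE = 768           # tile edge in pixels
SMALL_IMAGE_LIMIT = 384   # images at or below this size bill as a single unit


def naive_image_tokens(width: int, height: int) -> int:
    """Rough token estimate for one image under a simple grid-tiling assumption."""
    if width <= SMALL_IMAGE_LIMIT and height <= SMALL_IMAGE_LIMIT:
        return TOKENS_PER_TILE
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return tiles * TOKENS_PER_TILE


print(naive_image_tokens(384, 384))     # downscaled image
print(naive_image_tokens(1920, 1080))   # full-HD image
```

For a 1920x1080 image this grid estimate gives 6 tiles (1,548 tokens), which is still below the 1,800+ tokens users report, so budgeting from a simple tile count is likely to undershoot.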
Regional Pricing and Language Variance
Vertex AI pricing for Gemini models reportedly varies by region, with non-US regions typically costing 2% to 5% more than us-central1. Additionally, because Vertex AI bills per character rather than per token, Japanese-language deployments can be approximately 3x cheaper per token than English due to higher information density per character.
Workaround: Calculate costs based on character counts for the specific target language rather than token estimates to avoid budget overruns in Latin-script languages.
Source: cloud.google.com (vendor_docs) · cited 2026-05-26
Legacy Key Auto-Upgrade Billing Risk
A major security-related pricing gotcha involves legacy Google Maps API keys. If these keys reside in a project where Gemini is enabled, they are reportedly 'silently' upgraded to allow Gemini API access. Attackers exploiting these publicly exposed keys have generated unauthorized bills ranging from $10,000 to over $180,000 in a few days.
Workaround: Explicitly restrict all API keys to specific services (e.g., only Maps) and set project-level spend caps, as budget alerts do not stop usage.
Source: trufflesecurity.com (blog_post) · cited 2026-05-26
Hidden costs (25-40% beyond per-token rates)
- Idle Vertex AI endpoints charge approximately $0.75 per node-hour even when no requests are processed.
- Multimodal tiling for high-resolution images can inflate token counts to 1,800+ tokens per image.
- Regional pricing variance adds 2-5% to the base rate for deployments outside of us-central1.
- Explicit context caching bills for the full TTL duration even if the cache is deleted early.
- Retry overhead from rate limits or network timeouts adds 5-15% to effective monthly costs.
- Japanese and other high-density languages can be 3x cheaper per token due to character-based billing.
- Unauthorized usage from unrestricted legacy API keys can lead to massive surprise bills.
Typical overhead: 25-40% beyond raw per-token rates.
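One way to use the 25-40% overhead figure is to turn a raw per-token estimate into a budget range. The sketch below does only that; the $140 input is the small-startup scenario from this page, and the function name is illustrative.

```python
def budget_range(raw_monthly_cost: float,
                 low_overhead: float = 0.25,
                 high_overhead: float = 0.40) -> tuple[float, float]:
    """Return (low, high) monthly budget in USD after applying overhead."""
    return (raw_monthly_cost * (1 + low_overhead),
            raw_monthly_cost * (1 + high_overhead))


low, high = budget_range(140.0)
print(f"Raw token cost: $140.00/mo, budget ${low:.2f} to ${high:.2f}/mo")
```

The multipliers are this page's typical range, not a guarantee; workloads with idle endpoints or heavy image input may sit well outside it.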
What it costs to leave Google
Migrating away from Google involves moving out of the Vertex AI ecosystem and potentially losing access to the 2-million-token context window. While Gemma models offer an open-weight path for portability, the native grounding integrations with Google Search and BigQuery create significant functional lock-in.
- small project (1-5 prompts): 1-2 engineer-days
- mid-size (10-50 prompts): 1-3 engineer-weeks
- large agentic system: 2-4 engineer-months
Who is this for?
For vibe coders & solo devs
For rapid prototyping, AI Studio is your best friend because it bypasses the complexity of Google Cloud project setup. You should start with Gemini 3.1 Flash-Lite to keep costs extremely low while benefiting from the latest architecture. If you hit rate limits, moving to Vertex AI is straightforward but requires managing service accounts. Focus on using the free tier for dev and only pay for production traffic.
* Use AI Studio for zero-config API keys.
* Start with Gemini 3.1 Flash-Lite for $0.25/M input tokens.
* Monitor your usage to avoid the 'Legacy Key' billing trap.
* Switch to Vertex AI only when you need enterprise scaling or logging.
For SMBs and growing teams
Small businesses should look at the 50% discount offered by the Vertex AI Batch API for non-real-time tasks like content generation or report summarization. If you have repetitive tasks, context caching is mandatory to avoid paying for the same data twice. Consider the Google for Startups Cloud Program if you are eligible, as it can provide up to $200,000 in credits. This can effectively eliminate your AI spend for the first two years.
* Apply for the Google for Startups Scale Tier for $200,000 in credits.
* Use Batch API to save 50% on high-volume summarization.
* Enable context caching for any prompt over 1,024 tokens.
* Set strict project-level spend caps to prevent budget overruns.
For enterprise buyers
Enterprises should leverage the Enterprise Discount Program (EDP) to ensure Vertex AI spend counts toward broader cloud commitments. If you require guaranteed latency, Provisioned Throughput (PT) is the preferred route, though it requires a commitment of at least one week. For global deployments, be aware that regional pricing can increase costs by 2-5% outside of the US. Use Diamond tier partner benefits if your ACV exceeds $20M for maximum support.
* Negotiate 'Vertex-specific discount sleeves' within your EDP.
* Use Provisioned Throughput for mission-critical, high-traffic apps.
* Deploy in us-central1 to avoid the 2-5% regional price premium.
* Delete idle endpoints entirely to avoid the $0.75 per node-hour 'ghost' charge.
Sources verified for this page
Primary: Google pricing page
View all 25 cited insider sources across 16 domains
- Google Cloud Partner Network (2026 Tier Structure) (community, verified 2026-01-22)
- Vertex AI Provisioned Throughput (PT) (vendor_official, verified 2026-02-19)
- Google Cloud Enterprise Discount Program (EDP) / CASC (analyst_report, verified 2026-02-18)
- Azure AI Foundry Provisioned Throughput Reservations (vendor_official, verified 2025-05-19)
- Amazon Bedrock Provisioned Throughput (analyst_report, verified 2024-10-31)
- Vertex AI Batch Prediction & Caching Discounts (analyst_report, verified 2026-05-04)
- Explicit Cache Deletion Billing Trap (vendor_docs, verified 2026-05-26)
- Implicit Caching Sparse-Traffic Loss (blog_post, verified 2026-05-26)
- Surprise 'Ghost' Charges from Idle Endpoints (vendor_docs, verified 2026-05-26)
- Multimodal Token Counting Discrepancies (vendor_docs, verified 2026-05-26)
- Regional Pricing and Language Variance (vendor_docs, verified 2026-05-26)
- Legacy Key Auto-Upgrade Billing Risk (blog_post, verified 2026-05-26)
- AWS Bedrock (grounded_research, verified 2026-05-26)
- Azure AI Foundry (grounded_research, verified 2026-05-26)
- Together.ai (grounded_research, verified 2026-05-26)
- Anyscale (grounded_research, verified 2026-05-26)
- Google Vertex AI (Vendor-Direct) (grounded_research, verified 2026-05-26)
- Google for Startups Cloud Program - AI Track (grounded_research, verified 2026-05-26)
- Google for Startups Cloud Program - Scale Tier (grounded_research, verified 2026-05-26)
- Google for Startups Cloud Program - Start Tier (grounded_research, verified 2026-05-26)
- Google Cloud Research Credits (Faculty & Postdocs) (grounded_research, verified 2026-05-26)
- Google Cloud Research Credits (PhD Students) (grounded_research, verified 2026-05-26)
- NVIDIA Inception & Google Cloud Collaboration (grounded_research, verified 2026-05-26)
- Y Combinator Summer Grants 2026 (grounded_research, verified 2026-05-26)
- Google for Startups Accelerator: AI First (grounded_research, verified 2026-05-26)
Generator: gen-v5.0.8-2026-05-25 · Last refreshed: Mon May 25 2026 21:00:26 GMT-0400 (Eastern Daylight Time) · Pricing snapshot: Mon May 25 2026 00:00:00 GMT-0400 (Eastern Daylight Time)