Symptom-driven reference for the aicost.ai crawler stack. Captures lessons from the 2026-05-01 production-hardening session. Use this when something looks wrong on /crawler-health or after a crawler run.
docs/AICOST-OPERATIONS-COMPLETE.md in the repo.
Token API new models โ auto-incorporated. Crawler writes to aicost_pending_models and exposes them via getAllModels() on next server restart. Calculators show them with autoDetected:true. Manual verification recommended but not required:
SELECT vendor_slug, model_slug, detection_count, last_detected_at FROM aicost_pending_models WHERE verification_status = 'auto' ORDER BY first_detected_at DESC;
Hybrid new plans (added 2026-05-03) โ auto-detected via the hybrid crawler, queued in aicost_pending_hybrid_plans. Plans do NOT auto-expose to consumers โ manual promotion required (audience tagging, capability flags, INFERRED price judgment). Plans flagged needs_inferred_review=1 when seat is null or extraction confidence is low:
SELECT vendor_slug, plan_slug, plan_name, category,
extracted_seat_price_monthly, extraction_confidence,
needs_inferred_review, inferred_reason, detection_count
FROM aicost_pending_hybrid_plans WHERE verification_status = 'auto'
ORDER BY needs_inferred_review DESC, first_detected_at DESC;
Promotion path: Move entry into lib/pricing-data.js (token) or one of the lib/hybrid-pricing-data*.js files (hybrid), then UPDATE ... SET verification_status='verified'. Reject hallucinations with 'rejected' + rejection_reason. Either action drops from cache/queue at next crawler run.
/crawler-healthnode scripts\aicost-changelog-triage.js --since=24hlib/pricing-data.js with new values โ this is the editorial gate (see "How pricing data flows" below)docs/MICROSOFT-MANUAL-UPDATES.mdlast_verified in lib/hybrid-pricing-data-verified-2026q2.js AND in routes/crawler-health.js MANUAL_VENDORS arraynode scripts\generate-vendor-pricing-guides-batch.js --vendor=microsoft --forceThis is the most important architectural fact about aicost.ai.
lib/pricing-data.js is not crawler output. It's the human-curated source of truth โ what aicost.ai claims is true about vendor prices. The crawler reads this file for diff comparison but never writes to it.
Vendor page โโCFโโโถ Mistral extracts JSON โโdiff vsโโโถ lib/pricing-data.js
โ
โ (read-only)
โผ
aicost_price_changelog
โ
[HUMAN REVIEWS] โโโ triage script
โ
โผ
Operator edits
lib/pricing-data.js
โ
โผ
V2 guide regen
โ
โผ
User-facing pages
lib/pricing-data.js is preceded by triage-approved changelog rowsIf the dashboard shows Track 1 as "stale (30+ days)", it means YOU haven't updated lib/pricing-data.js in a month. It does NOT mean the crawler failed. The fix is editorial:
node scripts\aicost-changelog-triage.js --since=30dlib/pricing-data.js for confirmed changesThis is the moat. Other AI cost comparison sites ship whatever an LLM extracts. aicost.ai has an editorial gate.
| Track | What it does | Writes to | Cadence | Run command |
|---|---|---|---|---|
| 1. Token API | Per-token rates for LLM, embedding, TTS APIs | lib/pricing-data.js (static file) |
Nightly | node scripts\aicost-pricing-crawler.js |
| 2. Product | Subscription tiers (Pro/Team/Enterprise) | aicost_product_pricing + aicost_crawler_runs with product:* prefix |
Weekly | node scripts\aicost-product-crawler.js |
| 3. Hybrid | Seat + overage plans (Anthropic Team, GitHub Copilot, etc.) | aicost_price_changelog with hybrid:* prefix |
Weekly Mon 5 AM | node lib\aicost-hybrid-crawler.js |
| 4. Manual | Microsoft + future anti-bot blocked vendors | lib/hybrid-pricing-data-verified-2026q2.js |
Quarterly audit | Edit file directly, see procedure doc |
| 5. Benchmark | Mechanism facts: cache rates, batch discounts, latencies | aicost_benchmarks table |
Weekly | node scripts\aicost-benchmark-crawler.js |
After any crawler run, the triage script classifies all changes into 4 buckets:
node scripts\aicost-changelog-triage.js --since=1h node scripts\aicost-changelog-triage.js --since=24h --vendor=google
| Bucket | Meaning | Action |
|---|---|---|
| ๐ซ SUSPICIOUS | >200% change โ almost certainly extraction error | DO NOT ship. Investigate prompt rules + sanity checks. |
| ๐ก REVIEW | 30-200% change โ could be real (price war) or noise | Verify against vendor page in browser. If real โ ship. If wrong โ revert. |
| โ SAFE | โค30% change โ typical price drift | Ship โ regenerate V2 for affected vendor. |
| ๐ NEW | First time we've seen this slug | Verify it's a real new model (vendor announcement). If yes, add to expected_models in pricing-data.js. |
cachedInput: 0.05 โ 1 (+1900%) โ uniform $1.00 or $4.50 across multiple Google modelsaicost-pricing-crawler.js auto-catches this โ if it's not, file timestamp may be older than patch:
findstr "cached_input_price >= input_price" D:\aicost\scripts\aicost-pricing-crawler.js :: Should return โฅ1 line. If empty, the patched crawler isn't deployed. dir D:\aicost\scripts\aicost-pricing-crawler.js :: Should be timestamped 2026-05-01 or later.
:: Surgical revert: any cachedInput row from this run with >200% change node scripts\aicost-revert-suspicious.js --field=cachedInput --pct-min=200 --since=2h --dry-run node scripts\aicost-revert-suspicious.js --field=cachedInput --pct-min=200 --since=2h :: (Removes --dry-run to apply. Backup auto-saved to pricing-data.js.backup-<timestamp>) :: Then re-run google with patched extractor node scripts\aicost-pricing-crawler.js --vendor=google :: Verify clean: changelog should have 0 suspicious rows for this run node scripts\aicost-changelog-triage.js --since=10m --vendor=google
๐ openai/gpt-5-5 โ NEW MODEL DETECTEDexpected_models in scripts/aicost-pricing-crawler.js for that vendor's config block.| Mistral output | Real slug |
|---|---|
anthropic-pro | anthropic-claude-pro |
openai-plus | openai-chatgpt-plus |
mistral-enterprise | mistral-le-chat-enterprise |
veo-3-1-generate-preview (dashes) | veo-3.1-generate-preview (dots) |
โ ๏ธ google/imagen-4.0-fast-generate-001 โ expected but not extractedexpected_models in crawler config to new name.expected_models.scripts\aicost-pricing-crawler.js EXTRACTION_PROMPT and add example/hint for that model class./crawler-health TLDR lists vendor as Never runpowershell -ExecutionPolicy Bypass -File D:\aicost\powershell\Run-AicostCrawler.ps1HYBRID_VENDORS / PRODUCT_VENDORS arrays match actual data:: Step 1: confirm scheduled tasks exist schtasks /query /tn "aicost*" :: Step 2: trigger one manually to test schtasks /run /tn "aicost-pricing-crawler" :: Step 3: check actual changelog content mysql -u root -p toolsinfo -e "SELECT vendor_slug, MAX(started_at) FROM aicost_crawler_runs GROUP BY vendor_slug ORDER BY 2 DESC LIMIT 20;"
! stale badgepowershell "Get-WinEvent -LogName 'Microsoft-Windows-TaskScheduler/Operational' -MaxEvents 50 | Where-Object {$_.Message -like '*aicost*'} | Format-Table TimeCreated, Id, LevelDisplayName"
Mistral extract failed: 401 {"detail":"Unauthorized"}:: Test the key directly curl -H "Authorization: Bearer %MISTRAL_API_KEY%" https://api.mistral.ai/v1/models :: If 401 โ log into Mistral console, check key status / billing :: If new key needed, update D:\aicost\.env then restart pm2: notepad D:\aicost\.env pm2 reload aicostNote: 401 is usually transient (rate-limit window). Retry in 30 min before assuming key is dead.
CF /markdown 422: timeout was reached โ page took too long to renderCF rate limited (429), retry 1/4 in 8s โ hitting CF's request rate cap(no content) โ page rendered empty (CSR-only site)actionTimeout in CF call, OR add the page to JS-wait retry listnetworkidle0 JS-wait. Usually works on second pass.lib/hybrid-pricing-data-verified-2026q2.js with verified_by: 'manual' and inferred_fields._blocker_noteMANUAL_VENDORS array in routes/crawler-health.jsdocs/<VENDOR>-MANUAL-UPDATES.md patterned on Microsoft procedureโ Gemini non-JSON after 2 attempts (length=NNNN) for one vendor in batch:: 1. Check what's in the context for this vendor node scripts\debug-builder-microsoft.js :: (or copy this script and edit vendor slug for other vendors) :: 2. If ctx looks right, try regen alone (often succeeds on retry) node scripts\generate-vendor-pricing-guides-batch.js --vendor=<slug> --force :: 3. If it persists, capture raw Gemini output node scripts\debug-microsoft-gemini.js > D:\aicost\logs\gemini-debug.log 2>&1 :: Look at "Char at position N" in the parse error โ usually a smart quote, em-dash, or unescaped quoteCommon patterns: vendor without api_models needs the V1-bridge defensive fallbacks (these are already deployed in the patched
aicost-guide-builder-v2.js + aicost-guide-prompts-v2.js).
ERROR 1265 (01000): Data truncated for column 'primary_category' on INSERT:: Step 1: see what values are allowed mysql -u root -p toolsinfo -e "SELECT DISTINCT primary_category FROM aicost_guide_vendor_directory ORDER BY 1;" :: Step 2: pick existing value OR extend the ENUM :: To extend, see the pattern in sql/2026-05-01-extend-category-enum.sql :: Current valid values: llm-api, embedding, vector-db, transcription, tts, :: consumer-ide, consumer-image, consumer-search, consumer-video, :: office-ai, sales-ai, developer-assistant
TypeError: r.change_pct.toFixed is not a function in any custom script reading changelogconst [rows] = await pool.execute('SELECT change_pct FROM aicost_price_changelog ...');
rows.forEach(r => {
r.change_pct = r.change_pct == null ? null : Number(r.change_pct);
r.old_value = r.old_value == null ? null : Number(r.old_value);
r.new_value = r.new_value == null ? null : Number(r.new_value);
});
Once the changelog looks clean (triage shows 0 SUSPICIOUS, sane REVIEW values), regenerate V2 guides for affected vendors so users see fresh data:
:: Single vendor (~30s, ~$0.02) node scripts\generate-vendor-pricing-guides-batch.js --vendor=google --force :: Multiple vendors node scripts\generate-vendor-pricing-guides-batch.js --vendor=google --vendor=openai --force :: All 16 V2 vendors (~6 min, ~$0.30) node scripts\generate-vendor-pricing-guides-batch.js --all --force :: Verify the regen landed mysql -u root -p toolsinfo -e "SELECT slug, last_refreshed_at FROM aicost_guides WHERE type='vendor-pricing' AND schema_version='v2' ORDER BY last_refreshed_at DESC LIMIT 5;" :: Smoke-test a persona URL curl -I http://localhost:3231/ai-cost-guides/pricing/google/for-developer
To add a new vendor (e.g., new LLM provider) end-to-end:
scripts/aicost-pricing-crawler.js VENDORS array with vendor entry: slug, name, pricing URLs, expected_modelsINSERT INTO aicost_guide_vendor_directory (vendor_slug, vendor_name, vendor_url, primary_category, ...) VALUES (...);
node scripts\aicost-pricing-crawler.js --vendor=<new-slug> โ verify clean extractionroutes/crawler-health.js โ TOKEN_API_VENDORS if applicablenode scripts\generate-vendor-pricing-guides-batch.js --vendor=<new-slug> --forceWhen a vendor blocks scraping (Salesforce, Adobe, etc.):
lib/hybrid-pricing-data-verified-2026q2.js with verified_by: 'manual' + inferred_fields._blocker_noteDEPRECATED_BASE_SLUGS if it overlaps with base fileMANUAL_VENDORS array in routes/crawler-health.js with cadence + docs pathdocs/MICROSOFT-MANUAL-UPDATES.md โ docs/<VENDOR>-MANUAL-UPDATES.md and customizeaicost_guide_vendor_directory if not alreadynode scripts\generate-vendor-pricing-guides-batch.js --vendor=<slug> --forcenext_dueLast updated 2026-05-01 ยท โ Back to crawler health