GEO·AEO | Crawlers & Bot Policy | Updated 2026.04.28

Google-Extended

In one line

Google-Extended is the standalone robots.txt token Google provides so site owners can control whether their content is used to train Gemini models (Gemini Apps and the Vertex AI API for Gemini), independently of regular search indexing.

Going deeper

Google-Extended is a standalone robots.txt token Google created so site owners can govern AI training access (Gemini, Vertex AI) independently from regular search indexing. It is not a separate crawler with its own HTTP user-agent string: Google's existing crawlers still fetch the pages, and the token controls how the fetched content may be used. The motivation is straightforward: as AI training rights became contested, Google needed a way to let publishers say 'index us in search, but do not train AI on us'. A separate token was the cleanest answer.

Mechanically you control it with a 'User-agent: Google-Extended' group in robots.txt, using Allow/Disallow directives at path-prefix (directory) granularity. Disallow it and Google stops using the covered content for Gemini training, while regular Googlebot continues to crawl for search. That separation is the whole point of the lever, and it is exactly what makes the policy decision tractable for legal and content teams.
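As a rough illustration of that rule, the sketch below embeds a sample robots.txt and checks it with Python's standard `urllib.robotparser`. The `/products/` and `/news/` paths and the `is_allowed` helper are hypothetical, chosen only for the example. Python's parser applies the first matching rule while Google documents most-specific-match precedence; the rules are ordered so both readings agree. The check only confirms what the file says, not how Google applies the token internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: let Google-Extended use /products/ only,
# and leave every other crawler (including Googlebot) unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Allow: /products/
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, token: str, path: str) -> bool:
    """Return True if robots_txt permits `token` to access `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, path)


if __name__ == "__main__":
    for token in ("Googlebot", "Google-Extended"):
        for path in ("/products/widget", "/news/story"):
            print(token, path, is_allowed(ROBOTS_TXT, token, path))
```

Under this sample file, Googlebot is allowed on both paths, while Google-Extended is allowed on `/products/widget` and disallowed on `/news/story`.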

For marketers, the practical value is being able to decouple two decisions. Keep search visibility while opting out of AI training? Block Google-Extended only. Want to be eligible for citation in Gemini? Allow it. Korean publishers tilt toward blocking; product and brand sites typically allow. Villion audits robots.txt for consistency across Google-Extended, GPTBot, ClaudeBot and PerplexityBot so policy choices are deliberate, not accidental.
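A consistency check of this kind can be sketched in a few lines of Python. This is an illustrative assumption about what such an audit might look like, not Villion's actual tooling: the `audit` and `named_agents` helpers and the sample file are hypothetical. For each AI token it reports whether the root path is allowed, and whether that outcome comes from an explicit group or is merely inherited from the `*` default (or from having no rule at all), which is where accidental policy tends to hide.

```python
from urllib.robotparser import RobotFileParser

AI_TOKENS = ["Google-Extended", "GPTBot", "ClaudeBot", "PerplexityBot"]


def named_agents(robots_txt: str) -> set[str]:
    """Collect the lowercase user-agent tokens the file names explicitly."""
    agents = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def audit(robots_txt: str, path: str = "/") -> dict[str, str]:
    """Map each AI token to 'allowed'/'blocked' plus where the rule came from."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    explicit = named_agents(robots_txt)
    report = {}
    for token in AI_TOKENS:
        verdict = "allowed" if parser.can_fetch(token, path) else "blocked"
        source = "explicit" if token.lower() in explicit else "inherited"
        report[token] = f"{verdict} ({source})"
    return report


# Hypothetical file: two bots decided deliberately, two left to the default.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for token, status in audit(SAMPLE).items():
        print(f"{token}: {status}")
```

Any token reported as "inherited" is a prompt to make a deliberate choice and write it into robots.txt as its own group.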

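Stacked against other LLM bots, the difference is identity. GPTBot, ClaudeBot and PerplexityBot are crawlers that fetch pages under their own user-agent strings, and their vendors document further agents for search and user-triggered retrieval. Google-Extended fetches nothing itself; it is a usage control over content that Google's existing crawlers collect, focused on Gemini training and grounding. The most important nuance: blocking Google-Extended does not necessarily remove you from AI Overviews, which leverage the regular search index. 'Opt out of training' and 'opt out of being cited' are two different levers, and you need both clear in your head.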

Two common misreads. The first is assuming that blocking Google-Extended erases you from all Google AI surfaces; AI Overviews keep showing your content because they ride the regular index. The second is treating the policy as one-and-done: Google adds new AI products and crawlers over time, so re-check its published crawler list each quarter. Google has stated that Google-Extended does not affect inclusion or ranking in Search, but it does determine whether your content enters the Gemini training pool.

Sensible next steps: decide Google-Extended policy alongside your other LLM-bot policies in one place, keep regular Googlebot allowed even if you opt out of AI training, remember that AI Overview presence is a separate problem solved through SEO and structured data, and re-audit Google's bot list quarterly. Google-Extended is one of the few levers that lets a brand send an explicit AI-training consent signal, so it deserves top billing in your content policy doc.
