GPTBot
In one line
GPTBot is OpenAI's official web crawler for collecting content used to train the models behind ChatGPT; it is controllable via robots.txt.
Going deeper
GPTBot is OpenAI's official web crawler. It collects publicly available content that may be used to train the models behind ChatGPT; live retrieval for ChatGPT Search is handled by a separate crawler, OAI-SearchBot. GPTBot exists because, before it was publicly identified, site owners had no clean way to set OpenAI-specific policy. Naming the bot, and publishing its User-Agent string, finally made allow/block decisions tractable.
Mechanically, it follows the robots.txt standard. A crawler that identifies itself with 'GPTBot' in the User-Agent header obeys the rules listed under 'User-agent: GPTBot', where Allow/Disallow directives control path-level access. Disallow a path and the bot stops crawling it. OpenAI also publishes bot IP ranges so site owners can verify genuine requests and fend off spoofed user agents.
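To make the mechanics concrete, here is a minimal Python sketch using only the standard library. The robots.txt content, the example.com URLs and the CIDR block are illustrative assumptions: the CIDR block is a reserved documentation range, not one of OpenAI's published ranges, and the helper names can_crawl and ip_in_ranges are hypothetical.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot gets its own group, all other bots the default.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /admin/
Disallow: /checkout/

User-agent: *
Disallow: /private/
"""

# Placeholder CIDR block from the reserved documentation range (RFC 5737).
# This is NOT a real OpenAI range; load the published list in practice.
EXAMPLE_BOT_RANGES = ["192.0.2.0/24"]


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def ip_in_ranges(ip: str, cidr_blocks: list[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(block) for block in cidr_blocks)


print(can_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/blog/post"))
print(can_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/admin/users"))
print(ip_in_ranges("192.0.2.17", EXAMPLE_BOT_RANGES))
```

Because GPTBot has its own group here, the wildcard group no longer applies to it, which is why a bot-specific group should restate every path you want blocked. A request that claims to be GPTBot but arrives from an address outside the published ranges is a reasonable candidate for spoofing.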
The decision marketers face is allow or block. Some publishers block to protect intellectual property, but most product and brand sites should allow GPTBot: blocking keeps your content out of OpenAI's training data, and blocking the wider OpenAI fleet also removes you from ChatGPT's citation pool, which undercuts AI search visibility on the largest consumer surface. From a KPI lens, crawler policy is the first variable determining whether you are even eligible for citation.
Treat the OpenAI bot family as a unit. Beyond GPTBot, there is OAI-SearchBot (search retrieval) and ChatGPT-User (on-demand page fetches triggered by a user's request inside ChatGPT). Each has a different role. Set against ClaudeBot, PerplexityBot and Google-Extended, GPTBot deserves top priority simply because ChatGPT carries the largest user base in AI search. Villion auto-audits robots.txt for the full OpenAI fleet and flags missing rules.
Two common misreads. First, blocking GPTBot alone does not necessarily remove you from ChatGPT — OAI-SearchBot or partner search data can still surface your content, so policy has to be set fleet-wide. Second, treating policy as a one-time decision: OpenAI adds and segments bots over time, so re-audit every quarter and confirm coverage on the latest crawler list.
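A fleet-wide check can be sketched in a few lines of Python. This is an illustrative sketch rather than a full robots.txt parser: the bot list is a snapshot of the three crawlers named in this entry and should be refreshed against the latest crawler list, and the function names are hypothetical. It only reports which bots have no explicit User-agent group; a wildcard group may still cover them.

```python
# Snapshot of the OpenAI crawlers named in this entry; expected to change.
OPENAI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]


def named_user_agents(robots_txt: str) -> set[str]:
    """Return the lowercased tokens from every User-agent line."""
    agents = set()
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def missing_bot_rules(robots_txt: str, bots: list[str] = OPENAI_BOTS) -> list[str]:
    """List the bots that robots.txt never names explicitly."""
    named = named_user_agents(robots_txt)
    return [bot for bot in bots if bot.lower() not in named]


sample = """\
User-agent: GPTBot
Disallow: /admin/

User-agent: *
Allow: /
"""
print(missing_bot_rules(sample))  # ['OAI-SearchBot', 'ChatGPT-User']
```

Running a check like this each quarter against an updated bot list turns the re-audit into a one-minute task.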
Sensible next steps: review GPTBot, OAI-SearchBot and ChatGPT-User policy in robots.txt as a single bundle, default to allow except for payment, admin and gated paths, validate against published bot IP ranges, and re-check brand definition accuracy in ChatGPT answers each quarter. GPTBot policy is foundational GEO plumbing — fix it first, before anything fancier.
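As a rough illustration of the bundle approach, the Python sketch below generates a single robots.txt group for all three bots and checks it with the standard-library parser. The blocked paths (/checkout/, /admin/, /members/) and the build_bundle helper are hypothetical placeholders; substitute your own payment, admin and gated paths.

```python
from urllib.robotparser import RobotFileParser

OPENAI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]
# Hypothetical payment, admin and gated paths; replace with your own.
BLOCKED_PATHS = ["/checkout/", "/admin/", "/members/"]


def build_bundle(bots: list[str], blocked_paths: list[str]) -> str:
    """Build one robots.txt group: allow by default, block the listed paths."""
    lines = [f"User-agent: {bot}" for bot in bots]
    lines += [f"Disallow: {path}" for path in blocked_paths]
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


bundle = build_bundle(OPENAI_BOTS, BLOCKED_PATHS)
print(bundle)

# Sanity-check the generated rules before deploying them.
parser = RobotFileParser()
parser.parse(bundle.splitlines())
print(parser.can_fetch("GPTBot", "https://example.com/pricing"))
print(parser.can_fetch("OAI-SearchBot", "https://example.com/checkout/cart"))
```

Listing the three User-agent lines in one group keeps the policy identical across the fleet, so a later change to the blocked paths cannot drift between bots.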
Related terms
ChatGPT Search
ChatGPT Search is the feature that lets ChatGPT combine its trained knowledge with live web results, citing sources alongside the answer.
ClaudeBot
ClaudeBot is Anthropic's web crawler used for training Claude and grounding its answers — manageable via robots.txt.
PerplexityBot
PerplexityBot is the web crawler Perplexity uses to gather sources for its answer engine — controllable separately via robots.txt.
Google-Extended
Google-Extended is a standalone robots.txt token, rather than a separate crawler, that governs whether content Google crawls may be used for training Gemini and Vertex AI, letting site owners control AI training access independently from regular search indexing.
llms.txt
llms.txt is a proposed text file placed at the site root that tells large language models where the most important content lives — think 'sitemap, but written for LLMs'.
Bingbot
Bingbot is Microsoft's crawler for Bing and Copilot search indexing — and because Copilot's AI answers ride on the Bing index, it has become a GEO-relevant bot again.