llms.txt
In one line
llms.txt is a proposed text file placed at the site root that tells large language models where the most important content lives — think 'sitemap, but written for LLMs'.
Going deeper
llms.txt is a markdown file placed at the site root (`/llms.txt`) that tells LLMs where the most important pages, documents and summaries live. It is an informal proposal that Jeremy Howard at Answer.AI floated in September 2024 to fill a gap. robots.txt says 'come or stay out', sitemap.xml says 'these URLs exist', but neither answers the question 'what should an LLM read first to understand this site?'.
The mechanism is lightweight. `/llms.txt` carries a one-paragraph site overview plus a curated list of links in markdown. A companion `/llms-full.txt` can hold the longer body content. The idea is that an LLM encountering the site for the first time gets a fast, structured context map and a clear shortlist of canonical pages to lean on when composing answers.
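A minimal sketch of what such a file might look like, following the structure the proposal describes (an H1 for the site name, a blockquote one-liner, then H2 sections of curated links; every name and URL here is a placeholder):

```markdown
# Example Co

> Example Co makes example widgets for example use cases.

## Docs

- [Getting started](https://example.com/docs/start): install and first steps
- [API reference](https://example.com/docs/api): endpoints and auth

## Optional

- [Blog](https://example.com/blog): longer background reading
```

The `## Optional` heading is part of the proposal's convention: links under it are ones a model can safely skip when context is tight.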
In day-to-day GEO work, llms.txt sits in the 'high-leverage, low-cost' bucket. It is not mandatory, but it is cheap and the upside on definition accuracy is real. Useful KPIs are pre-versus-post brand definition correctness and category-prompt presence. Villion ships an llms.txt template plus a definition-sentence audit so the file is a real briefing rather than a copy-paste.
It complements rather than replaces robots.txt and sitemap.xml. Robots controls bot access, sitemap catalogues URLs, llms.txt summarises canonical context. They serve different purposes. It is also worth knowing that LLMs do not all honour llms.txt the same way — some read it, some ignore it — so treat it as a hint, not a contract.
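For contrast, robots.txt expresses access policy per crawler, not reading guidance. A sketch addressing the AI user agents this page lists (the paths are placeholders):

```
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Disallow: /private/

User-agent: Google-Extended
Disallow: /
```

llms.txt carries no directives like these; it only suggests what to read first.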
The most common misread is that llms.txt alone earns citations. It does not. The file is a briefing, and it only pays off when the underlying content quality, structured data and external authority signals are already in place. Another trap is thinking llms.txt lets you skip robots.txt — they have different jobs, and when policies conflict, bots typically defer to robots.txt.
Sensible setup: ship `/llms.txt` with a one-line brand description and five to fifteen canonical links, add `/llms-full.txt` if you need longer body context, push your standardised definition sentences into the file so the briefing is consistent across surfaces, and run the same brand prompt across ChatGPT, Perplexity and Claude before and after to see whether answers move. Treat llms.txt as a polite note to the model — useful, not magical.
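The structural side of that setup can be sanity-checked automatically. A rough sketch in Python, assuming the common llms.txt layout (H1 title, blockquote summary, markdown links); the helper name, thresholds and sample text are illustrative, not part of any spec:

```python
import re

def audit_llms_txt(text: str) -> dict:
    """Rough sanity check for an llms.txt draft (illustrative, not an official validator)."""
    # First H1 is treated as the site/brand name.
    title = re.search(r"^# (.+)$", text, re.MULTILINE)
    # A blockquote line is the conventional spot for the one-line description.
    summary = re.search(r"^> (.+)$", text, re.MULTILINE)
    # Count markdown links of the form [label](url).
    links = re.findall(r"\[[^\]]+\]\(([^)]+)\)", text)
    return {
        "has_title": title is not None,
        "has_summary": summary is not None,
        "link_count": len(links),
        "link_count_ok": 5 <= len(links) <= 15,  # the five-to-fifteen guideline above
    }

sample = """# Example Co
> Example Co makes example widgets for example use cases.

## Docs
- [Getting started](https://example.com/docs/start)
- [API reference](https://example.com/docs/api)
- [Pricing](https://example.com/pricing)
- [About](https://example.com/about)
- [Blog](https://example.com/blog)
"""

report = audit_llms_txt(sample)
```

Running this on each draft before shipping catches the easy failures (missing one-liner, link list too thin or too bloated) without touching the harder editorial questions.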
Related terms
GPTBot
GPTBot is OpenAI's official web crawler, used to gather public content for model training — controllable via robots.txt.
ClaudeBot
ClaudeBot is Anthropic's web crawler used for training Claude and grounding its answers — manageable via robots.txt.
PerplexityBot
PerplexityBot is the web crawler Perplexity uses to gather sources for its answer engine — controllable separately via robots.txt.
Google-Extended
Google-Extended is the separate robots.txt control token Google uses for Gemini and Vertex AI training, letting site owners manage AI training access independently of regular search indexing.
Schema.org
Schema.org is the shared vocabulary co-sponsored by Google, Microsoft, Yahoo and Yandex that lets you label what each page means so search engines and AI can understand it.