SEO · Technical SEO · Updated 2026.04.28

robots.txt

Also known as: robots text (로봇 텍스트), robots file (robots 파일)

In one line

robots.txt is the text file at a site's root that tells search engines and AI crawlers which paths they may or may not crawl — a long-standing web standard.

Going deeper

robots.txt has been around since 1994 and was finally standardised by the IETF as RFC 9309 in 2022. It is a plain text file served at the site root (`https://example.com/robots.txt`). You target a bot with `User-agent` and control path access with `Allow` / `Disallow`. Every major search crawler (Googlebot, Bingbot) and the new wave of AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) reads it first. In other words, robots.txt is the first gate deciding whether your site shows up in AI answers at all.

Basic syntax:

```
User-agent: *
Disallow: /admin/
Allow: /
Sitemap: https://example.com/sitemap.xml
```

`*` matches every bot, and you can stack per-bot rules (e.g., `User-agent: GPTBot` followed by `Disallow: /`). The `Sitemap:` line is optional but conventional.

The most common misunderstanding is that robots.txt blocks indexing. It does not — **it blocks crawling, not indexing**. A `Disallow`-ed URL with many external links can still appear in the index without its body, showing up on SERPs as "No information is available for this page", which is worse than not appearing at all. To genuinely keep a page out of the index, add `<meta name="robots" content="noindex">` to the page. Caveat: `noindex` only works if the bot can crawl the page, so blocking a URL in robots.txt while also tagging it `noindex` is self-defeating: the crawler never fetches the page and never sees the tag.
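The crawl-versus-index distinction can be checked mechanically: robots.txt only answers "may this bot fetch this URL?". A minimal sketch using Python's standard-library `urllib.robotparser` (the domain and paths are made up for illustration):

```python
from urllib import robotparser

# Hypothetical robots.txt: blocks /admin/ for every bot, allows the rest
rules = """
User-agent: *
Disallow: /admin/
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# can_fetch answers the crawling question only; it says nothing about indexing
print(rp.can_fetch("GPTBot", "https://example.com/admin/settings"))  # False
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))       # True
```

The blocked URL can still end up in an index via external links; only the page body stays unfetched.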

Two production incidents are worth pre-empting. **First, a staging `Disallow: /` shipped to production**: the entire site vanishes from search. Many "traffic dropped to zero after relaunch" postmortems trace back to that single line. **Second, default-on AI crawler blocks at the CDN layer**: Cloudflare and others have shipped "block AI bots" features that can be enabled by default, silently locking out GPTBot and ClaudeBot. Unless you have a deliberate policy reason, leaving these defaults on quietly removes you from ChatGPT and Claude answers.
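A cheap safeguard against the first incident is a deploy-time check that the homepage stays crawlable. A sketch with a hypothetical `blocks_everything` helper (the bot name and domain are placeholders):

```python
from urllib import robotparser

def blocks_everything(robots_txt: str, bot: str = "Googlebot") -> bool:
    """Pre-deploy guard: True if this robots.txt hides the whole site from `bot`."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # If even the homepage is off-limits, a staging 'Disallow: /' likely shipped.
    return not rp.can_fetch(bot, "https://example.com/")

staging = "User-agent: *\nDisallow: /"
production = "User-agent: *\nDisallow: /admin/\nAllow: /"

print(blocks_everything(staging))     # True  -> fail the deploy
print(blocks_everything(production))  # False -> safe to ship
```

Wiring a check like this into CI turns the "site vanished after relaunch" postmortem into a failed build.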

For GEO and AEO, robots.txt is the gate to citation eligibility. The bots to check by name: GPTBot (OpenAI), ChatGPT-User (OpenAI browsing), Google-Extended (Gemini, SGE), ClaudeBot and `anthropic-ai` (Anthropic), PerplexityBot, Applebot-Extended, CCBot (Common Crawl). CCBot in particular underpins the training datasets of many major LLMs; blocking it costs you long-term citation presence. Add Search Console's robots.txt report and a direct check of `https://example.com/robots.txt` to your monthly hygiene routine.
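The monthly check of the named bots can be scripted the same way. A sketch that maps each AI crawler to its fetch permission under a given robots.txt (the `ai_bot_report` helper and the example policy are illustrative, not a recommendation):

```python
from urllib import robotparser

AI_BOTS = ["GPTBot", "ChatGPT-User", "Google-Extended", "ClaudeBot",
           "anthropic-ai", "PerplexityBot", "Applebot-Extended", "CCBot"]

def ai_bot_report(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Map each AI crawler name to whether it may fetch `url` under this policy."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, url) for bot in AI_BOTS}

# Example policy: block only CCBot, leave answer-engine crawlers open
policy = "User-agent: CCBot\nDisallow: /\n\nUser-agent: *\nAllow: /"
report = ai_bot_report(policy)
print(report)  # CCBot -> False, every other bot -> True
```

In practice you would fetch the live `robots.txt` instead of an inline string and alert on any bot whose access flips unexpectedly, e.g. after a CDN settings change.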
