Bytespider
In one line
Bytespider is the web crawler operated by ByteDance, TikTok's parent company, and is reported to feed its in-house AI models and its search and recommendation systems.
Going deeper
Bytespider is reported to feed both AI training and the search and recommendation systems that run across ByteDance's products. Its user-agent string includes the token 'Bytespider', so a robots.txt rule can target it by name.
Marketers are split on this one. A number of global publishers and large sites block it over traffic load and policy concerns, while brands focused on Southeast Asia and Greater China often keep it allowed for exposure reasons.
Public data on the exposure impact is thin, so most claims about it rest on estimates. For a global consumer brand, the Bytespider decision is worth putting on the agenda once; for a Korea-only B2B brand, it sits low on the priority list.
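As a minimal sketch of the robots.txt targeting described above, the snippet below uses Python's standard-library urllib.robotparser to check what a sample rule set says about the 'Bytespider' token. The rules, the example.com URL and the is_allowed helper are illustrative assumptions, not ByteDance documentation, and the check only shows what the file expresses, not whether any crawler actually honors it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Bytespider site-wide, allow everyone else.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/any/page"  # placeholder URL
    for agent in ("Bytespider", "GPTBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

Pass the short token ('Bytespider') rather than a full browser-style user-agent string: robotparser compares only the part of the string before the first '/', so a full string beginning with 'Mozilla/5.0' would not match the Bytespider group here.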
Related terms
Meta-ExternalAgent
Meta-ExternalAgent is the user agent Meta uses to crawl the web for its AI products and models — manageable separately via robots.txt.
GPTBot
GPTBot is OpenAI's official web crawler, used to collect training data for the models behind ChatGPT; it is controllable via robots.txt.
ClaudeBot
ClaudeBot is Anthropic's web crawler used for training Claude and grounding its answers — manageable via robots.txt.
CCBot
CCBot is the crawler operated by the nonprofit Common Crawl — and the dataset it produces is the starting point for the training data of many LLMs.
llms.txt
llms.txt is a proposed text file placed at the site root that tells large language models where the most important content lives — think 'sitemap, but written for LLMs'.