Sitemap.xml
In one line
An XML sitemap is a structured file that lists the important URLs of a site so search engines can discover and crawl them efficiently — its impact grows with site size.
Going deeper
Sitemap.xml lists URLs with metadata such as `lastmod` and `priority`. You submit it in Google Search Console or Bing Webmaster Tools, or point to it from robots.txt with a `Sitemap:` directive.
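A minimal sitemap following the sitemaps.org protocol looks like this (example.com is a placeholder domain):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/products/widget</loc>
    <lastmod>2024-02-03</lastmod>
  </url>
</urlset>
```

And the robots.txt reference is a single line at the site root:

```
Sitemap: https://example.com/sitemap.xml
```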
For small sites, Google can usually discover pages without a sitemap. Once you scale into tens of thousands of pages — typical for ecommerce and publishers — discovery latency and crawl budget waste become very real without one.
One discipline matters: include only indexable URLs. Sitemaps polluted with noindex pages, redirects or 4xx URLs erode Google's trust in the file, so periodic clean-up is part of healthy sitemap hygiene.
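That clean-up can be sketched as a simple filter over crawl results. The record fields below (`url`, `status`, `noindex`) are illustrative assumptions, not any specific crawler's output format:

```python
# Sketch: reduce a list of crawled pages to the URLs that belong in
# sitemap.xml — i.e. pages that return 200 and are indexable.

def sitemap_keepers(pages):
    """Return URLs safe to keep in the sitemap.

    `pages` is an iterable of dicts with hypothetical keys:
      url (str), status (int, HTTP status), noindex (bool).
    Drops redirects (3xx), errors (4xx/5xx) and noindex pages.
    """
    return [
        p["url"]
        for p in pages
        if p["status"] == 200 and not p["noindex"]
    ]

crawl = [
    {"url": "https://example.com/", "status": 200, "noindex": False},
    {"url": "https://example.com/old", "status": 301, "noindex": False},
    {"url": "https://example.com/tag/x", "status": 200, "noindex": True},
    {"url": "https://example.com/gone", "status": 404, "noindex": False},
]
print(sitemap_keepers(crawl))  # → ['https://example.com/']
```

Running a check like this on a schedule keeps the sitemap aligned with what you actually want indexed.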
Related terms
robots.txt
robots.txt is the text file at a site's root that tells search engines and AI crawlers which paths they may or may not crawl — a long-standing web standard.
Crawl Budget
Crawl budget is the amount of resource a search engine is willing to spend crawling a single site over a given period — a critical factor for indexing efficiency on large sites.
Canonical Tag
The canonical tag tells search engines which URL is the master version when multiple URLs serve the same or near-duplicate content.
Internal Linking
Internal linking is the practice of connecting pages within the same domain — and it shapes crawl efficiency, how authority flows between pages, and how users move through the site.
hreflang
hreflang is the tag that tells search engines which version of a page to serve to users by language and region when you have multiple localised URLs.