May 9, 2026 · Insights

How AI Search Actually Works — Seven Years of RAG Evolution and a Unified SEO·GEO Strategy

From the four-stage pipeline ChatGPT uses to find answers, through the foundational papers (Transformer, REALM, DPR, RAG), to today's techniques (HyDE, Self-RAG, GraphRAG, Agentic RAG). Drawing on OpenAI's own materials, we unpack how AI answers work and how to unify SEO and GEO.

Villion

Key takeaways

ChatGPT Search was made available to all logged-in users on December 16, 2024 and auto-triggers web search to compose answers with sources.
Today's AI answers descend directly from the 2017 Transformer and the 2020 REALM, DPR and RAG papers, with quality jumping further from 2024 to 2026 as HyDE, Self-RAG, GraphRAG and Agentic RAG saw wider adoption.
OpenAI runs GPTBot (training) and OAI-SearchBot (search indexing) separately. If you want to be cited in search, you must allow OAI-SearchBot.
The strongest signal for AI citation isn't raw backlink count — it's corroboration: multiple independent authoritative sources stating the same thing consistently.
SEO is the foundation of GEO, not its substitute. Don't run the two axes separately — tie them into one operational flow for efficiency.

1. The moment AI search became everyday

December 16, 2024 is the date OpenAI opened ChatGPT Search to every logged-in user, free plan included; access without logging in followed in February 2025. ChatGPT decides for itself whether a question needs the web and surfaces sources alongside the answer. The Sources button under the response expands the sidebar of cited sites, and any click carries utm_source=chatgpt.com automatically.

Around the same time, Google's AI Overviews established itself as a feature used by hundreds of millions of people daily in the U.S., and Perplexity carved out a market position by refusing to answer without sources at all.

For marketers, the implication is clear. Users no longer compare ten blue links one by one. They decide based on the three or four brands AI shortlisted. Search has moved one layer inside, and inside that layer, "who AI cites" has become the new visibility frontier.

Below we trace how AI finds answers, going all the way back to OpenAI and Google's official docs and the original RAG papers. Knowing the mechanics is what lets you stop running SEO and GEO as separate programs and pull them into one operational flow.

2. How AI answers actually work — seven years of RAG evolution

Most GEO content stops at "AI works through RAG." But RAG didn't drop from the sky in 2020, and ChatGPT Search today doesn't quite move like the 2020 RAG paper. Trace the seven-year arc and you can see where the answers you're looking at came from and where they're heading.

Figure: The full flow of how AI answers get built. The user question passes through Embed → Retrieve → Rerank → Generate, and a mapping of which sentence came from which source is built along the way.

2-1. Where it all started — the 2017–2020 academic base

2017 — Transformer (Vaswani et al., NeurIPS 2017)
Attention Is All You Need
Self-attention lets every word attend to every other word at once. GPT, Claude, Gemini and Llama all run on variants of this design.

2020.02 — REALM (Guu et al., ICML 2020)
Retrieval-Augmented Language Model Pre-Training
Pulled retrieval into the pre-training stage. The model learned to retrieve from Wikipedia while answering, lifting open-domain QA accuracy by 4–16 percentage points (absolute) over previous methods.

2020.04 — DPR (Karpukhin et al., EMNLP 2020)
Dense Passage Retrieval for Open-Domain Question Answering
Replaced keyword matching (BM25) with a dual-encoder embedding-based approach. Top-20 accuracy improved by 9–19 percentage points over BM25 — the direct ancestor of today's vector DBs.

2020.05 — RAG (Lewis et al., NeurIPS 2020)
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Tied parametric memory (model weights) and non-parametric memory (external documents) into one inference flow. The paper introduced the name 'RAG' along with the RAG-Sequence and RAG-Token variants.

Spring 2020 is the truer "year zero" of RAG. Transformer had laid the base architecture; REALM dragged retrieval into the training loop; DPR redrew retrieval itself around embeddings; the RAG paper packed all of it into one inference flow. Today's vector databases (Pinecone, Weaviate, Milvus) and RAG frameworks (LangChain) are direct descendants of those four papers.

2-2. The modern four-stage RAG pipeline

The way ChatGPT Search or Google AI Overviews compose answers today builds on what the OpenAI Cookbook calls the "Search-Ask" method: search for relevant text, then ask the model with that text in context. Expanded into four steps:

Embed: Place the user's question and candidate documents in the same vector space. On the OpenAI side, models like text-embedding-3-large convert text into numeric vectors with up to 3,072 dimensions.
Retrieve: Pull document vectors closest to the question vector by cosine similarity. Typically the top 10–20 are taken as the first-pass shortlist.
Rerank: Re-score the shortlist with a more precise model and keep only the top 3–5. Which document gets cited first in the answer is effectively decided here.
Generate: Feed the shortlisted documents as context and let the model compose the answer. A mapping of which sentence came from which document is generated alongside — the citations and sources you see are that mapping surfaced to the user.

The Cookbook makes a point of saying that fine-tuning is better suited to teaching tasks or styles and is less reliable for factual recall. Analogizing model weights to long-term memory and the message context to short-term memory, it recommends retrieval when you want accurate answers. ChatGPT Search itself is that principle implemented as a product.
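The four steps above can be sketched end to end in a few lines. This is a toy illustration, not how ChatGPT Search is implemented: the bag-of-words `embed`, the term-overlap `rerank`, the quoting `generate`, and the example.com documents are all hypothetical stand-ins for a real embedding model, a real reranker, and a real LLM call.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9-]+", text.lower())

def embed(text):
    """Toy embedding: bag-of-words counts. A real system would call an embedding model."""
    return Counter(tokenize(text))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, docs, k):
    """First-pass shortlist: the k documents closest to the question vector."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d["text"])), reverse=True)[:k]

def rerank(question, candidates, k):
    """Toy reranker: share of question terms found in the document."""
    q_terms = set(tokenize(question))
    def score(doc):
        return len(q_terms & set(tokenize(doc["text"]))) / len(q_terms)
    return sorted(candidates, key=score, reverse=True)[:k]

def generate(question, context):
    """Stand-in for the LLM call: quotes each source and records where it came from."""
    sentences = [doc["text"] for doc in context]
    citations = [{"sentence": s, "source": doc["url"]} for s, doc in zip(sentences, context)]
    return {"answer": " ".join(sentences), "citations": citations}

def answer(question, docs):
    shortlist = retrieve(question, docs, k=2)   # Embed + Retrieve
    top = rerank(question, shortlist, k=1)      # Rerank
    return generate(question, top)              # Generate + citation mapping

# Hypothetical corpus for illustration only.
DOCS = [
    {"url": "https://example.com/search-crawler", "text": "OAI-SearchBot indexes pages for ChatGPT Search."},
    {"url": "https://example.com/training-crawler", "text": "GPTBot collects training data for models."},
    {"url": "https://example.com/bread", "text": "Sourdough bread needs a long fermentation."},
]
QUESTION = "Which crawler indexes pages for ChatGPT Search?"
result = answer(QUESTION, DOCS)
```

The `citations` list is the piece that matters for GEO: it is the sentence-to-source mapping that a product would surface as a source list.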

2-3. Recent evolution — techniques that pushed answer quality up (2024–2026)

The basic four steps aren't enough. When questions are vague, when multi-hop reasoning is needed, or when information is scattered across documents, plain vector search often misses. Techniques proposed from late 2022 onward, and widely adopted from 2024, filled those gaps.

HyDE (Hypothetical Document Embedding)
Instead of searching with the question directly, the model first writes a hypothetical answer and searches using the embedding of that answer. Text closer to the answer turns out to be closer to the documents containing the answer — accuracy on technical Q&A improved noticeably.
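A minimal sketch of the HyDE idea, assuming a toy bag-of-words embedding and a hard-coded `fake_llm` in place of a real model call (both hypothetical). It shows a question whose wording matches a distractor document, while the hypothetical answer lands on the document that actually contains the answer.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_document(query_text, docs):
    q = embed(query_text)
    return max(docs, key=lambda d: cosine(q, embed(d)))

def fake_llm(question):
    """Hypothetical stand-in: a real system would ask an LLM to draft this answer."""
    return ("The site is probably blocking OAI-SearchBot in robots.txt, "
            "so pages are not being indexed or cited.")

def hyde_search(question, docs):
    hypothetical_answer = fake_llm(question)       # step 1: write a guess at the answer
    return top_document(hypothetical_answer, docs)  # step 2: search with the guess, not the question

DOCS = [
    "Blocking OAI-SearchBot in robots.txt prevents pages from being indexed and cited.",
    "My site answers common baking questions and does not show ads.",
]
QUESTION = "Why does my site not show up in ChatGPT answers?"

direct_hit = top_document(QUESTION, DOCS)  # matches the question's wording
hyde_hit = hyde_search(QUESTION, DOCS)     # matches the answer's wording
```

Here the direct search picks the baking page because it shares surface words with the question, while the HyDE search picks the robots.txt page. The hypothetical answer does not need to be correct; it only needs to resemble the text of documents that are.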

Self-RAG (Self-Reflective RAG)
After composing an answer, the model critically re-examines its own response. If evidence is weak or contradictions appear, it runs retrieval again — a self-verification loop embedded inside the answer flow.
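The loop described above can be sketched as follows. This is a simplified illustration only: the published Self-RAG method trains the model to emit reflection tokens, whereas this sketch uses hypothetical rule-based stubs (`retrieve`, `generate`, `critique`, and a two-entry `CORPUS`) to show the retrieve, draft, critique, retrieve-again control flow.

```python
# Hypothetical mini-corpus keyed by topic, for illustration only.
CORPUS = {
    "gptbot": "GPTBot collects training data.",
    "oai-searchbot": "OAI-SearchBot builds the search index.",
}
# Topics a complete answer must cover (a stand-in for a learned critique).
REQUIRED_TOPICS = ["gptbot", "oai-searchbot"]

def retrieve(query):
    """Toy retrieval: return passages whose topic key appears in the query."""
    return [text for key, text in CORPUS.items() if key in query.lower()]

def generate(question, evidence):
    """Toy generation: stitch the evidence together."""
    return " ".join(evidence) if evidence else "I am not sure."

def critique(draft):
    """Toy self-check: flag required topics the draft never mentions."""
    missing = [t for t in REQUIRED_TOPICS if t not in draft.lower()]
    return {"supported": not missing, "follow_up_query": " ".join(missing)}

def answer_with_reflection(question, max_rounds=3):
    query, evidence, draft = question, [], ""
    for round_no in range(1, max_rounds + 1):
        for passage in retrieve(query):
            if passage not in evidence:
                evidence.append(passage)
        draft = generate(question, evidence)
        verdict = critique(draft)
        if verdict["supported"]:
            return {"answer": draft, "rounds": round_no}
        query = verdict["follow_up_query"]  # weak evidence: retrieve again
    return {"answer": draft, "rounds": max_rounds}

result = answer_with_reflection("How does GPTBot differ from the search crawler?")
```

The first pass only finds the GPTBot passage, the critique notices the gap, and a second retrieval fills it, so the loop finishes in two rounds.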

GraphRAG (Graph-based RAG, Microsoft Research)
Instead of chunking documents into flat passages, GraphRAG builds them into a knowledge graph. Entities and relations are explicit, enabling multi-hop reasoning. In a related knowledge-graph RAG deployment for customer service, LinkedIn reported MRR up 77.6% and median per-issue resolution time down 28.6%.
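To make "multi-hop" concrete, here is a minimal sketch of a traversal over explicit entities and relations. The hand-written `GRAPH` and the `find_path` helper are hypothetical; real graph-based RAG systems typically extract the graph with an LLM and add further steps that this sketch omits.

```python
from collections import deque

# Hypothetical hand-built knowledge graph: entity -> [(relation, entity), ...]
GRAPH = {
    "OpenAI": [("operates", "OAI-SearchBot"), ("operates", "GPTBot")],
    "OAI-SearchBot": [("indexes_for", "ChatGPT Search")],
    "GPTBot": [("collects", "training data")],
    "ChatGPT Search": [("shows", "Sources sidebar")],
}

def find_path(graph, start, goal):
    """Breadth-first search returning the chain of (subject, relation, object) hops."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None  # no chain of relations connects the two entities

hops = find_path(GRAPH, "OpenAI", "Sources sidebar")
```

No single passage links "OpenAI" to "Sources sidebar", but the graph connects them in three hops. That chain of facts is what a flat vector search over isolated chunks tends to miss.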

Agentic RAG (Planning + reflection-based RAG)
On receiving a question, the system first plans the steps to the answer. It then retrieves multiple times as needed, calls external tools (SQL, APIs) and revisits intermediate results. Closer to a small agent doing the job than a single retrieval pass.
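A minimal sketch of that plan-then-act flow, under heavy assumptions: the fixed `plan`, the two tools, the in-memory SQLite table and its numbers are all hypothetical, and a real agent would ask an LLM to produce and revise the plan.

```python
import sqlite3

def make_db():
    """Hypothetical analytics table, built in memory for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (source TEXT, sessions INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)",
                     [("chatgpt.com", 120), ("google", 900)])
    return conn

DOCS = {"utm": "ChatGPT Search referrals carry utm_source=chatgpt.com."}

def tool_search(arg, state):
    """Document lookup tool: substring match over a tiny corpus."""
    return [text for text in DOCS.values() if arg.lower() in text.lower()]

def tool_sql(arg, state):
    """SQL tool: run a query against the agent's database."""
    return state["db"].execute(arg).fetchall()

TOOLS = {"search": tool_search, "sql": tool_sql}

def plan(question):
    """Hypothetical planner: a real agent would generate this list with an LLM."""
    return [
        ("search", "utm_source"),
        ("sql", "SELECT sessions FROM visits WHERE source = 'chatgpt.com'"),
    ]

def run_agent(question):
    state = {"db": make_db()}
    observations = []
    for tool_name, arg in plan(question):
        result = TOOLS[tool_name](arg, state)
        observations.append((tool_name, result))
        if not result:  # crude check on the intermediate result: stop on an empty step
            break
    return observations

observations = run_agent("How many sessions came from ChatGPT citations?")
```

The agent first looks up how ChatGPT traffic is tagged, then queries the table with that knowledge. A single retrieval pass could not combine a document lookup with a database query.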

2-4. What ChatGPT Search really does

OpenAI hasn't published ChatGPT Search's exact internals. But combining observed behavior with the official help docs suggests the techniques above are stitched together. When a question arrives, the model first decides "does this need web search?" If yes, it pulls in content pre-indexed by OAI-SearchBot together with real-time web results and uses multi-step reasoning to assemble the answer. The Sources sidebar surfaces a visualization of which part of the answer came from which document — essentially the same concept Google calls grounding supports in Vertex AI documentation.

3. Which sites ChatGPT cites

OpenAI addresses this directly in its Publishers and Developers FAQ: "Ranking in ChatGPT Search is based on multiple factors to provide users with trustworthy, relevant information. There is no way to guarantee top placement." It's not the kind of system where you reverse-engineer the algorithm to take #1 the way SEO once worked. But the preconditions for being cited are quite clear.

3-1. Understand the three crawlers separately

OpenAI runs three crawlers separately.

GPTBot: training-data collection. Blocking it doesn't affect ChatGPT Search citations.
OAI-SearchBot: ChatGPT Search indexing. Block it and you can't be cited.
ChatGPT-User: explicit real-time fetches triggered by the user.

The SearchGPT announcement spells the point out: search is separate from training OpenAI's models, and a site that opts out of training can still appear in search results. If training-data inclusion bothers you, block GPTBot only and keep OAI-SearchBot open.
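One way to express "block GPTBot only, keep OAI-SearchBot open" is the robots.txt below, checked with Python's standard-library parser. The rules and the example.com URL are an illustrative sketch, not an official OpenAI template; the user-agent tokens are the crawler names listed above.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: opt out of training, stay open to search and everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def can_crawl(agent, url="https://example.com/pricing"):
    """Return True if the given user agent may fetch the URL under ROBOTS_TXT."""
    return parser.can_fetch(agent, url)

policy = {agent: can_crawl(agent) for agent in ("GPTBot", "OAI-SearchBot", "ChatGPT-User")}
```

Running the same check against your live robots.txt before deploying is a cheap way to catch a wildcard rule that accidentally blocks the search crawler.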

3-2. The conditions that actually increase citations

Conditions called out directly in the Publishers FAQ:

Allow OAI-SearchBot to crawl
Don't have your site host or CDN block OpenAI's published IP traffic
OpenAI also respects X-Robots-Tag and meta noindex beyond robots.txt

Signals observed in the field in addition:

External citations from authoritative domains (corroboration)
Structured data such as Schema.org
Clear answer-first writing (answer-first structure)
Freshness (signals validated in communities and news)

What these field signals keep pointing to is not raw backlink count but corroboration: a fact becomes a citable signal only when multiple independent authoritative sources state it consistently.

3-3. Traffic is trackable

Every referral from ChatGPT Search automatically carries utm_source=chatgpt.com. Filter on it in Google Analytics or any analytics tool to isolate traffic coming in through ChatGPT citations.
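If you work from raw landing-page URLs (server logs or an analytics export) instead of a dashboard filter, the same check is a few lines. The sample URLs below are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

def is_chatgpt_referral(landing_url):
    """True when the landing URL carries utm_source=chatgpt.com."""
    params = parse_qs(urlparse(landing_url).query)
    return "chatgpt.com" in params.get("utm_source", [])

# Hypothetical landing URLs pulled from a log.
LANDINGS = [
    "https://example.com/pricing?utm_source=chatgpt.com",
    "https://example.com/blog/geo?utm_source=newsletter&utm_medium=email",
    "https://example.com/",
]
chatgpt_sessions = sum(is_chatgpt_referral(url) for url in LANDINGS)
```

Parsing the query string, instead of searching for the substring, avoids false positives when "chatgpt.com" appears elsewhere in the URL.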

4. Search intent (SEO) and answer intent (GEO) — the same and different

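The most fundamental concept in SEO is search intent. Google's Search Quality Rater Guidelines split user intent into four: Know (want to know), Do (want to act), Website (want a specific site), Visit-in-person (want to go somewhere). SEO practitioners more often use an overlapping taxonomy: informational, navigational, commercial and transactional. The two do not map one to one (Do covers both commercial and transactional queries, and Visit-in-person has no direct counterpart), but between them they cover the large majority of searches.

The same intents carry over to AI search. The table below takes Know, Do and Website from Google's scheme and splits purchase-related queries into Commercial and Transactional. What changes is what the answer looks like.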

Search intent	Traditional search result	AI answer form
Know (information)	Links to Wikipedia, blog articles	Definition or summary + cited sources
Do (action)	Tutorial and how-to sites	Step-by-step explanation + cited sources
Website	A single official site	Direct domain pointer
Commercial (compare)	Comparison content, review sites	3–5 recommendations + comparison summary
Transactional (buy)	Product pages	A site you can buy from + price

The biggest shift is in Commercial and Know. In classical SEO the formula was "informational content pulls traffic, transactional pages convert." In AI search the recommendation sits inside the informational answer. A user searching for "GEO tool recommendations" gets three or four picked by AI. If your comparison content isn't in the source list of that answer, the brand isn't visible to the user at all.

This is why GEO has to be treated as a separate axis, not just "new SEO." SEO is "visibility to be discovered"; GEO is "visibility to be chosen." The two share the foundation but move very differently at the result stage.

5. Google AI Overviews vs ChatGPT Search

Same RAG principle, quite different implementations.

Google AI Overviews

Uses Gemini-based multi-step reasoning. Even when a question packs in multiple sub-queries, the model splits them itself. Vertex AI's grounding mechanism maps each answer segment to a source chunk; Google calls these grounding chunks and grounding supports. Google has stated publicly that links included in AI Overviews get more clicks than the same pages would have received as traditional web listings for that query.

ChatGPT Search

Uses OAI-SearchBot's pre-indexed content together with real-time web results. The Sources sidebar under the answer makes citations visible. The differentiator from Google is that structured grounding data isn't exposed: the user sees inline citations and a source list, not a chunk-level mapping of each sentence to its supporting passage.

To be cited in both systems at once, you ultimately stand on the same foundation: authoritative domains, Schema.org structured data, answer-first writing, and facts validated consistently across multiple independent sources. In practice, corroboration appears to be the strongest signal in both.
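The structured-data part of that foundation can be sketched as a small generator for an Article JSON-LD block. The properties used (`headline`, `author`, `publisher`, `dateModified`, `mainEntityOfPage`) are standard Schema.org vocabulary; the helper names and the sample values are hypothetical placeholders.

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"

def article_jsonld(headline, author, org, date_modified, url):
    """Build a Schema.org Article object as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "publisher": {"@type": "Organization", "name": org},
        "dateModified": date_modified,
        "mainEntityOfPage": url,
    }

def to_script_tag(data):
    """Wrap the object in the script tag that goes into the page head."""
    return SCRIPT_OPEN + json.dumps(data, indent=2) + SCRIPT_CLOSE

# Placeholder values for illustration only.
tag = to_script_tag(article_jsonld(
    headline="How AI Search Actually Works",
    author="Jane Doe",
    org="Example Co",
    date_modified="2026-05-09",
    url="https://example.com/blog/how-ai-search-works",
))
```

Generating the block from the same fields your CMS already stores (author, organization, last-updated date) keeps the markup and the visible page from drifting apart.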

6. Seven SEO·GEO unified actions

With the mechanics laid out, here's what to actually do.

Separate pages by search intent: Don't cram Know, Do and Commercial into a single page. Splitting informational and comparison content lets AI clearly read the page's intent. Stuff everything into one page and you get cited for none of them.
Paragraph-level self-containment: The unit GPT-class models lift into an answer is the paragraph, not the page. So each paragraph needs to make sense on its own with no outside context. Phrases like 'as explained above' or 'as we'll see below' break the meaning the moment the paragraph gets excerpted.
JSON-LD structured data: Schema.org JSON-LD works for SEO and GEO at the same time. Google uses it for Rich Results, and AI search systems use it for entity recognition and fact extraction. At minimum, Article, Organization, Product and FAQPage should be in place.
E-E-A-T signals: Experience, Expertise, Authoritativeness, Trustworthiness — the same four count for AI citations. Author info (name, affiliation, credentials), source citations, last-updated dates and external media mentions are the core signals. Anonymous content and unsourced claims aren't trusted by any system.
Crawler policy hygiene: Allow OAI-SearchBot in robots.txt, and verify your CDN doesn't block OpenAI's IP traffic. Blocking GPTBot only affects training opt-out, not search citation. Manage Googlebot, OAI-SearchBot, ClaudeBot and PerplexityBot policies separately.
Multi-source corroboration: Make the same fact appear consistently across multiple independent authoritative sources. When your own site, Wikipedia, industry media, review platforms (G2, Capterra) and organic mentions on Reddit or Stack Overflow all line up, AI treats that fact as citable.
Citability measurement and monitoring: Even with all six above, without measuring which queries cite you in which answers, you can't improve. Measure your brand and competitors on the same query set weekly and track citation rate, citation position and mention context.
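For the measurement step, a minimal sketch of how citation rate and citation position could be computed from a weekly run. The record shape (one entry per query, with cited domains in the order they appeared) and the sample data are assumptions for illustration; how you collect the answers is up to your tooling.

```python
def citation_stats(results, domain):
    """Citation rate and mean 1-based citation position for one domain."""
    positions = [r["cited"].index(domain) + 1 for r in results if domain in r["cited"]]
    rate = len(positions) / len(results) if results else 0.0
    mean_position = sum(positions) / len(positions) if positions else None
    return {"citation_rate": rate, "mean_position": mean_position}

# Hypothetical results for one week: cited domains listed in answer order.
WEEK = [
    {"query": "best geo tools", "cited": ["ourbrand.example", "rival.example"]},
    {"query": "what is geo", "cited": ["wiki.example", "rival.example", "ourbrand.example"]},
    {"query": "geo vs seo", "cited": ["rival.example"]},
    {"query": "ai citation tracking", "cited": []},
]

ours = citation_stats(WEEK, "ourbrand.example")
rival = citation_stats(WEEK, "rival.example")
```

Running the same query set for your brand and each competitor gives a like-for-like comparison; mention context still needs a human or a separate classifier to judge.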

7. Extra variables for the Korean market

Korean marketers have two more variables to manage.

First, Naver still carries a large share of search. Naver maintains its own corpora from Knowledge-iN and blog indexes, and AI search systems frequently pull from Naver content when answering Korea-specific questions. Skip Naver SEO and visibility on Korea-related questions in AI answers drops with it.

Second, Korean-language LLM citations have their own tone and vocabulary. Whether you write a foreign proper noun in the source script or in Hangul, whether you use formal honorifics or plain register, whether you cite sources in parentheses or footnotes or inline — each of these choices affects citability. Translating an English article verbatim and writing in Korean from scratch produce very different citation shapes.

Skip these two and a GEO·SEO unified strategy stops at half-effectiveness in the Korean market.

8. Conclusion — SEO is the foundation of GEO

The whole story in one sentence: SEO is the foundation of GEO, not its substitute. The seven-year arc — from the 2017 Transformer through RAG and DPR to ChatGPT Search — never removed search. Search just moved one layer in, from "discovery" into "selection." The same site assets do work at both stages.

Villion is an integrated platform that handles GEO, AEO and SEO in one solution. Diagnosis, content production, site-signal hardening and citation-rate measurement are tied into a single flow. The differentiator is that the seven actions above can be run by both the SEO and GEO sides of the team inside one tool, instead of as parallel programs.

Frequently asked questions

What exactly is RAG?

Retrieval-Augmented Generation (RAG) is the architecture that combines parametric memory (knowledge baked into model weights) with non-parametric memory (documents retrieved on demand). It was named in Patrick Lewis's NeurIPS 2020 paper from Facebook AI Research. ChatGPT Search, Google AI Overviews and Perplexity all run variants of this RAG idea.

If we block OpenAI's GPTBot, do we drop out of ChatGPT Search citations?

No. OpenAI runs separate crawlers for training (GPTBot) and search indexing (OAI-SearchBot). Blocking GPTBot only removes you from training data; it does not affect ChatGPT Search citations. To be citable in search, allow OAI-SearchBot to crawl and avoid blocking OpenAI's published IP-range traffic at the CDN. The SearchGPT announcement explicitly notes that search and training are kept separate.

How do we track traffic coming from ChatGPT Search?

Every referral link from ChatGPT Search automatically carries utm_source=chatgpt.com. Filter on that parameter in Google Analytics (or any analytics tool) to isolate traffic from ChatGPT citations. This is the official tracking method per the OpenAI Publishers FAQ.

Do we need a lot of backlinks to be cited in RAG systems?

Corroboration is a much stronger signal than raw backlink count. A fact becomes a citable signal only when multiple independent authoritative sources state it consistently. The most powerful citation signal lines up your own site, Wikipedia, industry media, review platforms (G2, Capterra) and organic mentions on Reddit or Stack Overflow — all pointing in the same direction.

Is an SEO-friendly site automatically GEO-friendly?

The foundation is the same. A few additional layers are needed though: pages separated by search intent, paragraph-level self-containment, JSON-LD structured data, E-E-A-T signals, OAI-SearchBot crawler policy, and multi-source corroboration. Layer those on top and SEO assets become GEO assets. The two axes don't need separate workflows — running them together is far more efficient.
