Prompt Caching
In one line
Prompt caching reuses the computation done on a repeated system prompt or document so subsequent calls are dramatically cheaper and faster — a direct lever on operating costs for repetitive workloads like GEO monitoring.
Going deeper
Prompt caching stores the internal state (typically the attention key–value cache) computed for the repeated portion of a prompt — system prompt, fixed documents, long context — so that subsequent calls do not recompute it. OpenAI, Anthropic and Google each offer their own flavour. On a cache hit, input token cost typically drops by 50–90% and time-to-first-token shrinks meaningfully.
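A rough sketch of the economics. The price and discount below are illustrative placeholders, not any provider's actual rates; the point is how a large cached prefix dominates per-call cost on repetitive workloads.

```python
def call_cost(input_tokens: int, cached_tokens: int,
              price_per_mtok: float = 1.0, cache_discount: float = 0.9) -> float:
    """Estimate input cost for one call when `cached_tokens` of the
    prompt hit the cache.

    price_per_mtok and cache_discount are hypothetical values for
    illustration; real rates vary by provider and model.
    """
    uncached = input_tokens - cached_tokens
    uncached_cost = uncached * price_per_mtok / 1_000_000
    cached_cost = cached_tokens * price_per_mtok * (1 - cache_discount) / 1_000_000
    return uncached_cost + cached_cost

# A 10,000-token system prompt + document set reused on every call,
# plus ~200 variable tokens of user input:
cold = call_cost(10_200, 0)        # first call writes the cache
warm = call_cost(10_200, 10_000)   # later calls pay the discounted rate
```

With these placeholder numbers the warm call costs roughly an eighth of the cold one, which is why daily monitoring runs that repeat the same prefix hundreds of times benefit so much.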
For marketers, caching translates directly into lower AI operating costs. Workloads with heavy repetition — daily GEO monitoring runs that fire the same system prompt hundreds of times, or in-house RAG that reuses the same document set on every call — are exactly the cases where it pays off.
Practical caveat: caches usually require the leading portion of the prompt to match byte-for-byte. The standard template is 'system prompt → fixed reference docs → variable user input', in that order, to maximise hit rate.
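The ordering rule above can be sketched as a prompt-assembly helper. This is a generic illustration, not any provider's SDK: the function names and document strings are made up, and `os.path.commonprefix` stands in for the provider's internal prefix matching.

```python
import os.path


def build_prompt(system_prompt: str, reference_docs: list[str], user_input: str) -> str:
    """Assemble a prompt cache-friendly: stable content first, variable last.

    The leading portion must be byte-identical across calls for a
    prefix cache to hit, so avoid injecting timestamps, request IDs
    or shuffled document order before the user input.
    """
    stable_prefix = system_prompt + "\n\n" + "\n\n".join(reference_docs)
    return stable_prefix + "\n\n" + user_input


# Hypothetical GEO monitoring calls sharing the same stable prefix:
docs = ["Brand guidelines v3 ...", "Product FAQ ..."]
a = build_prompt("You are a GEO monitoring assistant.", docs, "Query: brand mentions, Monday")
b = build_prompt("You are a GEO monitoring assistant.", docs, "Query: brand mentions, Tuesday")

# Everything up to the user input is identical, hence cacheable:
shared = len(os.path.commonprefix([a, b]))
```

Putting anything variable ahead of the documents (a date in the system prompt, say) would shrink `shared` to almost nothing and turn every call into a cache miss.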
Related terms
System Prompt
A system prompt is the instruction sent to an LLM before any user message, defining the assistant's role, tone and rules — effectively the AI product's character.
Context Window
The context window is the maximum number of tokens an LLM can take in at once — it defines how much content the model can consider in a single prompt.
RAG
RAG (Retrieval-Augmented Generation) lets an LLM fetch external documents at answer time and ground its response in them — the technique behind ChatGPT Search, Perplexity and most AI search products.
Context Engineering
Context engineering goes beyond crafting a single prompt — it is the design discipline of deciding which context to assemble and how to feed it to the model, an idea that crystallised in 2024–2025.
LLM
A large language model (LLM) is a neural network trained on massive text corpora to understand and generate human language — the engine behind ChatGPT, Claude, Gemini and similar products.
How does your brand show up in AI answers?
Villion measures how your brand appears across ChatGPT, Perplexity and AI Overviews, then automates the work that lifts citation rate and share of voice.
Get a free audit