Test-Time Compute
In one line
Test-time compute is the paradigm of spending more inference effort per query to improve accuracy — the shift made tangible by reasoning-focused models like OpenAI o1 and DeepSeek R1.
Going deeper
Test-time compute is the idea that you can make an LLM smarter not only by training it longer, but also by letting it think harder at inference time. The model runs an extended chain of thought internally, generates multiple candidate answers, and uses self-checking or self-consistency to pick the best one. OpenAI o1, DeepSeek R1 and the Gemini Thinking models are the canonical examples.
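To make the self-consistency part concrete, here is a minimal Python sketch. The ask_model helper and its toy answers are ours, not any vendor's API; in practice you would swap in a real client call and sample with temperature.

```python
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    """Stand-in for a real LLM call: each sample independently lands on
    the right answer ~60% of the time. Replace with your provider's SDK."""
    return random.choices(["42", "41", "43"], weights=[0.6, 0.2, 0.2])[0]

def self_consistency(prompt: str, n_samples: int = 8) -> str:
    """Spend more test-time compute: sample n answers independently,
    then keep the answer that appears most often (majority vote)."""
    answers = [ask_model(prompt) for _ in range(n_samples)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

print(self_consistency("What is 6 x 7?"))   # usually "42"
```

More samples means more inference cost per query, which is exactly the trade-off the rest of this entry is about.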
Two implications for marketers. First, on questions that genuinely benefit from careful reasoning (complex comparisons, multi-step analysis, precise specs), these models deliver noticeably better answers. Second, per-answer cost and latency rise noticeably too. You do not want a reasoning model on every call.
The emerging norm is routing: cheap, fast model for routine queries, reasoning model for the hard ones. From a GEO angle, reasoning models often produce richer, deeper citations on 'why this brand over that brand'-type questions, so it is worth tracking how your brand fares per model rather than treating them as one bucket.
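A rough sketch of what that routing can look like, with placeholder model names and a toy classify_difficulty heuristic standing in for the small classifier most production routers use:

```python
def classify_difficulty(query: str) -> str:
    """Toy stand-in for a real router (often a small classifier):
    long, comparative or multi-step questions count as hard."""
    hard_signals = ("compare", "versus", "vs", "why", "step by step", "trade-off")
    if len(query) > 200 or any(s in query.lower() for s in hard_signals):
        return "hard"
    return "easy"

def route(query: str) -> str:
    """Send routine queries to a cheap, fast model and hard ones to a
    reasoning model. The model names here are placeholders, not real IDs."""
    if classify_difficulty(query) == "hard":
        return "reasoning-model-large"   # slower, pricier, better on hard questions
    return "chat-model-small"            # fast and cheap for routine queries

print(route("What are your opening hours?"))                   # chat-model-small
print(route("Compare plan A vs plan B for a 50-person team"))  # reasoning-model-large
```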
Related terms
Chain-of-Thought
Chain-of-Thought (CoT) prompting asks the LLM to walk through intermediate reasoning steps before giving a final answer — a simple change that meaningfully improves accuracy on harder problems.
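For a feel of the difference, here is an illustrative prompt pair; the question and wording are made up, but the pattern (ask for intermediate steps before the final answer) is the whole trick:

```python
question = "A retainer costs $4,000/month with a 15% annual discount. What is the yearly cost?"

direct_prompt = f"{question}\nAnswer with the final number only."

cot_prompt = (
    f"{question}\n"
    "Think step by step: first compute the undiscounted yearly cost, "
    "then apply the discount, then state the final answer."
)

# The CoT version nudges the model to show its working
# (12 x 4,000 = 48,000; 48,000 x 0.85 = 40,800) before answering,
# which tends to reduce arithmetic and logic slips.
```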
Model Routing
Model routing dispatches each query to the most suitable model based on difficulty or category — the de-facto pattern for balancing cost, accuracy and latency in production AI.
Speculative Decoding
Speculative decoding speeds up inference by letting a small 'draft' model propose several tokens at once and a big model verify them in one shot — a major lever on latency without losing quality.
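A toy illustration of the mechanism, not a real implementation: a made-up draft_model guesses a few tokens, a made-up target_model checks them, and the longest agreeing prefix is kept. Production systems verify the draft in one batched forward pass and use a probabilistic acceptance rule rather than exact matching.

```python
def draft_model(context: list[str], k: int = 4) -> list[str]:
    """Hypothetical small model: cheaply guesses the next k tokens."""
    canned = ["the", "quick", "brown", "cat", "jumps"]
    return canned[len(context): len(context) + k]

def target_model(context: list[str], k: int = 4) -> list[str]:
    """Hypothetical large model: what it would actually emit next."""
    truth = ["the", "quick", "brown", "fox", "jumps"]
    return truth[len(context): len(context) + k]

def speculative_step(context: list[str], k: int = 4) -> list[str]:
    """Accept the draft's tokens up to the first disagreement, then take
    the big model's token there, so output quality matches the big model."""
    draft = draft_model(context, k)
    check = target_model(context, k)
    accepted = []
    for d, t in zip(draft, check):
        if d == t:
            accepted.append(d)
        else:
            accepted.append(t)   # big model overrides; speculation stops here
            break
    return context + accepted

print(speculative_step([]))   # ['the', 'quick', 'brown', 'fox']
```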
LLM
A large language model (LLM) is a neural network trained on massive text corpora to understand and generate human language — the engine behind ChatGPT, Claude, Gemini and similar products.
LLM Benchmark
An LLM benchmark is a standardised test used to compare model capabilities — the source of those headline scores you see in every model launch announcement.