Temperature
In one line
Temperature is a parameter that controls how much randomness the LLM allows when picking the next token — lower values give consistent answers, higher values give more creative ones.
Going deeper
Temperature controls how much risk the model takes when picking the next token. Near 0, you get almost the same answer every time; above 1.0, output gets creative but inconsistent. This one knob explains much of the classic "why do I get a different answer every time?" confusion.
The right value depends on the use case. Run customer support and FAQ assistants at 0.2–0.4 for consistency; brainstorm marketing copy at 0.8–1.0 for variety.
In practice, recommended values differ per model, and some recent models (certain GPT-5 variants, for example) fix temperature at 1.0 and reject other values. So "lower equals more accurate" is a useful heuristic, not a guarantee; check the docs for the model you use.
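The mechanism behind the knob can be sketched in a few lines: temperature divides the model's raw next-token scores (logits) before the softmax that turns them into probabilities. A low temperature sharpens the distribution toward the top token; a high one flattens it. The logit values below are made up for illustration and don't come from any real model:

```python
import math

def next_token_probs(logits, temperature):
    # Softmax with temperature: divide each logit by T before normalizing.
    # Low T sharpens the distribution (near-deterministic picks);
    # high T flattens it (more random, more "creative" picks).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

cold = next_token_probs(logits, temperature=0.2)  # strongly favors token 0
hot = next_token_probs(logits, temperature=1.5)   # spreads probability out
```

With these numbers, the top token gets roughly 99% of the probability mass at T=0.2 but only around 55% at T=1.5, which is exactly why low settings feel repeatable and high settings feel varied.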
Related terms
LLM
A large language model (LLM) is a neural network trained on massive text corpora to understand and generate human language — the engine behind ChatGPT, Claude, Gemini and similar products.
Token
A token is the basic unit an LLM reads and writes — usually a word or piece of a word. LLM pricing and context limits are all measured in tokens.
Prompt Engineering
Prompt engineering is the practice of crafting inputs that steer an LLM toward better outputs — a way to dramatically change result quality without retraining the model.
Structured Output
Structured output forces an LLM to reply in a predefined JSON or schema shape instead of free text — essential when you need to plug AI reliably into other systems.
Function Calling
Function calling is the interface that lets an LLM invoke predefined functions or APIs instead of just replying in natural language — the core mechanism behind AI agents.