Knowledge Distillation
In one line
Knowledge distillation trains a smaller 'student' model to mimic a larger 'teacher' model — preserving most of the quality while drastically cutting cost and latency.
Going deeper
Knowledge distillation trains a small student model to imitate a large teacher model, matching its outputs, its probability distributions over possible answers, and sometimes its internal embeddings. Compared with plain 0/1 labels, this gives the student a much richer signal: not just which answer is correct, but how confident the teacher was in that answer and in each alternative.
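To make the 'richer signal' concrete, here is a minimal sketch in plain Python of the classic soft-target loss (Hinton et al., 2015). It assumes a classification-style setup where teacher and student both emit logits over the same set of answers. The function names, the temperature of 2.0 and the mixing weight `alpha` of 0.5 are illustrative assumptions, not values from any particular system. Only output-distribution matching is covered here; the embedding-matching variant is not shown, and a real training loop would compute this on tensors with a framework's autograd.

```python
import math
from typing import Sequence

# Minimal sketch, assuming teacher and student produce logits over the same classes.


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax: higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 2.0,  # illustrative assumption, not a tuned value
    alpha: float = 0.5,        # illustrative weight on the soft (teacher) term
) -> float:
    """Hypothetical helper: blend of soft-target KL and hard-label cross-entropy."""
    # Soft term: match the teacher's temperature-softened distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature**2
    # Hard term: ordinary cross-entropy against the 0/1 label at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    # Teacher is confident in class 0 but still rates class 1 above class 2.
    teacher = [4.0, 1.5, 0.5]
    student = [2.0, 2.0, 0.0]
    print("one-hot label:", [1.0, 0.0, 0.0])
    print("teacher soft targets:", [round(p, 3) for p in softmax(teacher, 2.0)])
    print("loss:", round(distillation_loss(student, teacher, label=0), 4))
```

Dividing the logits by a temperature above 1 flattens both distributions, so the small probabilities the teacher assigns to wrong-but-plausible answers carry more weight in the KL term. The `temperature**2` factor keeps that term's gradients on a scale comparable to the hard-label term as the temperature changes.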
The appeal in practice is simple economics. Distillation makes 'roughly 80% of the big model's quality at 2 to 3 percent of the cost' a realistic option. OpenAI, Anthropic and Google have all publicly described their small-model lines (Mini, Haiku and Flash, respectively) as operating near this design space.
The catch is licensing. Several commercial APIs explicitly prohibit using outputs to train competing models. Before any distillation project, the first question is whether the training data is legitimately yours to train on under the source's terms.
Related terms
Instruction Tuning
Instruction tuning is the fine-tuning step that teaches a base LLM to follow instructions in natural language — the stage that turns 'a model that completes text' into 'a model you can actually ask things'.
Tokenization
Tokenization is the preprocessing step that breaks text into the small pieces a model actually consumes — and it directly drives cost, context length and multilingual performance.
LLM-as-a-Judge
LLM-as-a-judge is the practice of using one LLM to grade or compare the answers of another — a standard way to scale evaluation beyond what human labelling can cover.
Streaming Response
Streaming response is the mode where an LLM emits tokens to the client as they are generated, instead of waiting for the full answer to finish.
LLM
A large language model (LLM) is a neural network trained on massive text corpora to understand and generate human language — the engine behind ChatGPT, Claude, Gemini and similar products.