LLM · Training & Alignment · Updated 2026.04.28

Knowledge Distillation

Also known as: 디스틸레이션 (Korean for "distillation"), Model Distillation

In one line

Knowledge distillation trains a smaller 'student' model to mimic a larger 'teacher' model — preserving most of the quality while drastically cutting cost and latency.

Going deeper

Knowledge distillation trains a small student model to imitate a large teacher model: its outputs, its probability distributions, and sometimes its embeddings. Where ordinary training gives the student plain 0/1 labels, the teacher's probabilities carry a much richer signal: 'how confident was the teacher in this answer'.
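One common way to turn that richer signal into a training objective is a temperature-softened loss. The sketch below is illustrative rather than any particular vendor's recipe: the function names, the `temperature` and `alpha` defaults, and the toy logits are all assumptions made for the example. It blends a soft term (KL divergence between teacher and student distributions) with the usual hard-label cross-entropy.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft (teacher) term with a hard (0/1 label) term.

    temperature and alpha are hypothetical defaults chosen for illustration.
    """
    # Soft term: KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so its magnitude stays comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q)
             for p, q in zip(p_teacher, p_student) if p > 0)
    soft = kl * temperature ** 2

    # Hard term: ordinary cross-entropy against the single correct class.
    hard = -math.log(softmax(student_logits)[hard_label])

    return alpha * soft + (1 - alpha) * hard


# Toy logits over three classes (made-up numbers).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 4.0, 1.0]

loss_close = distillation_loss(close_student, teacher, hard_label=0)
loss_far = distillation_loss(far_student, teacher, hard_label=0)
print(loss_close, loss_far)
```

A student whose distribution tracks the teacher's gets a lower loss than one that disagrees with it, and with `alpha=1.0` the loss falls to zero when the two sets of logits are identical. In practice the same idea would be applied per token over a full vocabulary using a deep learning framework's tensor operations.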

The appeal in practice is simple economics. Distillation makes 'roughly 80% of the big model's quality at 2 to 3 percent of the cost' a real card to play, though the exact figures depend on the task and on the teacher and student involved. OpenAI, Anthropic and Google have all publicly described their small-model lines (Mini, Haiku and Flash, respectively) as operating near this design space.

The catch is licensing. Several commercial APIs explicitly prohibit using outputs to train competing models. Before any distillation project, the first question is whether the training data is legitimately yours to train on under the source's terms.

Related terms

LLM

Instruction Tuning

Instruction tuning is the fine-tuning step that teaches a base LLM to follow instructions in natural language — the stage that turns 'a model that completes text' into 'a model you can actually ask things'.

LLM

Tokenization

Tokenization is the preprocessing step that breaks text into the small pieces a model actually consumes — and it directly drives cost, context length and multilingual performance.

LLM

LLM-as-a-Judge

LLM-as-a-judge is the practice of using one LLM to grade or compare the answers of another — a standard way to scale evaluation beyond what human labelling can cover.

LLM

Streaming Response

Streaming response is the mode where an LLM emits tokens to the client as they are generated, instead of waiting for the full answer to finish.

LLM

A large language model (LLM) is a neural network trained on massive text corpora to understand and generate human language — the engine behind ChatGPT, Claude, Gemini and similar products.
