
Knowledge Distillation

Also known as: Distillation, Model Distillation

In one line

Knowledge distillation trains a smaller 'student' model to mimic a larger 'teacher' model — preserving most of the quality while drastically cutting cost and latency.

Going deeper

Knowledge distillation trains a small student model to imitate a large teacher model: its outputs, its probability distributions, and sometimes its internal embeddings. Instead of plain 0/1 labels, the student learns from a much richer signal, namely how confident the teacher was in each possible answer.
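To make that richer signal concrete, here is a minimal sketch of the classic soft-label objective in PyTorch, in the style of Hinton et al.'s original formulation: the student matches the teacher's temperature-softened probability distribution while still learning from the hard labels. The temperature and mixing weight below are illustrative assumptions, not prescribed values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,  # assumed value; tune per task
                      alpha: float = 0.5) -> torch.Tensor:  # assumed KD/CE mix
    """Soft-label distillation: KL divergence to the teacher's softened
    distribution, blended with cross-entropy on the hard labels."""
    # A temperature > 1 flattens both distributions, exposing the
    # teacher's relative confidence across all classes or tokens,
    # not just its top-1 answer.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)

    # Scale by T^2 so the KL term's gradient magnitude stays comparable
    # as the temperature changes (standard practice).
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2

    # Ordinary supervised loss on the ground-truth 0/1 labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kd + (1.0 - alpha) * ce
```

In an LLM setting the same loss is applied per token over the vocabulary. When teacher logits are not exposed (API-only access), the common fallback is sequence-level distillation: fine-tuning the student directly on teacher-generated text, which is exactly the practice the licensing caveat below turns on.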

The appeal in practice is simple economics. Distillation turns 'roughly 80% of the big model's quality at 2 to 3 percent of the cost' into a realistic option. OpenAI, Anthropic, and Google have all publicly positioned their small-model lines (Mini, Haiku, Flash) in this design space.
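As a back-of-envelope illustration of that trade-off, the numbers below are purely hypothetical prices chosen to match the '2 to 3 percent' framing, not any vendor's actual rates.

```python
# All figures hypothetical, for illustration only.
teacher_price = 15.00   # $ per 1M tokens, hypothetical large model
student_price = 0.40    # $ per 1M tokens, hypothetical distilled model
monthly_tokens = 5_000  # millions of tokens served per month, hypothetical

teacher_cost = teacher_price * monthly_tokens   # $75,000 / month
student_cost = student_price * monthly_tokens   # $2,000 / month

print(f"cost ratio: {student_cost / teacher_cost:.1%}")  # -> cost ratio: 2.7%
```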

The catch is licensing. Several commercial APIs explicitly prohibit using their outputs to train competing models. Before any distillation project, the first question is whether the training data is legitimately yours to train on under the source model's terms of service.
