LLM Training & Alignment · Updated 2026.04.28

Model Distillation

Also known as: Knowledge Distillation, Teacher-Student

In one line

Model distillation trains a small 'student' model to imitate the outputs of a large 'teacher' model — the standard way to move expensive-model quality into a cheaper one.

Going deeper

Model distillation trains a small 'student' model on the outputs of a large 'teacher' model — basically teaching the small one to imitate the big one's judgements. Most lightweight model lines (GPT-4o-mini-class, Claude Haiku, smaller Llama variants) are produced this way, in part or in whole.
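
As a rough illustration of the mechanics, here is a minimal PyTorch sketch of the classic soft-target recipe (Hinton et al., 2015). The function name, tensor shapes and temperature are illustrative, not any vendor's actual training pipeline:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target distillation: push the student's output distribution
    toward the teacher's softened distribution."""
    # Soften both distributions with a temperature > 1 so the student
    # also learns the teacher's relative preferences among wrong answers.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student, scaled by T^2 to keep
    # gradient magnitudes comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 examples over a 10-class vocabulary.
teacher_logits = torch.randn(4, 10)            # frozen teacher's outputs
student_logits = torch.randn(4, 10, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()                                # gradients flow into the student only
```

In practice this KL term is usually mixed with an ordinary cross-entropy loss on ground-truth labels, and for LLMs the 'teacher output' is often simply generated text that the student is fine-tuned on.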

For marketers running cost-sensitive workloads, distillation is the most practical way to bring AI bills under control. Live chatbots, real-time recommendations and large-scale content classification get prohibitively expensive if you run full-size models for everything. The pattern of 'distilled small model as the default, escalate hard cases to a big model' is becoming standard; a sketch of that routing follows below.
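
A minimal sketch of that escalation pattern, assuming hypothetical small_model and big_model callables that return an answer plus a confidence score. The Answer type, route function and 0.7 threshold are all illustrative; the threshold is something you would tune against your own traffic:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Answer:
    text: str
    confidence: float  # e.g. mean token log-prob mapped to [0, 1]

def route(query: str,
          small_model: Callable[[str], Answer],
          big_model: Callable[[str], Answer],
          threshold: float = 0.7) -> Answer:
    """Serve cheap by default; escalate only when the distilled
    model is unsure, so the big model sees a fraction of traffic."""
    draft = small_model(query)
    if draft.confidence >= threshold:
        return draft            # cheap path: distilled model is confident
    return big_model(query)     # hard case: pay for the full-size model

# Toy usage with stub models standing in for real API calls.
cheap = lambda q: Answer(text="small: " + q, confidence=0.9)
costly = lambda q: Answer(text="big: " + q, confidence=0.99)
print(route("what is your refund policy?", cheap, costly).text)
```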

Caveat: a distilled model is not guaranteed to match its teacher. Students often inherit the teacher's weaknesses and hallucination patterns, and the gap widens on out-of-domain queries. Build a real eval set and compare per use case before committing.
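
A back-of-the-envelope way to run that comparison, assuming hypothetical predict callables and a small labeled eval set; a real eval would also score hallucinations and out-of-domain slices, not just exact-match accuracy:

```python
def accuracy(predict, eval_set):
    """predict maps a query string to an answer string;
    eval_set is a list of {"query": ..., "label": ...} dicts."""
    hits = sum(predict(ex["query"]) == ex["label"] for ex in eval_set)
    return hits / len(eval_set)

# Compare the distilled student against its teacher per use case,
# not just in aggregate; the gaps are rarely uniform.
eval_set = [{"query": "2+2?", "label": "4"},
            {"query": "capital of France?", "label": "Paris"}]
student = lambda q: "4" if "2+2" in q else "Paris"
print(accuracy(student, eval_set))  # 1.0 on this toy set
```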

