LLM · Training & Alignment · Updated 2026.04.28

Model Distillation

Also known as: Knowledge Distillation (Korean: 지식 증류), Teacher-Student

In one line

Model distillation trains a small 'student' model to imitate the outputs of a large 'teacher' model — the standard way to move expensive-model quality into a cheaper one.

Going deeper

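In practice the student is trained on the teacher's outputs (its generated answers or, where available, its output probabilities) rather than only on human-labelled data, so it learns to reproduce the big model's judgements. Lightweight model lines (GPT-4o-mini-class, Claude Haiku, smaller Llama variants) are widely understood to be produced this way, in part or in whole, although vendors do not always publish their training recipes.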
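To make "imitating the teacher" concrete, here is a minimal sketch of one common textbook formulation: the student is penalised by the KL divergence between the teacher's and its own temperature-softened probabilities. This is an illustration only, not a description of how any vendor named above trains its models. The function names and the temperature value are assumptions made for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # teacher's "soft targets"
    q = softmax(student_logits, temperature)  # student's current guess
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2  # conventional T^2 scaling of the soft loss

# Hypothetical scores over three candidate labels.
teacher = [2.0, 1.0, 0.1]
good_student = [2.0, 1.0, 0.1]  # matches the teacher, so the loss is zero
poor_student = [0.1, 1.0, 2.0]  # ranks the labels the other way round
print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, poor_student))
```

Training pushes this loss toward zero. For language models, a simpler variant is also common: fine-tune the student directly on text the teacher generated.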

For marketers running cost-sensitive workloads, distillation is the most practical way to bring AI bills under control. Live chatbots, real-time recommendations and large-scale content classification become expensive quickly if every request goes to a full-size model. The pattern of 'distilled small model for the base, escalate hard cases to a big model' is becoming standard.
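The 'small model first, escalate hard cases' pattern can be sketched as a simple router. Everything here is hypothetical: `small_model` and `large_model` are stand-ins for real API calls, and the confidence score and the 0.8 threshold are assumptions that a real system would need to calibrate on its own traffic.

```python
def small_model(query):
    """Stand-in for a distilled model; returns (answer, confidence)."""
    # Assumption for the demo: short queries count as "easy".
    confidence = 0.95 if len(query.split()) <= 8 else 0.40
    return f"[small] reply to: {query}", confidence

def large_model(query):
    """Stand-in for the expensive full-size model."""
    return f"[large] reply to: {query}"

def route(query, small, large, threshold=0.8):
    """Answer with the small model unless its confidence is below threshold."""
    answer, confidence = small(query)
    if confidence >= threshold:
        return answer, "small"
    return large(query), "large"  # escalate the hard case

for q in [
    "Where is my order?",
    "Compare your premium and basic plans for a team of forty people across three regions",
]:
    answer, tier = route(q, small_model, large_model)
    print(tier, "->", answer)
```

The savings depend on how much traffic the small model can keep, so it is worth logging the tier that served each request.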

Caveat: a distilled model is not guaranteed to match its teacher. Students often inherit the teacher's weaknesses and hallucination patterns, and the gap widens on out-of-domain queries. Build a real eval set and compare per use case before committing.
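A per-use-case comparison can be as simple as the sketch below. It assumes an eval set of (use case, prompt, expected answer) rows and exact-match scoring. Both are simplifications, and the helper names are hypothetical. Open-ended outputs would need a graded or human-reviewed metric instead.

```python
from collections import defaultdict

def accuracy_by_use_case(examples, model):
    """examples: iterable of (use_case, prompt, expected) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for use_case, prompt, expected in examples:
        totals[use_case] += 1
        if model(prompt) == expected:
            hits[use_case] += 1
    return {uc: hits[uc] / totals[uc] for uc in totals}

def gap_report(examples, teacher, student):
    """Teacher accuracy minus student accuracy, per use case."""
    t = accuracy_by_use_case(examples, teacher)
    s = accuracy_by_use_case(examples, student)
    return {uc: t[uc] - s[uc] for uc in t}
```

A large gap in a single use case, for instance out-of-domain questions, is a signal to keep that traffic on the teacher.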
