LLM | Training & Alignment | Updated 2026.04.28

Model Distillation

Also known as: Knowledge Distillation (Korean: 지식 증류), Teacher-Student

In one line

Model distillation trains a small 'student' model to imitate the outputs of a large 'teacher' model — the standard way to move expensive-model quality into a cheaper one.

Going deeper

In practice, the teacher first labels or answers a large set of inputs, and the student is then trained to reproduce those outputs, in effect learning to imitate the big model's judgements. Many lightweight model lines (GPT-4o-mini-class models, Claude Haiku, smaller Llama variants) are reported or widely assumed to be produced this way, in part or in whole, although vendors do not always disclose their training recipes.
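To make "imitating the teacher's outputs" concrete, here is a minimal sketch of one common distillation objective: the KL divergence between temperature-softened teacher and student probabilities. The logit values, the temperature of 2.0 and the function names are illustrative assumptions, not any vendor's actual recipe.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor is the conventional scaling used with soft targets;
    the default temperature here is an arbitrary illustrative choice.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits over three classes for a single input.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different class

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The student that agrees with the teacher gets the smaller loss, so minimising this quantity over many inputs pushes the student towards the teacher's behaviour. Real training pipelines would compute this with a deep-learning framework over batches rather than plain Python lists.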

For marketers running cost-sensitive workloads, distillation is often the most practical way to bring AI bills under control. The cost of live chatbots, real-time recommendations and large-scale content classification escalates quickly if every request goes to a full-size model. The pattern of 'distilled small model for the base, escalate hard cases to a big model' is becoming standard.
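The 'small model first, escalate hard cases' pattern can be sketched as a simple confidence-threshold cascade. Everything named here is hypothetical: the two stub models stand in for real API calls, and the 0.8 threshold is an arbitrary value you would tune on your own data.

```python
def make_cascade(small_model, large_model, threshold=0.8):
    """Build a function that tries the small model and escalates when unsure.

    Both models are assumed to return a (label, confidence) pair.
    """
    def answer(query):
        label, confidence = small_model(query)
        if confidence >= threshold:
            return label, "small"       # cheap path: accept the student's answer
        label, _ = large_model(query)   # expensive path: ask the big model
        return label, "large"
    return answer

# Hypothetical stand-ins for a distilled model and a full-size model.
def stub_small_model(query):
    if "love" in query.lower():
        return "positive", 0.95
    return "neutral", 0.40              # low confidence triggers escalation

def stub_large_model(query):
    return "negative", 0.99

classify = make_cascade(stub_small_model, stub_large_model, threshold=0.8)
print(classify("I love this product"))      # ('positive', 'small')
print(classify("It arrived late, I guess"))  # ('negative', 'large')
```

The share of traffic that stays on the cheap path depends on the threshold, so it is worth measuring both cost and accuracy at a few threshold values before fixing one.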

Caveat: a distilled model is not guaranteed to match its teacher. Students often inherit the teacher's weaknesses and hallucination patterns, and the gap widens on out-of-domain queries. Build a real eval set and compare per use case before committing.
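A per-use-case comparison of this kind might look like the sketch below. The eval set, the two lookup-table "models" and the exact-match scoring are all illustrative assumptions; a real evaluation would call the actual models and likely use a richer metric.

```python
from collections import defaultdict

def accuracy_by_use_case(eval_set, model):
    """Exact-match accuracy of `model`, grouped by each example's use case."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for example in eval_set:
        use_case = example["use_case"]
        totals[use_case] += 1
        if model(example["input"]) == example["expected"]:
            hits[use_case] += 1
    return {uc: hits[uc] / totals[uc] for uc in totals}

def compare(eval_set, teacher, student):
    """Report teacher accuracy, student accuracy and the gap per use case."""
    t = accuracy_by_use_case(eval_set, teacher)
    s = accuracy_by_use_case(eval_set, student)
    return {uc: {"teacher": t[uc], "student": s[uc], "gap": t[uc] - s[uc]} for uc in t}

# Hypothetical eval set covering two use cases.
eval_set = [
    {"use_case": "chat", "input": "q1", "expected": "a1"},
    {"use_case": "chat", "input": "q2", "expected": "a2"},
    {"use_case": "classification", "input": "q3", "expected": "spam"},
    {"use_case": "classification", "input": "q4", "expected": "ham"},
]

# Lookup tables standing in for real model calls.
teacher_answers = {"q1": "a1", "q2": "a2", "q3": "spam", "q4": "ham"}
student_answers = {"q1": "a1", "q2": "a2", "q3": "spam", "q4": "spam"}

report = compare(eval_set, teacher_answers.get, student_answers.get)
for use_case, row in report.items():
    print(use_case, row)
```

Breaking the results down by use case matters because an overall average can hide a large gap on one workload, which is exactly where the student may not be a safe replacement.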

Related terms

LLM

Fine-tuning

Fine-tuning takes an already pretrained LLM and trains it further on a narrower dataset to specialise it for a domain, task or voice — the most common path for adapting an LLM to your own data.

LLM

Quantization

Quantization compresses model weights to lower precision (say, 16-bit down to 4-bit) so the same model fits on smaller GPUs and runs more cheaply.

LLM

Open-weight Model

An open-weight model is an LLM whose weights are publicly released so anyone can download and run it on their own infrastructure — Llama, Mistral and Qwen are the best-known examples.

LLM

Model Routing

Model routing dispatches each query to the most suitable model based on difficulty or category — the de-facto pattern for balancing cost, accuracy and latency in production AI.

LLM

A large language model (LLM) is a neural network trained on massive text corpora to understand and generate human language — the engine behind ChatGPT, Claude, Gemini and similar products.
