Model Distillation
In one line
Model distillation trains a small 'student' model to imitate the outputs of a large 'teacher' model — the standard way to move expensive-model quality into a cheaper one.
Going deeper
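In practice, the large 'teacher' model first labels or answers a set of inputs, and the small 'student' model is then trained to reproduce those outputs, so it learns to imitate the teacher's judgements. Many lightweight model lines (GPT-4o-mini-class, Claude Haiku, smaller Llama variants) are reported or widely believed to be produced this way, in part or in whole, although vendors do not always disclose their training recipes.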
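To make "imitating the teacher" concrete, here is a minimal sketch of one common formulation, soft-label distillation, where the student is penalised for diverging from the teacher's probability distribution over answers. The logits, the temperature value and the function names are illustrative assumptions, not any vendor's actual recipe; many LLM distillation setups instead train the student directly on teacher-generated text.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    The temperature**2 factor is the usual scaling that keeps the loss
    magnitude comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical scores over three candidate labels (made-up numbers).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different label

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

Training pushes this loss down, so a student that ranks answers the way the teacher does scores lower than one that disagrees.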
For marketers running cost-sensitive workloads, distillation is one of the most practical ways to bring AI bills under control. Live chatbots, real-time recommendations and large-scale content classification become prohibitively expensive if every request goes to a full-size model. The pattern of 'distilled small model as the default, escalate hard cases to a big model' is becoming standard.
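The escalation pattern can be sketched in a few lines. Everything here is hypothetical: the two toy model functions stand in for real API calls, and the assumption that a model returns a usable confidence score alongside its answer will not hold for every provider, so treat the threshold logic as an illustration rather than a production router.

```python
from typing import Callable, Tuple

# A "model" here is any function that returns (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]

def answer_with_escalation(
    query: str,
    small_model: Model,
    large_model: Model,
    min_confidence: float = 0.8,  # assumed threshold; tune on your own data
) -> Tuple[str, str]:
    """Try the cheap model first; fall back to the big one when it is unsure."""
    answer, confidence = small_model(query)
    if confidence >= min_confidence:
        return answer, "small"
    answer, _ = large_model(query)
    return answer, "large"

# Toy stand-ins for real models (hypothetical behaviour).
def toy_small_model(query: str) -> Tuple[str, float]:
    # Pretend that short queries are easy and long ones are hard.
    if len(query.split()) <= 5:
        return "short answer", 0.95
    return "unsure answer", 0.40

def toy_large_model(query: str) -> Tuple[str, float]:
    return "detailed answer", 0.99

easy = answer_with_escalation(
    "Where is my order?", toy_small_model, toy_large_model
)
hard = answer_with_escalation(
    "Compare the refund terms across all three of my subscription plans",
    toy_small_model,
    toy_large_model,
)
```

The cost saving comes from how often the first branch is taken, so the share of queries answered by the small model is the number to watch.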
Caveat: a distilled model is not guaranteed to match its teacher. Students often inherit the teacher's weaknesses and hallucination patterns, and the gap widens on out-of-domain queries. Build a real eval set and compare per use case before committing.
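A per-use-case comparison might look like the sketch below. The eval set, the use-case names and both sets of model answers are made up for illustration; in a real check, the lookups would be replaced by calls to the teacher and student models, and exact-match scoring would likely give way to a metric suited to the task.

```python
from collections import defaultdict

def accuracy_by_use_case(examples, predict):
    """Score a model separately for each use case in the eval set."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for use_case, prompt, expected in examples:
        total[use_case] += 1
        if predict(prompt) == expected:
            correct[use_case] += 1
    return {uc: correct[uc] / total[uc] for uc in total}

# Tiny hypothetical eval set: (use case, prompt, expected label).
eval_set = [
    ("sentiment", "I love this product", "positive"),
    ("sentiment", "Worst purchase ever", "negative"),
    ("intent", "Cancel my subscription", "cancel"),
    ("intent", "I want my money back", "refund"),
]

# Canned answers standing in for real model calls (assumption).
teacher_answers = {prompt: expected for _, prompt, expected in eval_set}
student_answers = dict(teacher_answers)
student_answers["I want my money back"] = "cancel"  # student gets one wrong

teacher_scores = accuracy_by_use_case(eval_set, teacher_answers.get)
student_scores = accuracy_by_use_case(eval_set, student_answers.get)

# Positive gap = the student trails the teacher on that use case.
gaps = {uc: teacher_scores[uc] - student_scores[uc] for uc in teacher_scores}
```

Reporting the gap per use case, rather than one overall number, shows where the student is safe to deploy and where queries should still go to the teacher.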
Related terms
Fine-tuning
Fine-tuning takes an already pretrained LLM and trains it further on a narrower dataset to specialise it for a domain, task or voice — the most common path for adapting an LLM to your own data.
Quantization
Quantization compresses model weights to lower precision (say, 16-bit down to 4-bit) so the same model fits on smaller GPUs and runs more cheaply.
Open-weight Model
An open-weight model is an LLM whose weights are publicly released so anyone can download and run it on their own infrastructure — Llama, Mistral and Qwen are the best-known examples.
Model Routing
Model routing dispatches each query to the most suitable model based on difficulty or category — the de-facto pattern for balancing cost, accuracy and latency in production AI.
LLM
A large language model (LLM) is a neural network trained on massive text corpora to understand and generate human language — the engine behind ChatGPT, Claude, Gemini and similar products.