LLM · Training & Alignment · Updated 2026.04.28

Instruction Tuning

Also known as: Instruction Fine-tuning

In one line

Instruction tuning is the fine-tuning step that teaches a base LLM to follow instructions in natural language — the stage that turns 'a model that completes text' into 'a model you can actually ask things'.

Going deeper

Instruction tuning takes a base model that just predicts the next token and trains it on instruction-and-response pairs. ChatGPT, Claude and Gemini behave like chatbots largely because of this stage — without it the underlying model just continues your text.
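Concretely, each training example is an instruction with its desired response, rendered into a prompt and trained on with the prompt tokens masked out of the loss. Here is a minimal sketch of that preprocessing step; the `### Instruction:` template and the `tokenize` callable are illustrative assumptions, since real chat templates and tokenizers vary by model family.

```python
# Hypothetical chat template -- real templates differ per model family.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"

def build_sft_example(instruction, response, tokenize):
    """Turn one instruction/response pair into (input_ids, labels).

    `tokenize` is any callable mapping text -> list of token ids.
    Labels are -100 (the conventional ignore index) on the prompt
    tokens, so the loss only teaches the model to produce the response.
    """
    prompt_ids = tokenize(PROMPT_TEMPLATE.format(instruction=instruction))
    response_ids = tokenize(response)
    input_ids = prompt_ids + response_ids
    labels = [-100] * len(prompt_ids) + list(response_ids)
    return input_ids, labels
```

Masking the prompt is the key design choice: without it, the model also spends capacity learning to reproduce the instruction text itself.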

It usually pairs with an alignment step like RLHF (Reinforcement Learning from Human Feedback) or DPO. Instruction tuning teaches the format of 'follow what the user asks'; RLHF and DPO teach 'which of several plausible responses humans actually prefer'.
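DPO makes that preference signal concrete: for each (chosen, rejected) response pair it pushes the policy's log-probability ratio toward the chosen answer, relative to a frozen reference model. A minimal sketch of the per-pair loss, assuming you already have summed response log-probabilities from both models:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the trained policy or the frozen reference model; `beta`
    controls how far the policy may drift from the reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) written stably as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))
```

When the policy favors the chosen response more than the reference does, `logits` grows and the loss falls, which is exactly the "prefer what humans preferred" pressure RLHF provides, but with a plain supervised objective instead of a reward model and RL loop.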

In B2B, more teams are running their own instruction tuning — often as lightweight LoRA or other PEFT methods — on internal data. It is the right tool when you need consistent handling of domain jargon, internal document style, or a specific output format the base model keeps drifting away from.
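LoRA keeps such runs cheap by freezing the base weights and training only a low-rank update. A minimal pure-Python sketch of the math (production code would use a library such as `peft`; the matrix sizes here are illustrative):

```python
def matmul(X, Y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    """Effective weight under a LoRA adapter: W' = W + (alpha / r) * B @ A.

    W is the frozen d x k base weight; B (d x r) starts at zero and
    A (r x k) is the other trainable factor, so training begins exactly
    at the base model and only r * (d + k) parameters are updated
    instead of d * k.
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

Because `B` is initialized to zero, the adapted model is identical to the base model at step 0, and the rank `r` (often 8–64) caps both the extra memory and the capacity of the update.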

