
RLHF

Reinforcement Learning from Human Feedback

Also known as: human feedback reinforcement learning, preference learning

In one line

RLHF (Reinforcement Learning from Human Feedback) trains an LLM using human preference signals so it produces more helpful, safer responses — the recipe behind the leap in ChatGPT-style quality.

Going deeper

RLHF is an alignment technique that uses human judgments to shape model behaviour. People compare two model outputs and pick the better one; that preference data trains a reward model, which is then used in reinforcement learning to nudge the LLM toward the preferred style.
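
To make that recipe concrete, below is a minimal PyTorch sketch of the two learned pieces: the pairwise (Bradley-Terry) loss used to train the reward model, and the KL-penalized reward signal the RL step then maximizes so the tuned model doesn't drift too far from its starting point. The function names, tensor shapes, and `beta` coefficient are illustrative assumptions, not any specific lab's implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of the two learned pieces of an RLHF pipeline.
# Inputs are hypothetical: any reward model that scores a (prompt,
# response) pair as a scalar would produce tensors shaped like these.

def reward_model_loss(r_chosen: torch.Tensor,
                      r_rejected: torch.Tensor) -> torch.Tensor:
    """Stage 1: train the reward model on human preference pairs.

    r_chosen / r_rejected are scalar scores the reward model assigned
    to the preferred and rejected response, shape (batch,). The
    Bradley-Terry loss -log sigmoid(r_chosen - r_rejected) is minimized
    when preferred responses consistently score higher.
    """
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def shaped_reward(rm_score: torch.Tensor,
                  logp_policy: torch.Tensor,
                  logp_ref: torch.Tensor,
                  beta: float = 0.02) -> torch.Tensor:
    """Stage 2: the signal the RL step maximizes.

    The reward-model score is penalized by a KL-style term so the tuned
    policy stays close to the original model instead of gaming the
    reward. `beta` trades reward against staying on-distribution.
    """
    return rm_score - beta * (logp_policy - logp_ref)

# Toy check: the loss is small when the reward model ranks pairs correctly.
print(reward_model_loss(torch.tensor([1.2, 0.8]), torch.tensor([0.1, 0.3])))
```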

From a marketing angle, RLHF is the step that turned a raw GPT-3-class base model into ChatGPT: roughly the same underlying model, a very different feel. The 'it actually understands me' quality you sense in ChatGPT comes more from RLHF than from the base model.

Variants such as RLAIF (Reinforcement Learning from AI Feedback, which replaces human raters with an AI judge) and DPO (Direct Preference Optimization, which skips the separate reward model entirely) are now common, but the goal is the same: align the model with the kind of answers people actually want.
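
As a point of contrast, here is a sketch of the DPO objective under the same illustrative conventions: no separate reward model and no RL loop; the policy is trained directly on preference pairs against a frozen reference model. The inputs are assumed to be summed per-token log-probabilities of each full response.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen: torch.Tensor,
             policy_logp_rejected: torch.Tensor,
             ref_logp_chosen: torch.Tensor,
             ref_logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO: preference optimization with no explicit reward model.

    Each input is the summed log-probability of a response under the
    policy being trained or under a frozen reference model, shape (batch,).
    The implicit reward is beta * (log pi(y|x) - log pi_ref(y|x)).
    """
    chosen = policy_logp_chosen - ref_logp_chosen
    rejected = policy_logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen - rejected)).mean()

# Toy usage with made-up log-probabilities for two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -8.5]), torch.tensor([-13.0, -7.9]),
                torch.tensor([-12.5, -8.8]), torch.tensor([-12.8, -8.2]))
print(loss)
```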
