AI Agent · Security & Evaluation · Updated 2026.04.28

Constitutional AI (CAI)

Also known as: CAI, constitution-based AI

In one line

Constitutional AI (CAI) is Anthropic's alignment technique where the model critiques and revises its own answers against a written set of principles — a 'constitution' — instead of relying entirely on human-labeled feedback.

Going deeper

Constitutional AI is the alignment technique Anthropic proposed in 2022 as a complement, and partial alternative, to RLHF. Instead of having humans rate every response, you write down a set of principles — 'be helpful', 'be honest', 'avoid harm' — and train the model in two stages: first it critiques and revises its own answers against that constitution (supervised learning), then a preference model trained on AI-generated comparisons drives reinforcement learning, so-called RLAIF.
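The critique-and-revise step can be sketched in a few lines. This is a minimal illustration, not Anthropic's actual pipeline or constitution: `call_model` is a hypothetical stand-in for any chat-completion API (here replaced by a deterministic toy stub so the snippet runs), and the principles are invented examples.

```python
# Illustrative sketch of the CAI critique-revision loop.
# Assumptions: `call_model` stands in for a real LLM API call, and the
# principles below are placeholders, not Anthropic's published constitution.

CONSTITUTION = [
    "Choose the response that is most helpful to the user.",
    "Choose the response that is least likely to cause harm.",
]

def call_model(prompt: str) -> str:
    # Toy deterministic stub; swap in your provider's SDK for real use.
    if prompt.startswith("Critique"):
        return "The answer could be more cautious."
    if prompt.startswith("Rewrite"):
        return "Revised: " + prompt.rsplit("Answer: ", 1)[-1]
    return "Draft answer."

def critique_and_revise(user_prompt: str) -> str:
    answer = call_model(user_prompt)
    for principle in CONSTITUTION:
        # Ask the model to judge its own answer against one principle...
        critique = call_model(
            f"Critique this answer against the principle: {principle}\n\n"
            f"Question: {user_prompt}\nAnswer: {answer}"
        )
        # ...then to rewrite the answer so the critique is addressed.
        answer = call_model(
            f"Rewrite the answer to address this critique:\n{critique}\n\n"
            f"Question: {user_prompt}\nAnswer: {answer}"
        )
    return answer
```

In the real method, the revised answers become supervised fine-tuning data, so the model internalizes the principles rather than running this loop at inference time.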

What marketers actually notice is the resulting answer style. Claude tends to be more cautious than ChatGPT or Gemini and refuses risky requests in a softer tone, and a large part of that comes from Constitutional AI. Part of why the same prompt yields noticeably different replies across models traces back to these differing training recipes.

It is not a silver bullet. A poorly written constitution produces poorly aligned behaviour, and the same principles can be read differently across languages and cultures. In production it is usually layered with RLHF, evaluation systems and human-in-the-loop review rather than relied on alone.


How does your brand show up in AI answers?

Villion measures how your brand appears across ChatGPT, Perplexity and AI Overviews, then automates the work that lifts citation rate and share of voice.

Get a free audit