LLM · Evaluation & Safety · Updated 2026.04.28

Jailbreak

Also known as: LLM jailbreak (Korean: LLM 탈옥), safety bypass

In one line

A jailbreak is a prompt-level trick that bypasses an LLM's safety restrictions to force it into producing content the model is supposed to refuse.

Going deeper

A jailbreak is a carefully crafted prompt that bypasses a model's safety guardrails, pushing it to produce things it normally refuses, such as instructions for violence, hacking or other restricted content. Classic techniques include 'DAN (Do Anything Now)', persona role-play and multi-step indirection.

Marketers rarely write jailbreaks themselves, but their products inherit the risk. If someone bypasses your system prompt and pulls inappropriate output from a branded assistant, that is a direct brand-safety incident. AI safety is no longer just the model vendor's problem.

Defences usually involve multiple layers: input/output filters, a separate policy-checking model, audit logs, periodic red-teaming and human review on high-risk actions. A single system prompt is not enough on its own.
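To make the layering concrete, here is a minimal Python sketch of an assistant wrapped in an input filter, an output filter, a policy check and an audit log. Everything in it is an illustrative assumption: `GuardedAssistant`, the regex patterns, the `REFUSAL` text, and the `model` and `policy_check` callables are hypothetical stand-ins, not any vendor's API. A real deployment would likely replace the regexes with maintained classifiers.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical patterns for illustration only; real filters are far broader.
SUSPICIOUS_INPUT = [
    re.compile(r"ignore (all|any|previous) .*instructions", re.I),
    re.compile(r"\bdo anything now\b", re.I),
]
BLOCKED_OUTPUT = [re.compile(r"\bstep-by-step exploit\b", re.I)]

REFUSAL = "Sorry, I can't help with that."


@dataclass
class GuardedAssistant:
    model: Callable[[str], str]          # stand-in for the LLM call
    policy_check: Callable[[str], bool]  # stand-in for a separate policy model; True means allowed
    audit_log: List[Dict[str, str]] = field(default_factory=list)

    def _log(self, stage: str, prompt: str, verdict: str) -> None:
        # Every decision is recorded so incidents can be reviewed later.
        self.audit_log.append({"stage": stage, "prompt": prompt, "verdict": verdict})

    def reply(self, prompt: str) -> str:
        # Layer 1: screen the incoming prompt before it reaches the model.
        if any(p.search(prompt) for p in SUSPICIOUS_INPUT):
            self._log("input_filter", prompt, "blocked")
            return REFUSAL

        draft = self.model(prompt)

        # Layer 2: screen the model's draft answer.
        if any(p.search(draft) for p in BLOCKED_OUTPUT):
            self._log("output_filter", prompt, "blocked")
            return REFUSAL

        # Layer 3: ask the separate policy checker for a verdict on the draft.
        if not self.policy_check(draft):
            self._log("policy_check", prompt, "blocked")
            return REFUSAL

        self._log("passed", prompt, "allowed")
        return draft
```

The point of the design is that no single layer is trusted: a prompt that slips past the input filter can still be caught at the output or policy stage, and the audit log gives red-teamers and human reviewers something to inspect. Red-teaming and human review themselves are process steps and are not modelled here.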
