Jailbreak
In one line
A jailbreak is a prompt-level trick that bypasses an LLM's safety restrictions to force it into producing content the model is supposed to refuse.
Going deeper
A jailbreak is a carefully crafted prompt that bypasses a model's safety guardrails, pushing it to produce things it normally refuses, such as instructions for violence, hacking or other restricted content. Well-known techniques include the 'DAN' (Do Anything Now) prompt, persona role-play and multi-step indirection.
Marketers rarely write jailbreaks themselves, but their products inherit the risk. If someone bypasses your system prompt and pulls inappropriate output from a branded assistant, that is a direct brand-safety incident. AI safety is no longer just the model vendor's problem.
Defences usually involve multiple layers: input/output filters, a separate policy-checking model, audit logs, periodic red-teaming and human review on high-risk actions. A single system prompt is not enough on its own.
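To make the layering concrete, here is a minimal Python sketch of two of those layers (an input filter and an output filter) wrapped around a model call, with an audit log recording each decision. Everything in it is an illustrative assumption: the `GuardedAssistant` class, the regex patterns, the blocked output term and the refusal text are hypothetical, and the model is just a callable you pass in. A production setup would likely replace the regex checks with a dedicated policy-checking model and add red-teaming and human review, none of which is shown here.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical patterns for illustration only; real filters need far
# broader coverage than a short regex list can offer.
JAILBREAK_PATTERNS = [
    re.compile(r"\bdo anything now\b", re.IGNORECASE),
    re.compile(r"\bignore (all |any )?(previous|prior) instructions\b", re.IGNORECASE),
]

# Hypothetical phrase that should never appear in a reply.
BLOCKED_OUTPUT_TERMS = ["internal system prompt"]

REFUSAL = "Sorry, I can't help with that."


@dataclass
class GuardedAssistant:
    """Wraps a model callable with an input filter, an output filter and an audit log."""

    model: Callable[[str], str]
    audit_log: List[Dict[str, str]] = field(default_factory=list)

    def _log(self, stage: str, prompt: str, verdict: str) -> None:
        # Record every decision so incidents can be reviewed later.
        self.audit_log.append({"stage": stage, "prompt": prompt, "verdict": verdict})

    def respond(self, prompt: str) -> str:
        # Layer 1: reject prompts that match known jailbreak phrasing.
        if any(pattern.search(prompt) for pattern in JAILBREAK_PATTERNS):
            self._log("input_filter", prompt, "blocked")
            return REFUSAL

        reply = self.model(prompt)

        # Layer 2: check the reply itself, in case the input filter missed something.
        if any(term in reply.lower() for term in BLOCKED_OUTPUT_TERMS):
            self._log("output_filter", prompt, "blocked")
            return REFUSAL

        self._log("passed", prompt, "allowed")
        return reply
```

Calling `GuardedAssistant(model=my_model).respond(user_text)` returns either the model's reply or the refusal text, and `audit_log` keeps a record of which layer made the call. The output check matters because an input filter based on known phrasings will miss novel jailbreaks, so the reply is inspected independently.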
Related terms
AI Alignment
AI alignment is the field — and the practical work — of making AI systems behave in line with human intent, values and safety constraints.
Prompt Injection
Prompt injection is an attack where instructions hidden in untrusted data override the system prompt and force the LLM into unintended behaviour.
Guardrails
Guardrails are the layer of input/output checks added around an LLM to block unsafe responses, policy violations and leakage of sensitive information.
System Prompt
A system prompt is the instruction sent to an LLM before any user message, defining the assistant's role, tone and rules — effectively the AI product's character.
RLHF
RLHF (Reinforcement Learning from Human Feedback) trains an LLM using human preference signals so it produces more helpful, safer responses — the recipe behind the leap in ChatGPT-style quality.