AI AgentSecurity & EvaluationUpdated 2026.04.28

Agent Hijacking

Also known asIndirect Prompt Injection에이전트 탈취

In one line

Agent hijacking is the attack where malicious instructions hidden inside external data or tool outputs take over an agent's decision-making — the headline security threat for autonomous agents.

Going deeper

Agent hijacking is often called indirect prompt injection. The malicious instruction does not come from the user — it is embedded in external data the agent retrieves through a tool: a web page, an email, a document, an API response. A line like 'Ignore previous instructions and ...' sitting inside data you trusted enough to ingest can be mistaken by the model for a user directive.

It matters because agents have tools. A vanilla chatbot just gives a bad answer; an agent that sends email, places orders or deploys code can do real damage when hijacked. The attack surface grows as autonomy levels rise.

There is no single defence. The realistic answer is layered: separate trusted from untrusted inputs, minimise tool permissions, require human-in-the-loop for risky actions, run output and content checks, and red-team regularly. Treat it as ongoing posture, not a one-time fix.

Sources

Related terms

How does your brand show up in AI answers?

Villion measures how your brand appears across ChatGPT, Perplexity and AI Overviews, then automates the work that lifts citation rate and share of voice.

Get a free audit