Agent Hijacking
In one line
Agent hijacking is the attack where malicious instructions hidden inside external data or tool outputs take over an agent's decision-making — the headline security threat for autonomous agents.
Going deeper
Agent hijacking is often called indirect prompt injection. The malicious instruction does not come from the user — it is embedded in external data the agent retrieves through a tool: a web page, an email, a document, an API response. A line like 'Ignore previous instructions and ...' sitting inside data you trusted enough to ingest can be mistaken by the model for a user directive.
It matters because agents have tools. A vanilla chatbot just gives a bad answer; an agent that sends email, places orders or deploys code can do real damage when hijacked. The attack surface grows as autonomy levels rise.
There is no single defence. The realistic answer is layered: separate trusted from untrusted inputs, minimise tool permissions, require human-in-the-loop for risky actions, run output and content checks, and red-team regularly. Treat it as ongoing posture, not a one-time fix.
Sources
Related terms
Agent Autonomy Level
Agent autonomy level describes how far an agent acts on its own without human intervention — usually in stages, much like the autonomous driving levels people are already familiar with.
AI AgentPermission Model
A permission model defines which tools, data and actions an agent is allowed to touch — the core safety layer for any autonomous agent.
AI AgentSandboxing
Sandboxing means running an agent in an isolated environment so its actions cannot reach the outside system — a baseline practice for any autonomous agent.
AI AgentHuman-in-the-Loop
Human-in-the-loop (HITL) is the design pattern where an agent runs autonomously but routes critical decisions through a human for review and approval.
AI AgentAgent Evaluation
Agent evaluation is the test and metric framework for measuring how accurately and safely an agent completes its goals — distinct from plain LLM benchmarking.
How does your brand show up in AI answers?
Villion measures how your brand appears across ChatGPT, Perplexity and AI Overviews, then automates the work that lifts citation rate and share of voice.
Get a free audit