May 24, 2026

OWASP Top 10 for AI Agents: The 2026 Threat Model

“The OWASP LLM Top 10 covers model vulnerabilities. The agent top 10 covers what happens when those models can act.”

The OWASP Top 10 for LLM Applications established the canonical vulnerability taxonomy for language models. But autonomous agents introduce a different threat surface — not just what the model outputs, but what the agent does with tool authority in the real world.

This threat model extends the OWASP LLM taxonomy with agent-specific risks that emerge at the tool-calling, memory, and execution-boundary layers. Each entry is mapped to mitigation controls and, where applicable, to the corresponding entry in the ProvnAI AI Security Glossary.

10 risk categories
AA01

Prompt Injection

Critical

Adversarial instructions in user input, tool outputs, or retrieved documents override the agent's intended behavior. The agent executes attacker-controlled actions with its full tool authority.

Mitigation: Semantic intent scoring on every tool call argument. Signed intent capsules per session. McpVanguard's L1 rules engine blocks known injection patterns before semantic scoring.
Prompt Injection
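As a rough illustration of a deterministic rules pass that runs before any semantic scoring, the sketch below scans every string in a tool call's arguments against a small deny list. The patterns and the function names (`iter_strings`, `matches_injection_rule`) are hypothetical and are not McpVanguard's actual rule set or API; a real rule set would be much larger and would not be sufficient on its own.

```python
import re

# Hypothetical deny patterns, for illustration only.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(the\s+)?system\s+prompt", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+in\s+developer\s+mode", re.IGNORECASE),
]


def iter_strings(value):
    """Yield every string found in a nested argument structure (keys included)."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_strings(key)
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def matches_injection_rule(arguments):
    """Return the first matching pattern source, or None if nothing matches."""
    for text in iter_strings(arguments):
        for pattern in INJECTION_PATTERNS:
            if pattern.search(text):
                return pattern.pattern
    return None
```

A caller would deny the tool call whenever `matches_injection_rule` returns a value, and only pass clean calls on to the slower semantic check.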
AA02

Indirect Prompt Injection

Critical

Adversarial instructions are embedded in external content that the agent retrieves — web pages, PDFs, emails, database records. The attack executes without any direct interaction with the user or the model.

Mitigation: All retrieved content treated as untrusted input. Tool call arguments derived from retrieval results scored against declared session intent before execution.
Indirect Injection
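One way to make "retrieved content is untrusted" enforceable is to carry provenance with the data. The sketch below assumes a hypothetical `Tainted` wrapper that the retrieval layer applies to anything it returns, and a helper that reports which tool-call arguments derive from retrieval so they can be routed to the intent check. It illustrates only the tagging step; the scoring itself is out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """Hypothetical wrapper for a value that originated from retrieved content."""
    value: str
    source: str  # where the content came from, e.g. a URL or message ID


def tainted_sources(arguments):
    """Return the sources of all tainted values in a nested argument structure."""
    sources = []

    def walk(value):
        if isinstance(value, Tainted):
            sources.append(value.source)
        elif isinstance(value, dict):
            for item in value.values():
                walk(item)
        elif isinstance(value, (list, tuple)):
            for item in value:
                walk(item)

    walk(arguments)
    return sources
```

If `tainted_sources` returns a non-empty list, the call carries retrieval-derived data and should not execute until it has been checked against the declared session intent.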
AA03

Excessive Agency (Permission Drift)

High

The agent is granted — or acquires through semantic drift — more capability than the task requires. Overly permissive tool schemas combined with an agent that expands its own scope create a privilege escalation surface.

Mitigation: Least-privilege tool schemas per session. Per-session authority scoping. Execution boundary enforcement rejects tool calls outside declared session scope.
Permission Drift
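Per-session scoping can be sketched as a deny-by-default allowlist. The `SessionScope` type and `check_tool_call` function below are assumptions made for illustration: each session declares which tools it may call and which argument keys each tool accepts, and anything else is rejected before execution.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SessionScope:
    """Hypothetical per-session grant: tool name -> set of allowed argument keys."""
    session_id: str
    allowed_tools: dict = field(default_factory=dict)


def check_tool_call(scope, tool, arguments):
    """Return (allowed, reason). Anything not explicitly granted is denied."""
    allowed_keys = scope.allowed_tools.get(tool)
    if allowed_keys is None:
        return False, f"tool '{tool}' is outside the declared session scope"
    extra = set(arguments) - set(allowed_keys)
    if extra:
        return False, f"unexpected arguments for '{tool}': {sorted(extra)}"
    return True, "ok"
```

The scope is frozen at session start so the agent cannot widen it mid-session; a new grant requires a new session.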
AA04

Context Poisoning

High

Adversarial content enters the agent's long-term memory or vector store. Unlike single-turn injection, poisoned memory persists across sessions, gradually corrupting the agent's behavioral alignment.

Mitigation: Merkle-committed retrieval logs. Witness anchoring for all retrieval events. Semantic drift detection across sessions. See adversarial RAG analysis.
Context Poisoning
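To make "Merkle-committed retrieval logs" concrete, here is a minimal sketch of computing a Merkle root over the chunks returned by one retrieval event. The construction (SHA-256, a one-byte prefix to separate leaves from interior nodes, odd nodes promoted unchanged) is one reasonable choice assumed for this example, not a format specified by the threat model.

```python
import hashlib


def _sha256(data):
    return hashlib.sha256(data).digest()


def merkle_root(chunks):
    """Return a hex Merkle root committing to an ordered list of byte strings."""
    if not chunks:
        return _sha256(b"").hex()
    # Leaf and interior hashes use different prefixes so one cannot pose as the other.
    level = [_sha256(b"\x00" + chunk) for chunk in chunks]
    while len(level) > 1:
        paired = [
            _sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            paired.append(level[-1])  # odd node is carried up unchanged
        level = paired
    return level[0].hex()
```

Storing the root alongside each retrieval event means a later change to any stored chunk, or to their order, yields a different root when the log is re-verified.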
AA05

SSRF via Agent Tools

High

Agents with HTTP fetch tools can be directed to cloud metadata endpoints (169.254.169.254), internal network services, or localhost ports — effectively weaponizing the agent as a network pivot.

Mitigation: Block internal IP ranges, cloud metadata endpoints, and private network prefixes at the proxy layer. SSRF protection should be deterministic — no model judgment involved.
SSRF (AI Context)
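A deterministic SSRF check can be sketched with the Python standard library alone. The functions below are illustrative names, not part of any product API: they resolve the target host and refuse the request if any resolved address is not globally routable, which covers loopback, private ranges, and link-local addresses such as 169.254.169.254.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_address(ip_text):
    """True for loopback, private, link-local, and other non-public addresses."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return True  # unparseable address: fail closed
    # Judge IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) by their IPv4 form.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return not ip.is_global


def is_url_allowed(url, resolve=socket.getaddrinfo):
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    try:
        parts = urlsplit(url)
        host, port = parts.hostname, parts.port
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    default_port = 443 if parts.scheme == "https" else 80
    try:
        infos = resolve(host, port or default_port, proto=socket.IPPROTO_TCP)
    except (OSError, UnicodeError):
        return False
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and not any(is_blocked_address(a) for a in addresses)
```

A check like this is only sound if the fetch then connects to the address that was vetted; resolving again at request time leaves room for DNS rebinding. Redirects need the same check on every hop.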
AA06

Tool-Call Hijacking

High

A compromised or malicious MCP server returns responses that redirect the agent to invoke different tools, with different arguments, than the user intended. The MCP server itself becomes an attack vector.

Mitigation: Tool call arguments validated against session intent regardless of source. Signed tool manifests. Anomaly detection on unexpected tool sequences.
Tool-Call Hijacking
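Signed tool manifests can be sketched as follows. This example uses an HMAC over a canonical JSON serialization purely to stay within the standard library; that is an assumption for illustration, and a real deployment would more likely use asymmetric signatures so that verifiers hold no signing key. The manifest fields are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(manifest):
    """Serialize deterministically so signer and verifier hash identical bytes."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_manifest(manifest, key):
    """Return a hex HMAC-SHA256 tag over the canonical manifest."""
    return hmac.new(key, canonical_bytes(manifest), hashlib.sha256).hexdigest()


def verify_manifest(manifest, signature, key):
    """Constant-time comparison of the expected tag against the presented one."""
    expected = sign_manifest(manifest, key)
    return hmac.compare_digest(expected, signature)
```

If a server later advertises a tool whose name, description, or schema differs from the signed manifest, verification fails and the tool is not exposed to the agent.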
AA07

Data Exfiltration via Tool Parameters

Medium

Sensitive context data — system prompts, user PII, retrieved documents, session state — is embedded in outbound tool call parameters by an injected instruction, silently transmitting it to an attacker-controlled endpoint.

Mitigation: Outbound tool call argument scanning for sensitive data patterns. Network egress controls on agent runtime. Anomaly detection on unusual data volumes in outbound tool params.
Data Exfiltration
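Outbound argument scanning can start as simple pattern matching. The sketch below checks every string in a tool call's arguments against a few example patterns; the pattern set and the `scan_outbound` name are assumptions for illustration, and a production scanner would need broader coverage plus handling for encoded or split payloads.

```python
import re

# Illustrative patterns only; real deployments tune these to their own data.
SENSITIVE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_outbound(arguments):
    """Return a sorted list of pattern names found anywhere in the arguments."""
    findings = set()

    def walk(value):
        if isinstance(value, str):
            for name, pattern in SENSITIVE_PATTERNS.items():
                if pattern.search(value):
                    findings.add(name)
        elif isinstance(value, dict):
            for item in value.values():
                walk(item)
        elif isinstance(value, (list, tuple)):
            for item in value:
                walk(item)

    walk(arguments)
    return sorted(findings)
```

A non-empty result would typically deny the call or escalate it for review, depending on whether the destination is an approved recipient of that data class.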
AA08

Path Traversal via Filesystem Tools

Medium

Filesystem-capable agents can be instructed to read or write files outside their intended working directory using traversal sequences (../../). Successful traversal exposes configuration files, credentials, and OS resources.

Mitigation: Deterministic path normalization and containment at the proxy layer. The working directory is enforced as the root of everything the tool can reach. Traversal sequences are blocked before any path is passed to the filesystem tool.
Path Traversal
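Path containment is easier to get right by resolving the path and checking where it lands than by searching for `../` substrings. The `resolve_inside` helper below is a hypothetical name for a minimal sketch of that approach using `pathlib`.

```python
from pathlib import Path


def resolve_inside(root, requested):
    """Resolve `requested` relative to `root`; raise if the result escapes `root`."""
    root_path = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root_path / requested).resolve()
    if candidate != root_path and root_path not in candidate.parents:
        raise PermissionError(f"path escapes working directory: {requested!r}")
    return candidate
```

Absolute inputs such as `/etc/passwd` are rejected as well, because joining an absolute path discards the root and the resolved result falls outside it. A symlink swapped in between this check and the actual file operation can still defeat it, so the filesystem tool should operate on the returned path promptly or open files relative to a directory handle.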
AA09

Privilege Escalation via Semantic Drift

Medium

Through a sequence of individually benign-looking tool calls, the agent incrementally expands its effective capability — accessing systems, data, or APIs outside its intended scope without any single call triggering a block.

Mitigation: Behavioral tracking across the full session. Per-session scope binding. Tool call sequence analysis to detect stepwise privilege expansion.
Privilege Escalation
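Stepwise expansion is invisible to per-call checks, so the control has to keep state across the session. As a deliberately simple sketch, the hypothetical `SessionFootprint` class below counts the distinct (tool, resource) pairs a session has touched and flags the session once that footprint exceeds a budget. Real sequence analysis would be richer; the budget and the notion of "resource" are assumptions here.

```python
from collections import OrderedDict


class SessionFootprint:
    """Tracks the distinct resources a session has touched, in first-seen order."""

    def __init__(self, max_distinct_resources):
        self.max_distinct_resources = max_distinct_resources
        self._seen = OrderedDict()  # (tool, resource) -> call count

    def record(self, tool, resource):
        """Record one call; return False once the footprint exceeds the budget."""
        key = (tool, resource)
        self._seen[key] = self._seen.get(key, 0) + 1
        return len(self._seen) <= self.max_distinct_resources

    def touched(self):
        """Return the distinct (tool, resource) pairs in the order first seen."""
        return list(self._seen)
```

Repeated calls against the same resource do not grow the footprint; only reaching into something new does, which is the signal of interest.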
AA10

Insufficient Audit Trail

Medium

Agent actions are logged incompletely, in mutable storage, or not at all. When an incident occurs, reconstruction is impossible. Compliance evidence cannot be produced. Post-incident forensics yield nothing actionable.

Mitigation: Tamper-evident, append-only audit log for every tool call — permitted, denied, and escalated. Cryptographic commitment (Merkle chain) makes any retroactive log modification detectable.
Merkle Audit Trail
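The tamper-evidence property can be shown with a plain hash chain, the simplest form of cryptographic commitment for a log. The `AuditLog` class below is an in-memory sketch under that assumption; it is not the VEX Protocol's format, and a real log would also persist entries to append-only storage and anchor the latest hash externally.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AuditLog:
    """In-memory hash chain: each entry commits to the one before it."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(prev_hash, payload):
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event):
        """Append a JSON-serializable event and return the new head hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
        entry_hash = self._digest(prev_hash, payload)
        self._entries.append({"prev": prev_hash, "payload": payload, "hash": entry_hash})
        return entry_hash

    def verify(self):
        """Recompute the chain; False if any entry was altered, removed, or reordered."""
        prev_hash = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["payload"]):
                return False
            prev_hash = entry["hash"]
        return True
```

Because each hash covers the previous one, editing an old entry invalidates every entry after it. Truncating the tail is only detectable if the head hash has been recorded somewhere the attacker cannot rewrite.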

How to Use This Threat Model

This taxonomy is designed to be used in three ways: as a threat modeling input when designing new agentic systems, as an audit checklist for reviewing existing deployments, and as a detection framework for security teams building monitoring and alerting on agent behavior.

For each risk category, the question to ask is not just “is this theoretically possible?” but “do we have a deterministic, tested control at the execution boundary that would catch and block this regardless of what the model outputs?”

Model-level mitigations (system prompts, fine-tuning, RLHF) are insufficient on their own because they can be bypassed. The controls that matter for production security sit at the execution boundary, outside the model's reasoning process entirely.

Implement the execution boundary controls.

McpVanguard addresses AA01–AA09 at the proxy layer. VEX Protocol addresses AA10 with cryptographic evidence sealing.