Architecture

May 24, 2026


OWASP Top 10 for AI Agents: The 2026 Threat Model

“The OWASP LLM Top 10 covers model vulnerabilities. The agent top 10 covers what happens when those models can act.”

Figure: The ten major security risks for AI agents. The agent threat model expands beyond model output risk into execution, memory, network, and audit surfaces.

The OWASP Top 10 for LLM Applications established the canonical vulnerability taxonomy for language models. But autonomous agents introduce a different threat surface — not just what the model outputs, but what the agent does with tool authority in the real world.

This threat model extends the OWASP LLM taxonomy with agent-specific risks that emerge at the tool-calling, memory, and execution boundary layers. Each entry is mapped to mitigation controls and, where applicable, to the corresponding entry in the ProvnAI AI Security Glossary.

10 risk categories

AA01

Prompt Injection

Critical

Adversarial instructions in user input, tool outputs, or retrieved documents override the agent's intended behavior. The agent executes attacker-controlled actions with its full tool authority.

Mitigation: Score routed tool-call arguments for intent risk, enforce deterministic rules before semantic scoring, and preserve intent evidence where the deployment requires higher assurance.

→ Prompt Injection
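The ordering described in the mitigation (deterministic rules first, semantic scoring second) can be sketched as follows. This is a minimal illustration, not McpVanguard's implementation: the `DENY_PATTERNS` list, the `Decision` type, the `evaluate_tool_call` function, and the 0.7 threshold are all hypothetical, and the intent-risk scorer is passed in as a plain callable.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical deterministic deny rules; a real deployment maintains its own set.
DENY_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"rm\s+-rf\s+/"),
]


@dataclass
class Decision:
    action: str  # "allow", "deny", or "escalate"
    reason: str


def evaluate_tool_call(
    args: dict,
    score_intent_risk: Callable[[dict], float],
    threshold: float = 0.7,  # assumed cut-off, purely illustrative
) -> Decision:
    """Run deterministic rules first; consult the scorer only if they pass."""
    flat = " ".join(str(value) for value in args.values())
    for pattern in DENY_PATTERNS:
        if pattern.search(flat):
            # A rule hit is final: the semantic scorer is never asked.
            return Decision("deny", f"deterministic rule: {pattern.pattern}")
    risk = score_intent_risk(args)
    if risk >= threshold:
        return Decision("escalate", f"intent risk {risk:.2f} >= {threshold}")
    return Decision("allow", "passed rules and scoring")
```

Because a rule hit returns before the scorer runs, a manipulated or mistaken score cannot override a deterministic denial.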

AA02

Indirect Prompt Injection

Critical

Adversarial instructions are embedded in external content that the agent retrieves: web pages, PDFs, emails, database records. The attacker never needs to interact directly with the user or the model; the attack triggers when the agent processes the poisoned content.

Mitigation: Treat all retrieved content as untrusted input. Score tool-call arguments derived from retrieval results against the declared session intent before execution.

→ Indirect Injection
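One simple way to notice that a tool-call argument was derived from retrieved content rather than from the user is a provenance check on concrete targets such as email addresses and URLs. The sketch below is an assumption-laden illustration: `retrieval_only_targets` is a hypothetical helper, and the regular expression is deliberately crude. It flags arguments whose targets appear in retrieved documents but nowhere in the user's request.

```python
import re

# Crude matcher for email addresses and http(s) URLs (illustrative only).
TARGET = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+|https?://[^\s\"'<>]+")


def retrieval_only_targets(
    args: dict, user_request: str, retrieved: list[str]
) -> list[str]:
    """Return names of arguments whose targets came only from retrieved content."""
    retrieved_targets = set()
    for document in retrieved:
        retrieved_targets.update(TARGET.findall(document))
    user_targets = set(TARGET.findall(user_request))

    flagged = []
    for name, value in args.items():
        for target in TARGET.findall(str(value)):
            if target in retrieved_targets and target not in user_targets:
                flagged.append(name)
                break  # one hit is enough to flag this argument
    return flagged
```

A flagged argument is not proof of an attack. It is a signal that the call deserves scoring against the declared session intent before it runs.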

AA03

Excessive Agency (Permission Drift)

High

The agent is granted — or acquires through semantic drift — more capability than the task requires. Overly permissive tool schemas combined with an agent that expands its own scope create a privilege escalation surface.

Mitigation: Least-privilege tool schemas per session. Per-session authority scoping. Execution boundary enforcement rejects tool calls outside declared session scope.

→ Permission Drift
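Per-session scoping can be reduced to a deterministic membership check at the execution boundary. The following is a minimal sketch; `SessionScope` and `authorize` are hypothetical names, and the tool names are examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionScope:
    """Authority declared once, when the session is created (frozen: no later growth)."""
    session_id: str
    allowed_tools: frozenset


def authorize(scope: SessionScope, tool_name: str) -> bool:
    """Allow a tool call only if the tool was declared for this session."""
    return tool_name in scope.allowed_tools


# Example: a documentation-lookup session has no business sending email.
docs_session = SessionScope("sess-1", frozenset({"search_docs", "read_file"}))
```

Making the scope immutable means the agent cannot widen its own authority mid-session; any expansion requires creating a new session with a new declaration.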

AA04

Context Poisoning

High

Adversarial content enters the agent's long-term memory or vector store. Unlike single-turn injection, poisoned memory persists across sessions, gradually corrupting the agent's behavioral alignment.

Mitigation: Merkle-committed retrieval logs. Witness anchoring for all retrieval events. Semantic drift detection across sessions. See adversarial RAG analysis.

→ Context Poisoning
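To make "Merkle-committed retrieval logs" concrete, here is a small sketch that computes a Merkle root over a list of retrieval events using only the standard library. The event fields are hypothetical, and the tree construction is a simplification (it duplicates the last node on odd levels instead of following the RFC 6962 layout), so treat it as an illustration of the idea rather than a production format.

```python
import hashlib
import json


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(event: dict) -> bytes:
    # Canonical JSON so the same event always hashes identically.
    # The 0x00 prefix separates leaf hashes from interior-node hashes.
    encoded = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return _sha256(b"\x00" + encoded)


def merkle_root(events: list[dict]) -> str:
    """Return the hex Merkle root committing to every event, in order."""
    if not events:
        return _sha256(b"").hex()
    level = [leaf_hash(event) for event in events]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # simplification: duplicate the odd node
        level = [
            _sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

Publishing or witnessing the root after each session commits to the full retrieval history: changing any logged event afterwards changes the root.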

AA05

SSRF via Agent Tools

High

Agents with HTTP fetch tools can be directed to cloud metadata endpoints (169.254.169.254), internal network services, or localhost ports — effectively weaponizing the agent as a network pivot.

Mitigation: Block internal IP ranges, cloud metadata endpoints, and private network prefixes at the proxy layer. SSRF protection should be deterministic — no model judgment involved.

→ SSRF (AI Context)
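A deterministic SSRF check of the kind described above can be built from Python's standard `ipaddress`, `socket`, and `urllib.parse` modules. This is a sketch under stated assumptions: the function names `is_blocked_ip` and `is_url_allowed` are hypothetical, and the policy (HTTP and HTTPS only, every resolved address must be public) is one reasonable choice rather than a complete rule set.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_ip(ip_text: str) -> bool:
    """True for loopback, private, link-local (incl. 169.254.169.254), and similar ranges."""
    ip = ipaddress.ip_address(ip_text)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # unwrap ::ffff:a.b.c.d before checking
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def is_url_allowed(url: str, resolve=socket.getaddrinfo) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    try:
        parts = urlsplit(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:  # malformed URL or port
        return False
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = resolve(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except OSError:  # includes socket.gaierror (resolution failure)
        return False
    # Every resolved address must pass; one internal address blocks the call.
    return bool(infos) and all(not is_blocked_ip(info[4][0]) for info in infos)
```

Checking the resolved addresses rather than the hostname string catches DNS names that point at internal ranges. To avoid DNS rebinding, the fetch itself should connect to the address that was checked instead of resolving the name a second time.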

AA06

Tool-Call Hijacking

High

A compromised or malicious MCP server returns responses that redirect the agent to invoke different tools, with different arguments, than the user intended. The MCP server itself becomes an attack vector.

Mitigation: Validate tool-call arguments against session intent regardless of source. Require signed tool manifests. Run anomaly detection on unexpected tool sequences.

→ Tool-Call Hijacking
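The idea behind signed tool manifests can be illustrated with a keyed hash over a canonical encoding of the manifest. Note the substitution: this sketch uses HMAC-SHA256 with a shared key to stay within the standard library, whereas a real deployment would more likely use asymmetric signatures so that servers cannot forge manifests. The manifest fields and function names are hypothetical.

```python
import hashlib
import hmac
import json


def canonical(manifest: dict) -> bytes:
    # Stable encoding: key order and whitespace must not affect the signature.
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()


def sign_manifest(manifest: dict, key: bytes) -> str:
    return hmac.new(key, canonical(manifest), hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the expected and presented signatures."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)
```

If a server later presents a manifest with an added tool or a changed argument schema, verification fails and the proxy can refuse to route calls to it.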

AA07

Data Exfiltration via Tool Parameters

Medium

Sensitive context data — system prompts, user PII, retrieved documents, session state — is embedded in outbound tool call parameters by an injected instruction, silently transmitting it to an attacker-controlled endpoint.

Mitigation: Outbound tool call argument scanning for sensitive data patterns. Network egress controls on agent runtime. Anomaly detection on unusual data volumes in outbound tool params.

→ Data Exfiltration
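Outbound argument scanning can start as a small set of regular expressions applied to every argument value before the call leaves the runtime. The patterns below are illustrative assumptions, not a complete detector, and `scan_outbound_args` is a hypothetical name.

```python
import re

# Illustrative patterns only; real deployments tune these to their own data.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_outbound_args(args: dict) -> dict:
    """Return {argument name: [matched pattern labels]} for suspicious arguments."""
    findings = {}
    for name, value in args.items():
        text = str(value)
        labels = [
            label
            for label, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(text)
        ]
        if labels:
            findings[name] = labels
    return findings
```

An empty result means none of the listed patterns matched, not that the call is safe; encoded or paraphrased data passes straight through, which is why the mitigation pairs scanning with egress controls and volume anomaly detection.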

AA08

Path Traversal via Filesystem Tools

Medium

Filesystem-capable agents can be instructed to read or write files outside their intended working directory using traversal sequences (../../). Successful traversal exposes configuration files, credentials, and OS resources.

Mitigation: Apply deterministic path normalization and containment at the proxy layer. Enforce the working directory as the root. Block traversal sequences before any path is passed to the filesystem tool.

→ Path Traversal
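Path containment is usually more reliable when it checks the resolved path rather than searching the input for `../`. A minimal sketch using `pathlib` (Python 3.9 or later for `is_relative_to`) follows; `is_contained` and the example working directory are hypothetical.

```python
from pathlib import Path


def is_contained(root: str, requested: str) -> bool:
    """True only if `requested`, resolved against `root`, stays inside `root`."""
    root_path = Path(root).resolve()
    # Joining an absolute `requested` discards `root_path`, so absolute
    # paths outside the root fail the containment check below.
    # resolve() also collapses ".." segments and follows existing symlinks.
    candidate = (root_path / requested).resolve()
    return candidate.is_relative_to(root_path)
```

For example, with a working directory of `/srv/agent-workdir`, the request `notes/todo.txt` is contained while `../../etc/passwd` and `/etc/passwd` are not. The check should run on the same path that is handed to the filesystem tool, so there is no gap between validation and use.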

AA09

Privilege Escalation via Semantic Drift

Medium

Through a sequence of individually benign-looking tool calls, the agent incrementally expands its effective capability — accessing systems, data, or APIs outside its intended scope without any single call triggering a block.

Mitigation: Behavioral tracking across the full session. Per-session scope binding. Tool call sequence analysis to detect stepwise privilege expansion.

→ Privilege Escalation
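Stepwise expansion is only visible when state is kept across calls. The sketch below tracks the distinct (tool, resource) pairs a session has touched and escalates once a per-session budget is exceeded. The `SessionTracker` class and the notion of a fixed budget are assumptions made for illustration; real sequence analysis would be richer.

```python
from dataclasses import dataclass, field


@dataclass
class SessionTracker:
    """Tracks cumulative reach across a session, not individual calls."""
    max_distinct_resources: int
    seen: list = field(default_factory=list)  # ordered (tool, resource) pairs

    def record(self, tool: str, resource: str) -> str:
        key = (tool, resource)
        if key not in self.seen:
            self.seen.append(key)
        # Each call may look benign alone; the decision uses the whole history.
        if len(self.seen) > self.max_distinct_resources:
            return "escalate"
        return "allow"
```

Repeated access to an already-seen resource does not consume budget, so ordinary work is unaffected, while a session that keeps reaching for new systems eventually triggers review.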

AA10

Insufficient Audit Trail

Medium

Agent actions are logged incompletely, in mutable storage, or not at all. When an incident occurs, reconstruction is impossible. Compliance evidence cannot be produced. Post-incident forensics yield nothing actionable.

Mitigation: Emit a structured audit event for every routed tool call, whether permitted, denied, or escalated. Higher-assurance deployments can use append-only or tamper-evident evidence layers to make retroactive modification detectable.

→ Merkle Audit Trail
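A tamper-evident layer can be as simple as a hash chain, where each audit record commits to the one before it. This is a minimal sketch and not the VEX Protocol format: the record layout, the `GENESIS` value, and the function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + body).encode()).hexdigest()


def append_event(log: list, event: dict) -> dict:
    """Append an audit event that commits to the previous record's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    record = {"event": event, "prev": prev, "hash": _digest(prev, event)}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every link; any edited, removed, or reordered record breaks it."""
    prev = GENESIS
    for record in log:
        if record["prev"] != prev or record["hash"] != _digest(prev, record["event"]):
            return False
        prev = record["hash"]
    return True
```

A chain like this makes modification detectable but not impossible: someone who can rewrite the whole log can also recompute every hash, which is why higher-assurance setups anchor the latest hash with an external witness.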

How to Use This Threat Model

This taxonomy is designed to be used in three ways: as a threat modeling input when designing new agentic systems, as an audit checklist for reviewing existing deployments, and as a detection framework for security teams building monitoring and alerting on agent behavior.

For each risk category, the question to ask is not just “is this theoretically possible?” but “do we have a deterministic, tested control at the execution boundary that would catch and block this regardless of what the model outputs?”

Model-level mitigations (system prompts, fine-tuning, RLHF) are insufficient on their own because they can be bypassed. The controls that matter for production security sit at the execution boundary, outside the model's reasoning process entirely.

Figure: Control map showing how execution-boundary enforcement mitigates multiple agent risk categories. The strongest controls live outside the model, at the execution boundary where tool authority becomes real consequence.

Implement the execution boundary controls.

McpVanguard addresses AA01–AA09 at the proxy layer. VEX Protocol addresses AA10 with cryptographic evidence sealing.

