THREAT

Indirect Prompt Injection

retrieval-augmented attackexternal content injectionRAG poisoning

Indirect prompt injection is a vector in which adversarial instructions are embedded in external content — web pages, documents, emails, database records — that an AI agent retrieves and processes autonomously.

ADVERSARIAL MECHANICS

An attacker places invisible or camouflaged instruction text in a publicly accessible resource — a white-paper footnote, an HTML comment, or a shared document. When an agent retrieves this resource via a tool call (e.g., browse_web), the payload enters the context window. Encoding variants (Unicode homoglyphs, zero-width characters) are used to evade naive matching.

PROTOCOL CONTEXT (MCP)

In MCP-based systems, this attack targets tool_result payloads from resource-access tools. Because MCP tools operate with elevated trust by design, injections arriving via tool_result are processed with high fidelity by the model.

ProvnAI Mitigation

McpVanguard can apply content-layer scanning to routed tool_result objects, not only user-originated inputs. Configured jailbreak and retrieval-context signatures can detect injection patterns in retrieved content before they reach the next model context.