Threat Analysis20 min read

The AI Agent Threat Landscape: A Research Summary

A synthesis of current research on security threats facing autonomous AI systems

Moltwire Security Research·

Key Findings

  • 1Prompt injection is ranked #1 in OWASP Top 10 for LLM Applications 2025, exploiting fundamental LLM design rather than patchable flaws
  • 2Research shows GPT-4 based agents are vulnerable to indirect prompt injection 24% of the time, nearly doubling with enhanced attack prompts (InjecAgent benchmark)
  • 3Just 5 carefully crafted documents can manipulate AI responses 90% of the time through RAG poisoning (OWASP 2025)
  • 4AI agents move 16x more data than human users, expanding the attack surface significantly (Obsidian Security)
  • 5Over 60% of large enterprises deployed autonomous AI agents in production by 2025, yet legacy security tools remain inadequate (Microsoft Ignite 2025)

Abstract

As AI agents transition from experimental technology to production infrastructure, they introduce novel attack surfaces that traditional security tools cannot address. This paper synthesizes current academic research and industry findings on the threat landscape for AI agents. We examine published research on prompt injection (ranked #1 in OWASP Top 10 for LLM Applications 2025), behavioral threats, and supply chain risks. Drawing on benchmarks like InjecAgent and real-world incidents documented by security researchers, we present a consolidated view of what organizations deploying AI agents need to understand about current threats. This is a research summary, not original research. All statistics and findings are drawn from cited sources including OWASP, ACM Computing Surveys, arXiv papers, and industry security reports.

Introduction

The deployment of AI agents in production environments has accelerated dramatically. According to Microsoft's Agent 365 announcement at Ignite 2025, more than 60% of large enterprises have deployed autonomous AI agents in production. These systems browse the web, execute code, manage files, and interact with external services autonomously.

This expansion comes with significant security implications. The ACM Computing Surveys published "AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways" in February 2025, identifying four critical knowledge gaps in agent security: unpredictability of multi-step user inputs, complexity in internal executions, variability of operational environments, and interactions with untrusted external entities.

This paper synthesizes findings from academic research, industry reports, and documented incidents to present a consolidated view of the current threat landscape.

Prompt Injection: The #1 Threat

Prompt injection has been ranked as the #1 vulnerability in the OWASP Top 10 for LLM Applications since the list was first compiled, maintaining this position in the 2025 update. As OWASP notes, "it's no shock that prompt injection is the number one threat to LLMs because it exploits the design of LLMs rather than a flaw that can be patched."

Attack Success Rates from Research:

The InjecAgent benchmark (Zhan et al., ACL 2024 Findings) provides concrete data on attack success rates across 30 different LLM agents with 1,054 test cases:

  • ReAct-prompted GPT-4 is vulnerable to attacks 24% of the time
  • When attackers reinforce instructions with enhanced prompts, success rates nearly double
  • The benchmark spans 17 user tools × 62 attacker tools, testing both "direct harm" and "data stealing" attacks
  • Other research shows even higher success rates in specific contexts:

  • Evolutionary optimization methods like StruPhantom achieved over 90% attack success rates in LLM-powered spreadsheet applications
  • Adaptive attacks can exceed 50% success rates even against state-of-the-art defenses
  • Research demonstrates that just 5 carefully crafted documents can manipulate AI responses 90% of the time through RAG poisoning
  • Types of Prompt Injection:

  • Direct Injection: User-supplied input that attempts to override system instructions
  • Indirect Injection: Hidden instructions in external content (documents, websites, emails) that the LLM processes
  • Multimodal Injection: Hidden instructions in images processed alongside text prompts
  • According to Microsoft's security blog, indirect prompt injection is one of the most widely-used attack techniques reported to them.

    Documented Security Incidents

    Security researchers have documented numerous real-world incidents that demonstrate the practical impact of these vulnerabilities:

    Memory Manipulation (February 2025): Security researcher Johann Rehberger demonstrated vulnerabilities in Google's Gemini AI to indirect prompt injection attacks that manipulated its long-term memory. Hidden instructions within documents could be stored and later triggered by user interactions.

    ChatGPT Persistent Injection (2024): A persistent prompt injection attack manipulated ChatGPT's memory feature, enabling long-term data exfiltration across multiple conversations.

    Slack AI Data Exfiltration (August 2024): Researchers discovered Slack AI data exfiltration vulnerabilities combining RAG poisoning with social engineering techniques.

    ChatGPT Browsing Exploitation (May 2024): Researchers exploited ChatGPT's browsing capabilities by poisoning RAG context with malicious content from untrusted websites.

    Healthcare Data Breach (Early 2024): A compromised customer service AI agent at a healthcare provider leaked patient records for three months. The agent had legitimate access to electronic health records, and attackers used prompt injection to extract and transmit PHI to external endpoints. The breach went undetected because the agent's API calls matched expected patterns, costing the organization $14 million in fines and remediation.

    GitHub Copilot RCE (2025): CVE-2025-53773 revealed a remote code execution vulnerability in GitHub Copilot with a CVSS score of 9.6.

    Agent-Specific Threat Categories

    The OWASP Agentic Security Initiative (ASI) has published a taxonomy of 15 threat categories for agentic AI. Research from ACM Computing Surveys identifies threats specific to autonomous agents:

    Goal Misalignment: Agents may pursue objectives that diverge from intended behavior, especially in complex multi-step reasoning chains.

    Memory Poisoning: For agents with persistent memory, attackers may inject false memories or manipulate stored context to influence future behavior.

    Multi-Agent Collusion: Research shows that when agents share memory, databases, execution privileges, or delegated tasks, a single compromised agent can repeatedly trigger harmful actions across the system. This emergent collusion arises from shared state and privileges rather than explicit coordination.

    Tool Misuse: Agents with access to tools (file systems, APIs, email) can be manipulated into using those tools for unauthorized purposes.

    Supply Chain Attacks: A 2025 supply chain attack on the OpenAI plugin ecosystem resulted in compromised agent credentials being harvested from 47 enterprise deployments. Attackers used these credentials to access customer data, financial records, and proprietary code for six months before discovery.

    The MCP (Model Context Protocol), launched by Anthropic in November 2024, creates new attack vectors through indirect prompt injection vulnerabilities as AI assistants interpret natural language commands before sending them to MCP servers.

    Why Detection Is Difficult

    Traditional security tools face fundamental challenges with AI-specific attacks:

    Scale of Data Movement: AI agents move 16x more data than human users, according to Obsidian Security research. This expanded attack surface requires new monitoring approaches.

    Behavioral Unpredictability: Unlike traditional workloads, AI agents exhibit non-deterministic behaviors that are difficult to baseline. Microsoft explicitly recognized at Ignite 2025 that autonomous agents operating with increasing decision-making authority require fundamentally different security controls.

    Semantic Attacks: Attack payloads are natural language, not exploit code. Traditional signature matching is ineffective when the same attack can be phrased infinitely many ways.

    Legitimate Action Masquerading: Malicious actions often look identical to legitimate ones. An agent sending data to an attacker's endpoint looks the same as normal API usage.

    Defense Limitations: Research shows that most current prompt-based defenses suffer from high Attack Success Rates (ASR), demonstrating limited robustness against sophisticated injection attacks. Even "state-of-the-art" defenses can be bypassed when adversarial optimization is applied.

    However, some approaches show promise. A firewall-based defense approach achieved "perfect security (0% ASR) with high utility across four public benchmarks: AgentDojo, Agent Security Bench, InjecAgent and τ-Bench."

    Recommendations from Research

    Based on the research reviewed, several defensive approaches show promise:

    1. Behavioral Monitoring: User and Entity Behavior Analytics (UEBA) allows security systems to detect novel, zero-day, and polymorphic attacks invisible to traditional tools. By establishing baselines for normal agent behavior, anomalies can be detected even when the specific attack technique is novel.

    2. Defense in Depth: No single control stops all attacks. Research suggests layering input analysis, behavioral monitoring, output filtering, and action controls.

    3. Tool-Boundary Firewalls: Research on "Indirect Prompt Injections: Are Firewalls All You Need?" investigates lightweight LLM firewalls: a Tool-Input Firewall (Minimizer) and Tool-Output Firewall (Sanitizer) placed at the agent-tool boundary.

    4. Regulatory Compliance: NIST SP 1800-35, published in November 2024, provides guidance on implementing Zero Trust Architecture with Enhanced Identity Governance. Federal agencies face a 2026 implementation deadline.

    5. Continuous Authentication: Traditional IAM tools are inadequate for entities with non-deterministic behaviors. Continuous authentication and behavioral verification are increasingly necessary.

    The global AI-in-cybersecurity market is expected to grow from $24.8B in 2024 to $146.5B by 2034, reflecting the scale of investment required to address these challenges.

    References

    1. OWASP Foundation. "OWASP Top 10 for LLM Applications 2025". OWASP, 2025.
    2. Zhan, Q. et al.. "InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents". ACL 2024 Findings, 2024.
    3. Various. "AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways". ACM Computing Surveys, 2025.
    4. Obsidian Security. "The 2025 AI Agent Security Landscape: Players, Trends, and Risks". Obsidian Security Blog, 2025.
    5. Microsoft Security Response Center. "How Microsoft Defends Against Indirect Prompt Injection Attacks". Microsoft MSRC Blog, 2025.
    6. Various. "Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?". arXiv, 2025.