{"id":8472,"date":"2026-06-12T10:02:25","date_gmt":"2026-06-12T10:02:25","guid":{"rendered":"https:\/\/cybersecurityinfocus.com\/?p=8472"},"modified":"2026-06-12T10:02:25","modified_gmt":"2026-06-12T10:02:25","slug":"prompt-injection-breaks-todays-ai-agents-study-warns","status":"publish","type":"post","link":"https:\/\/cybersecurityinfocus.com\/?p=8472","title":{"rendered":"Prompt injection breaks today\u2019s AI agents, study warns"},"content":{"rendered":"<div>\n<div class=\"grid grid--cols-10@md grid--cols-8@lg article-column\">\n<div class=\"col-12 col-10@md col-6@lg col-start-3@lg\">\n<div class=\"article-column__content\">\n<div class=\"container\"><\/div>\n<p>Today\u2019s AI web agents have no dependable defenses against prompt injection, according to new research showing that not a single attack scenario was consistently blocked across leading systems powered by GPT\u20115 and Gemini.<\/p>\n<p>The findings come from StakeBench, a\u00a0stakeholder-centric\u00a0benchmark developed by researchers from Nanyang Technological University, ST Engineering, IBM Research, and the University of Illinois Urbana-Champaign to evaluate prompt injection attacks against AI agents operating in realistic web environments.<\/p>\n<p>The researchers executed 3,168 adversarial runs across NanoBrowser and BrowserUse using 264 benchmark cases. 
Indirect prompt injection attacks, where malicious instructions are hidden inside ordinary web content such as product reviews and metadata, achieved attack success rates ranging from 41.67% to 68.16%, while direct prompt injection exceeded 79% across all tested configurations.<\/p>\n<p>\u201cCrucially, these failures exhibit distinct patterns when analysed through a stakeholder lens: some attacks succeed without disrupting the user\u2019s delegated task while disproportionately harming third parties (stealthy parasitism), whereas others disrupt task completion without realizing the adversarial objective (misaligned disruption),\u201d <a href=\"https:\/\/arxiv.org\/html\/2606.13385v1\" target=\"_blank\" rel=\"noopener\">the researchers wrote in a paper<\/a>.<\/p>\n<p>OpenAI and Google did not immediately respond to requests for comment.<\/p>\n<h2 class=\"wp-block-heading\">Every attack objective exposed at least one failure mode<\/h2>\n<p>The benchmark evaluated web agents across four possible outcomes: Robust Behavior, Stealthy Parasitism, Misaligned Disruption, and Compounded Failure. 
Robust Behavior represents the ideal state in which an agent completes a user\u2019s task without advancing an attacker\u2019s objective or exhibiting execution instability.<\/p>\n<p>The researchers argue that the findings reveal a broader problem than high attack success rates.<\/p>\n<p>\u201cThe Robust Behavior region remains unpopulated across all evaluated configurations,\u201d they wrote, meaning every tested attack objective resulted in at least one meaningful failure dimension, whether successful adversarial manipulation, disruption of the user\u2019s intended task, or execution instability.<\/p>\n<p>The authors say this demonstrates that \u201cprompt-injection vulnerability in deployable web agents cannot be characterized by any single metric in isolation,\u201d because attack success and task disruption are \u201cweakly coupled in practice.\u201d<\/p>\n<h2 class=\"wp-block-heading\">Attacks can succeed while users see nothing wrong<\/h2>\n<p>One of the failure modes identified by the benchmark is what the researchers call \u201cstealthy parasitism,\u201d in which an AI agent completes the user\u2019s delegated task while simultaneously advancing an attacker\u2019s objective.<\/p>\n<p>The paper illustrates the risk with an online shopping scenario: \u201cA malicious prompt injected into product reviews may bias an agent toward a specific item: although the user may still receive an acceptable recommendation, the same behaviour can disadvantage competing sellers and undermine platform integrity.\u201d<\/p>\n<p>The researchers argue that prompt injection has evolved into \u201ca system-level security problem with multi-party harm,\u201d rather than a model safety issue affecting only the end user.<\/p>\n<h2 class=\"wp-block-heading\">Different stakeholders face different risks<\/h2>\n<p>Unlike existing benchmarks that primarily measure attack success, StakeBench evaluates harm across three stakeholder groups: end users, third-party sellers, and 
platforms.<\/p>\n<p>The results show that those groups experience materially different risks.<\/p>\n<p>Seller-targeted attacks recorded the highest attack success rates across both evaluated web agents. User-targeted attacks, however, produced the lowest task deviation rates, suggesting they may be harder to detect because workflows continue to appear normal even when adversarial objectives are achieved.<\/p>\n<p>According to the researchers, \u201cthe same agent can simultaneously appear stealthy on user-targeted attacks, susceptible on seller-targeted attacks, and unstable on platform-targeted attacks.\u201d<\/p>\n<p>That, they argue, makes \u201caggregate ASR alone insufficient to characterize stakeholder-specific vulnerability.\u201d<\/p>\n<h2 class=\"wp-block-heading\">Models and architectures influence outcomes<\/h2>\n<p>The benchmark also found meaningful differences between AI models and agent architectures.<\/p>\n<p>Replacing GPT-5 with Gemini-2.5-Flash increased indirect prompt injection success rates by 26.49 percentage points on NanoBrowser and by 6.2 percentage points on BrowserUse, the paper said. 
BrowserUse also consistently exhibited higher task deviation and behavioral irregularity than NanoBrowser, it added.<\/p>\n<p>According to the researchers, the findings suggest that prompt injection resilience depends not only on the language model but also on how it is implemented within an autonomous agent.<\/p>\n<p>\u201cThese results indicate that prompt-injection security in deployable web agents is not a scalar property of the backbone model but a distribution of harm whose realisation is jointly determined by the affected stakeholder, the semantic alignment between the injected objective and the user\u2019s task, and the architectural context in which the backbone is deployed,\u201d the paper added.<\/p>\n<h2 class=\"wp-block-heading\">Images may emerge as the next attack vector<\/h2>\n<p>The researchers also explored whether prompt injection could extend beyond text.<\/p>\n<p>In a preliminary multimodal experiment, they modified only a product image while leaving accompanying text, ratings, and page structure unchanged. The manipulated product\u2019s selection rate increased from 10% to 76.67% without rating signals, suggesting visual content alone may significantly influence AI agent decisions.<\/p>\n<p>While the experiment was limited in scope, the researchers said the results indicate \u201cthe IPI surface relevant to deployable web agents may extend beyond textual channels to visual ones,\u201d pointing to another emerging attack vector as enterprises increasingly deploy autonomous AI systems.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Today\u2019s AI web agents have no dependable defenses against prompt injection, according to new research showing that not a single attack scenario was consistently blocked across leading systems powered by GPT\u20115 and Gemini. 
The findings come from StakeBench, a\u00a0stakeholder-centric\u00a0benchmark developed by researchers from Nanyang Technological University, ST Engineering, IBM Research, and the University of Illinois [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":8473,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-8472","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-education"],"_links":{"self":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/8472"}],"collection":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8472"}],"version-history":[{"count":0,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/8472\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/media\/8473"}],"wp:attachment":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8472"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=8472"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8472"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}