{"id":6259,"date":"2025-12-18T11:59:50","date_gmt":"2025-12-18T11:59:50","guid":{"rendered":"https:\/\/cybersecurityinfocus.com\/?p=6259"},"modified":"2025-12-18T11:59:50","modified_gmt":"2025-12-18T11:59:50","slug":"human-in-the-loop-isnt-enough-new-attack-turns-ai-safeguards-into-exploits","status":"publish","type":"post","link":"https:\/\/cybersecurityinfocus.com\/?p=6259","title":{"rendered":"Human-in-the-loop isn\u2019t enough: New attack turns AI safeguards into exploits"},"content":{"rendered":"<div>\n<div class=\"grid grid--cols-10@md grid--cols-8@lg article-column\">\n<div class=\"col-12 col-10@md col-6@lg col-start-3@lg\">\n<div class=\"article-column__content\">\n<div class=\"container\"><\/div>\n<p>Human-in-the-loop (HITL) safeguards that AI agents rely on can be subverted, allowing attackers to weaponize them to run malicious code, new research from CheckMarx shows.<\/p>\n<p>HITL dialogs are a safety backstop (a final \u201care you sure?\u201d) that the agents run before executing sensitive actions like running code, modifying files, or touching system resources.<\/p>\n<p>Checkmarx researchers described it as an HITL dialog forging technique they\u2019re calling Lies-in-the-Loop (LITL), where malicious instructions are embedded into AI prompts in ways that mislead users reviewing approval dialogs.<\/p>\n<p>The research findings reveal that keeping a human in the loop is not enough to neutralize prompt-level abuse. Once users can\u2019t reliably trust what they\u2019re being asked to approve, HITL stops being a guardrail and becomes an attack surface.<\/p>\n<p>\u201cThe Lies-in-the-Loop (LITL) attack exploits the trust users place in these approval dialogs,\u201d CheckMarx researchers said in a blog <a href=\"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/\" target=\"_blank\" rel=\"noopener\">post<\/a>. \u201cBy manipulating what the dialog displays, attackers turn the safeguard into a weapon\u200a\u2014 once the prompt looks safe, users approve it without question.\u201d<\/p>\n<h2 class=\"wp-block-heading\"><a><\/a>Dialog forging turns oversight into an attack primitive<\/h2>\n<p>The problem stems from how AI systems present confirmation dialogs to users. HITL <a href=\"https:\/\/www.csoonline.com\/article\/648427\/new-llm-based-soc-tool-to-help-automate-security-response.html?utm=hybrid_search#:~:text=agent%20implements%20a-,human-in-the-loop,-approach%20that%20requires\">workflows<\/a> typically summarize the action an AI agent wants to perform, expecting the human reviewer to spot anything suspicious before clicking approve.<\/p>\n<p>CheckMarx demonstrated that attackers can manipulate these dialogs by hiding or misrepresenting malicious instructions, like padding payloads with benign-looking text, pushing dangerous commands out of the visible view, or crafting prompts that cause the AI to generate misleading summaries of what will actually execute.<\/p>\n<p>In terminal-style interfaces, especially, long or formatted outputs make this kind of deception easy to miss. Since many AI agents operate with elevated privileges, a single misled approval can translate directly into code execution, running OS commands, file system access, or downstream compromise, according to CheckMarx findings.<\/p>\n<p>Beyond padding or truncation, the researchers also described other dialog-forging techniques that abuse how confirmation is rendered. By leveraging Markdown rendering and layout behaviors, attackers can visually separate benign text from hidden commands or manipulate summaries so the human-visible description isn\u2019t malicious.<\/p>\n<p>\u201cThe fact that attackers can theoretically break out of the Markdown syntax used for the HITL dialog, presenting the user with fake UI, can lead to much more sophisticated LITL attacks that can go practically undetected,\u201d the researchers added.<\/p>\n<h2 class=\"wp-block-heading\"><a><\/a>Defensive steps for agents and users<\/h2>\n<p>Checkmarx recommended measures primarily for AI agent developers, urging them to treat HITL dialogs as potentially manipulative rather than inherently trustworthy. Recommended steps include constraining how dialogs are rendered, limiting the use of complex UI formatting, and clearly separating human-visible summaries from the underlying actions that will be executed.<\/p>\n<p>The researchers also advised validating approved operations to ensure they match what the user was shown at confirmation time.<\/p>\n<p>For AI users, they noted that agents operating in richer UI environments can make deceptive behavior easier to detect than text-only terminals. \u201cFor instance, VS Code <a href=\"https:\/\/www.csoonline.com\/article\/3956464\/warning-to-developers-stay-away-from-these-10-vscode-extensions.html\">extensions<\/a> provide full Markdown rendering capabilities, whereas terminals typically display content using basic ASCII characters,\u201d they said.<\/p>\n<p>CheckMarx said the issue was disclosed to Anthropic and Microsoft, both of which acknowledged the report but did not classify it as a security vulnerability. Neither company immediately responded to CSO\u2019s request for comments.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Human-in-the-loop (HITL) safeguards that AI agents rely on can be subverted, allowing attackers to weaponize them to run malicious code, new research from CheckMarx shows. HITL dialogs are a safety backstop (a final \u201care you sure?\u201d) that the agents run before executing sensitive actions like running code, modifying files, or touching system resources. Checkmarx researchers [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":6260,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-6259","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-education"],"_links":{"self":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/6259"}],"collection":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6259"}],"version-history":[{"count":0,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/6259\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/media\/6260"}],"wp:attachment":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6259"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6259"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6259"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}