{"id":8185,"date":"2026-05-18T12:57:46","date_gmt":"2026-05-18T12:57:46","guid":{"rendered":"https:\/\/cybersecurityinfocus.com\/?p=8185"},"modified":"2026-05-18T12:57:46","modified_gmt":"2026-05-18T12:57:46","slug":"new-image-based-prompt-injection-attack-targets-multimodal-ai-models","status":"publish","type":"post","link":"https:\/\/cybersecurityinfocus.com\/?p=8185","title":{"rendered":"New image-based prompt injection attack targets multimodal AI models"},"content":{"rendered":"<div>\n<div class=\"grid grid--cols-10@md grid--cols-8@lg article-column\">\n<div class=\"col-12 col-10@md col-6@lg col-start-3@lg\">\n<div class=\"article-column__content\">\n<div class=\"container\"><\/div>\n<p>Security researchers have developed a new image-based prompt injection attack that can manipulate how multimodal AI systems interpret user instructions without modifying the original text prompt, potentially expanding security risks for AI agents and vision-language systems.<\/p>\n<p>In a research paper published this week, researchers from Xidian University described a technique called \u201cCrossMPI,\u201d which uses nearly imperceptible image perturbations to alter how large vision-language models (LVLMs) process both visual and textual inputs.<\/p>\n<p>\u201cCrossMPI can steer the model\u2019s interpretation of both textual and visual inputs via image-only prompt injection,\u201d the researchers wrote in the <a href=\"https:\/\/arxiv.org\/pdf\/2605.16090\" target=\"_blank\" rel=\"noopener\">paper<\/a>.<\/p>\n<p>Unlike traditional prompt injection attacks, which typically rely on malicious text instructions embedded in prompts or webpages, the new technique attempts to change how the model interprets a benign user request by manipulating images alone.<\/p>\n<p>\u201cThe perturbed image can manipulate the model\u2019s understanding of the user\u2019s instruction,\u201d the paper said.<\/p>\n<p>In one example described in the paper, researchers subtly modified an image of an 
airplane using pixel-level perturbations that are nearly imperceptible to human users. When a multimodal AI system was then asked whether the airplane belonged to Air Canada, the manipulated image caused the model to incorrectly identify the object as \u201ca mobile phone,\u201d illustrating how the attack could distort both visual understanding and interpretation of the user\u2019s task.<\/p>\n<p>The findings add to growing concerns around multimodal AI security as enterprises increasingly deploy AI copilots, autonomous agents, document-processing assistants, and vision-enabled workflows that combine image and text reasoning.<\/p>\n<p>Apeksha Kaushik, senior principal analyst at Gartner, said the risks could grow rapidly as enterprises adopt more multimodal AI systems.<\/p>\n<p>\u201cBy 2030, 80% of enterprise software and applications will be multimodal, up from 1% in 2024,\u201d Kaushik said.<\/p>\n<h2 class=\"wp-block-heading\">Attack targets multimodal reasoning layers<\/h2>\n<p>Prompt injection has emerged as one of the most closely watched risks in generative AI systems, particularly as organizations adopt AI agents capable of interacting with enterprise applications, websites, documents, and external tools.<\/p>\n<p>Most existing prompt injection attacks rely on malicious text embedded in prompts, webpages, or hidden instructions. 
Some multimodal attacks have also attempted to manipulate AI behavior using images containing visible or hidden text instructions.<\/p>\n<p>The researchers argued their approach differs because it attempts to alter how the model interprets the original task itself through image perturbations alone.<\/p>\n<p>In contrast to earlier methods, the researchers noted, CrossMPI uses image modifications to \u201cchange the model\u2019s interpretation of both the visual and textual prompts.\u201d<\/p>\n<p>The paper said the attack specifically targets the \u201chidden state space of LVLMs\u201d \u2014 the stage where models combine textual instructions and visual evidence into internal representations before generating outputs.<\/p>\n<p>According to the paper, the most effective attack layers were not the final output layers traditionally targeted in adversarial AI attacks, but intermediate layers where visual and textual information are fused.<\/p>\n<h2 class=\"wp-block-heading\">Researchers claim strong black-box transferability<\/h2>\n<p>The researchers evaluated the technique against multiple open-source LVLMs, including MiniGPT4, BLIP-2, InstructBLIP, BLIVA, and Qwen2.5-VL.<\/p>\n<p>According to the paper, the attack achieved an average success rate of 66.36% across tested models, outperforming prior baseline attacks by roughly 41 percentage points.<\/p>\n<p>The researchers also said the technique demonstrated \u201cstrong transferability in black-box settings,\u201d meaning the attacks remained effective even without direct access to a target model\u2019s parameters or architecture.<\/p>\n<p>The paper further claimed the perturbations remained visually stealthy while maintaining effectiveness across multiple LVLM architectures.<\/p>\n<h2 class=\"wp-block-heading\">No fully effective defense<\/h2>\n<p>The researchers evaluated several defense mechanisms designed to neutralize hidden image manipulations, including random resizing, image 
rotation, JPEG compression, and inference-level safeguards such as <a href=\"https:\/\/arxiv.org\/abs\/2405.10529\" target=\"_blank\" rel=\"noopener\">SmoothVLM<\/a>, a specialized defense framework designed to protect Vision-Language Models (VLMs) from patched visual prompt injections, and DPS, which guides models using partial image views. <\/p>\n<p>According to the paper, SmoothVLM proved the most effective, reducing attack success rates to below 5% in several scenarios, while JPEG compression also weakened the attacks by suppressing high-frequency image artifacts. <\/p>\n<p>However, the researchers said none of the tested defenses completely eliminated the attacks, suggesting stronger multimodal AI security protections may still be needed.<\/p>\n<h2 class=\"wp-block-heading\">Enterprise AI deployments may widen exposure<\/h2>\n<p>The research arrives as enterprises rapidly expand deployments of multimodal AI systems capable of processing screenshots, PDFs, dashboards, forms, video streams, and enterprise documents alongside natural language prompts.<\/p>\n<p>The researchers noted that adversarial examples generated using the technique could potentially \u201cmislead VLM-based web agents\u201d and \u201cdisrupt real-world object detectors.\u201d<\/p>\n<p>\u201cEven if textual inputs are sanitized, manipulated images can still subvert the model\u2019s outputs or actions,\u201d Kaushik said.<\/p>\n<p>She said organizations that use multimodal AI for document processing, customer interactions, content moderation, and autonomous systems may face increasing exposure to adversarial image manipulation and prompt injection attacks.<\/p>\n<p>\u201cSecurity controls designed for unimodal systems are insufficient,\u201d Kaushik said. 
The researchers acknowledged that the work was conducted in controlled research settings using open-source models and did not describe observed exploitation in real-world enterprise environments.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Security researchers have developed a new image-based prompt injection attack that can manipulate how multimodal AI systems interpret user instructions without modifying the original text prompt, potentially expanding security risks for AI agents and vision-language systems. In a research paper published this week, researchers from Xidian University described a technique called \u201cCrossMPI,\u201d which uses nearly [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":8186,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-8185","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-education"],"_links":{"self":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/8185"}],"collection":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8185"}],"version-history":[{"count":0,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/8185\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/media\/8186"}],"wp:attachment":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8185"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcat
egories&post=8185"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8185"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}