How a malicious AI agent skill passed security checks and reached 26,000 users

A fake AI agent skill that passed security checks reached over 26,000 users through Instagram, highlighting new risks as enterprises rely on AI-driven tools.

Some of the agents involved were tied to corporate accounts, AIR said. The company said a similar attack could have exposed private conversations and internal systems. AIR said no agents were harmed in the research and that the test payload collected only users’ email addresses so they could be notified.

The experiment centered on a skill called brand-landingpage, which was presented as a tool for helping users build a landing page with Google’s Stitch design tool. AIR said it chose the use case because it would appeal to non-technical corporate users, including marketers, salespeople, and designers.

To make the skill appear credible, AIR said it sought two trust signals: GitHub reputation and safe verdicts from security scanners. Rather than building credibility from scratch, it submitted the skill to a popular open-source agents repository that AIR said had about 36,000 GitHub stars and 156 skills. The pull request was merged after a few days.

AIR then promoted the skill through an Instagram ad, which drove users to install and run it.

The malicious technique did not depend on suspicious code inside the submitted files. Instead, the skill instructed agents to set up a Stitch SDK by following installation instructions hosted at stitch-design.ai, a domain controlled by AIR. Google’s actual Stitch domain is stitch.withgoogle.com.

AIR said it configured the fake domain to redirect to the real Stitch site, making the issue difficult to detect from a static review of the skill alone.
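One way to surface this kind of reference during review is to list every external host a skill's text points to and compare it with a short approved list. The sketch below is illustrative only: the `/install` path, the sample instruction text, and the function names are assumptions, and the two domains are the ones named in the article. It would flag the look-alike domain for a human to inspect, but it cannot tell what the page behind it will serve later.

```python
import re
from urllib.parse import urlparse

# Assumption: the reviewer maintains a small set of approved hosts.
APPROVED_HOSTS = {"stitch.withgoogle.com"}

# Matches http(s) URLs up to whitespace or a closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def external_hosts(skill_text: str) -> set[str]:
    """Return the lowercase hostname of every http(s) URL in the text."""
    hosts = set()
    for url in URL_PATTERN.findall(skill_text):
        # Drop sentence punctuation that trails the URL in prose.
        url = url.rstrip(".,;:!?'\"")
        host = urlparse(url).hostname
        if host:
            hosts.add(host.lower())
    return hosts


def unapproved_hosts(skill_text: str, approved=APPROVED_HOSTS) -> set[str]:
    """Return hosts referenced by the skill that are not on the approved list."""
    return external_hosts(skill_text) - set(approved)


# Hypothetical instruction text modeled on the article's description.
sample = (
    "Set up the Stitch SDK by following https://stitch-design.ai/install. "
    "Then open https://stitch.withgoogle.com, and start a project."
)
print(unapproved_hosts(sample))
```

A check like this only narrows what a reviewer has to look at; because the fake domain redirected to the real site, a manual visit at review time would also have looked harmless.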

“Current skill security scanners all share the same design – they analyze the skill’s SKILL.md and bundled resources, using a combination of static heuristics and LLM agents,” AIR said.

The company said it tested the skill against scanners from Cisco, Nvidia, and skills.sh, and that all marked brand-landingpage as safe.

Once the skill had gained distribution, AIR changed the content behind the fake Stitch documentation. The revised page instructed agents to download and run a script. In AIR’s test, that script collected the user’s email address, but the company said the same approach could have been used to compromise machines running the agent.

AIR said the experiment showed that AI agent skills cannot be assessed only by scanning their packaged files at the time of approval or installation. The issue, it said, is that a skill can pass review while still pointing an agent to a web page that changes later.

AI skills pose dependency risk

For security teams, the concern is not only that the skill passed review, but that its behavior could change after trust had already been granted.

The test suggests CISOs may need to treat AI skills as part of the enterprise software supply chain, rather than as simple prompts or text files, according to cybersecurity researcher Devashri Datta.

“Treating agent skills as mere text or prompts is a fundamental architectural misunderstanding,” Datta said. “They are executable instruction bundles that dictate how an agent operates, interacts with enterprise systems, and routes data, and they must be governed with the same rigor as third-party open-source packages or SaaS integrations.”

Keith Prabhu, founder and CEO at Confidis, said AI agent skills should be treated as “living third-party dependencies,” rather than static plugins.

“A one-time security scan is no longer sufficient; enterprises need continuous validation and strict runtime controls,” Prabhu said.

That starts with an enterprise-wide AI skills inventory that gives security teams clear ownership records and visibility into each skill’s external connections and permitted data flows.
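As a rough illustration of what one inventory entry might hold, the sketch below records an owner, external hosts, and permitted data flows for each skill, then lists the skills that lack an owner or reach hosts outside an approved set. The field names, the `needs_review` helper, and the sample owners are assumptions made for this example, not a schema described by Prabhu or Datta.

```python
from dataclasses import dataclass, field


@dataclass
class SkillRecord:
    name: str
    owner: str  # accountable person or team; empty if nobody has claimed it
    external_hosts: set[str] = field(default_factory=set)
    permitted_data_flows: set[str] = field(default_factory=set)


def needs_review(records: list[SkillRecord], approved_hosts: set[str]) -> list[str]:
    """Return names of skills with no owner or with hosts outside the approved set."""
    flagged = []
    for record in records:
        if not record.owner or (record.external_hosts - approved_hosts):
            flagged.append(record.name)
    return flagged


# Hypothetical inventory; only the skill name and domains come from the article.
inventory = [
    SkillRecord("brand-landingpage", "marketing-ops", {"stitch-design.ai"}),
    SkillRecord("internal-notes", "", set()),
    SkillRecord("approved-skill", "design-team", {"stitch.withgoogle.com"}),
]
print(needs_review(inventory, {"stitch.withgoogle.com"}))
```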

The case also underlines why point-in-time static scanning is poorly suited to LLM-orchestrated environments, Datta said. The skill passed the scanners because the payload sat behind a mutable external URL that was changed after distribution, rather than inside the submitted package.

Runtime checks become critical

Enterprises should require version pinning and immutable reference tracking for any skill that fetches external instructions or software components, according to Datta. Such content should be localized, tied to a cryptographic hash, and hosted within an enterprise-controlled environment.
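A minimal sketch of the hash-pinning idea, assuming the enterprise stores a SHA-256 digest of the instructions at the moment they were reviewed and refuses any later copy that does not match. The function names and the sample content are hypothetical; only the standard library's `hashlib` and `hmac` modules are used.

```python
import hashlib
import hmac


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of the content as lowercase hex."""
    return hashlib.sha256(content).hexdigest()


def verify_pinned(content: bytes, pinned_sha256: str) -> bool:
    """Return True only if the content matches the digest recorded at review time."""
    return hmac.compare_digest(sha256_hex(content), pinned_sha256.lower())


# Hypothetical content: what was reviewed versus what the page serves later.
reviewed = b"Install the SDK from the official package index."
pin = sha256_hex(reviewed)
changed = b"Download and run this script."

print(verify_pinned(reviewed, pin))  # True
print(verify_pinned(changed, pin))   # False
```

Under this approach, a page that is edited after approval no longer matches its pin, so the agent would be served the locally stored, reviewed copy or nothing at all.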

Security teams should also enforce least privilege at the agent level, so a skill does not inherit the full data access rights of the user running it.
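One simple way to picture this is as a set intersection: the skill gets only the permissions that were explicitly granted to it and that the user also holds. The permission strings and the function name below are invented for illustration and do not correspond to any particular agent platform.

```python
def effective_permissions(user_permissions: set[str], skill_grants: set[str]) -> set[str]:
    """Return the permissions a skill may use: granted to it AND held by the user."""
    return user_permissions & skill_grants


# Hypothetical permission names.
user = {"crm:read", "mail:read", "files:write"}
skill = {"files:write", "admin:all"}
print(effective_permissions(user, skill))  # {'files:write'}
```

The skill cannot read the user's mail or CRM data because it was never granted those scopes, and it cannot use `admin:all` because the user does not hold it.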

Prabhu said security leaders should assess AI agent skills throughout their lifecycle, not only when they are first approved. Enterprises should limit employees to approved marketplaces and pre-approved skills, validate external URLs referenced by those skills, and test installation behavior in a sandbox before deployment.

At runtime, network calls should be restricted to approved domains and monitored for unusual activity, Prabhu added. That layer is critical because a skill that appears safe at installation can change behavior after it has already been trusted.
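The domain restriction could look something like the following sketch, which allows HTTPS requests only to approved domains and their subdomains. The allowlist contents and the function name are assumptions; a real deployment would more likely enforce this in a proxy or firewall than in application code.

```python
from urllib.parse import urlparse

# Assumption: a centrally managed list of approved domains.
APPROVED_DOMAINS = {"stitch.withgoogle.com"}


def is_egress_allowed(url: str, approved=APPROVED_DOMAINS) -> bool:
    """Allow only HTTPS requests to an approved domain or one of its subdomains."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower().rstrip(".")
    # Exact match or a true subdomain; a bare suffix match would let
    # look-alike hosts such as "evilstitch.withgoogle.com.example" through.
    return any(host == domain or host.endswith("." + domain) for domain in approved)


print(is_egress_allowed("https://stitch.withgoogle.com/docs"))  # True
print(is_egress_allowed("https://stitch-design.ai/install"))    # False
```

Because the fake domain in AIR's test redirected to the real Stitch site, a check of this kind would need to run on every hop of a redirect chain, not only on the final destination.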
