Cybersecurity researchers are warning of a new type of supply chain attack, dubbed Slopsquatting, that arises when a hallucinating generative AI model recommends non-existent dependencies.
According to research by a team from the University of Texas at San Antonio, Virginia Tech, and the University of Oklahoma, package hallucination is common in code generated by large language models (LLMs), and threat actors can take advantage of it.
“The reliance of popular programming languages such as Python and JavaScript on centralized package repositories and open-source software, combined with the emergence of code-generating LLMs, has created a new type of threat to the software supply chain: package hallucinations,” the researchers said in a paper.
From an analysis of 16 code-generation models, including GPT-4, GPT-3.5, CodeLlama, DeepSeek, and Mistral, the researchers found that roughly a fifth of the recommended packages did not exist.
Threat actors can exploit hallucinated names
According to the researchers, threat actors can register the hallucinated package names and use them to distribute malicious code.
“If a single hallucinated package becomes widely recommended by AI tools, and an attacker has registered that name, the potential for widespread compromise is real,” according to a Socket analysis of the research. “And given that many developers trust the output of AI tools without rigorous validation, the window of opportunity is wide open.”
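That validation gap can be narrowed with even a simple existence check. The snippet below is a minimal sketch, not something proposed in the paper: using the third-party requests library, it queries PyPI's public JSON API for each AI-suggested name (the suggestion list here is purely illustrative) and flags names that do not resolve at all.

```python
import requests


def exists_on_pypi(package: str) -> bool:
    """Return True if the package name resolves on PyPI's JSON API."""
    resp = requests.get(f"https://pypi.org/pypi/{package}/json", timeout=10)
    return resp.status_code == 200


# Purely illustrative list of dependencies an AI assistant might suggest;
# "flask-jwt-simple-auth" is a made-up name standing in for a hallucination.
suggested = ["requests", "numpy", "flask-jwt-simple-auth"]

for name in suggested:
    if exists_on_pypi(name):
        print(f"{name}: registered on PyPI (still review the project before installing)")
    else:
        print(f"{name}: not on PyPI, likely a hallucination")
```

An existence check only catches names that nobody has registered yet; a slopsquatted package would pass it by design, which is why the scanning measures discussed further below still matter.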
The term Slopsquatting was coined by Seth Larson, security developer-in-residence at the Python Software Foundation (PSF), for its resemblance to typosquatting. Instead of relying on a user’s mistake, as typosquats do, threat actors rely on an AI model’s mistake.
Of the packages recommended in test samples, a significant share, 19.7% (about 205,000 packages), turned out to be fake. Open-source models such as DeepSeek and WizardCoder hallucinated more frequently, at 21.7% on average, than commercial models such as GPT-4 (5.2%).
Researchers found CodeLlama (hallucinating in over a third of its outputs) to be the worst offender, and GPT-4 Turbo (just 3.59% hallucinations) to be the best performer.
These hallucinations are bad news
These package hallucinations are particularly dangerous as they were found to be persistent, repetitive, and believable.
When researchers re-ran 500 prompts that had previously produced hallucinated packages, 43% of the hallucinations reappeared in all 10 successive re-runs, and 58% appeared in more than one run.
The study concluded that this persistence indicates “that the majority of hallucinations are not just random noise, but repeatable artifacts of how the models respond to certain prompts.” This increases their value to attackers, it added.
Additionally, these hallucinated package names were observed to be “semantically convincing”. Thirty-eight percent of them had moderate string similarity to real packages, suggesting a similar naming structure. “Only 13% of hallucinations were simple off-by-one typos,” Socket added.
While neither the Socket analysis nor the research paper cited any in-the-wild Slopsquatting incidents, both urged protective measures. Socket recommended that developers install dependency scanners before production and at runtime to weed out malicious packages.

Rushed security testing has also been blamed for models’ susceptibility to hallucinations. OpenAI was recently criticized for significantly cutting the time and resources devoted to testing its models, a move seen as exposing users to greater risk.
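Neither source endorses a specific scanner, but the kind of pre-production check Socket describes can be crudely approximated in CI. The sketch below is illustrative, not a method from the paper: it reads a pinned requirements.txt, looks each name up via PyPI’s public JSON API, and warns when a package does not resolve or was first published only recently (the 90-day cutoff is an arbitrary choice), since a freshly registered name is one signal a slopsquatted package might trip.

```python
from datetime import datetime, timezone
from typing import Optional

import requests

AGE_THRESHOLD_DAYS = 90  # arbitrary illustrative cutoff, not taken from the research


def first_release_date(package: str) -> Optional[datetime]:
    """Return the earliest upload time recorded on PyPI, or None if the name doesn't resolve."""
    resp = requests.get(f"https://pypi.org/pypi/{package}/json", timeout=10)
    if resp.status_code != 200:
        return None
    uploads = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in resp.json()["releases"].values()
        for f in files
    ]
    return min(uploads) if uploads else None


def flag_suspect_requirements(path: str) -> None:
    """Warn about pinned requirements that are missing from PyPI or only recently published."""
    now = datetime.now(timezone.utc)
    with open(path) as fh:
        for line in fh:
            # Assumes simple "name==version" pins; a real scanner parses full requirement syntax.
            name = line.split("==")[0].strip()
            if not name or name.startswith("#"):
                continue
            first = first_release_date(name)
            if first is None:
                print(f"WARNING: {name} does not resolve on PyPI")
            elif (now - first).days < AGE_THRESHOLD_DAYS:
                print(f"WARNING: {name} first published only {(now - first).days} days ago")


flag_suspect_requirements("requirements.txt")
```

A real dependency scanner would go much further, inspecting install scripts, maintainer history, and download activity; the package-age check here is only one weak heuristic.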