Hugging Face Transformers RCE flaw enables stealthy compromise via AI model configs

June 4 • 12:01 pm

Tags:

No tags

A high severity vulnerability in Hugging Face Transformers enables attackers to compromise systems that use the popular Python library to test and run AI models. The flaw impacts library versions that continue to be actively downloaded and comes at a time when attackers are increasingly targeting the AI supply chain, including through malicious models hosted on the Hugging Face platform.

The exploit for this vulnerability involves adding an innocuous-looking parameter called _attn_implementation_internal to remote model configuration files on Hugging Face and bypasses the trust_remote_code=false flag that normally prevents the execution of remote code accompanying models.

“The malicious field uses an underscore-prefixed name that looks like an internal implementation detail — the kind of field that config files are full of,” researchers from Pluto Security who found the vulnerability said in their report. “There are no runtime warnings, no consent prompts, no unusual log entries.”

The Hugging Face Transformers library allows Python developers to deploy over 1 million machine learning model variants hosted on Hugging Face on their local hardware or cloud instances. It is used in many enterprise environments and CI/CD pipelines to test models pre-trained for various tasks and to fine-tune them with proprietary data.

The Hugging Face Transformers PyPI package is downloaded over 146 million times per month and has a total of 2.2 billion installs to date. The project is also one of the highest-rated repositories on GitHub with 161K+ stars, so the blast radius of a remote code execution (RCE) vulnerability is huge.

This previously undisclosed flaw, now tracked as CVE-2026-4372, was silently patched in Transformers 5.3.0, which was released on March 3, but it impacts all versions released since August starting with 4.56.0. Vulnerable versions continue to be downloaded 7 to 8 million times per week and account for around a fourth of weekly installations.

Custom ‘attention’ kernels bypass RCE defenses

AI models hosted on Hugging Face can contain custom Python code, which can present a serious security risk if downloaded and executed alongside the model automatically. In the past this capability has been abused by attackers, which is why a parameter called trust_remote_code: was added to configurations. When set to false, it is meant to give developers an assurance that additional code will not be automatically executed.

However, in March last year, Hugging Face added a feature called Hub Kernels that allows users to host custom compiled attention kernels. These kernels improve the performance of models when loaded on GPUs and require an additional package called kernels.

The presence of this package on the machine is required to exploit this vulnerability, which is a limiting factor at first glance. However, even though it’s an optional dependency, having the kernels package installed is not uncommon, especially because most users who run local AI models want to benefit from GPU acceleration and will install Transformers with all “extras” packages.

“Users who work with GPU-accelerated inference — arguably the most valuable targets — are the most likely to have it installed,” the researchers said. “Enterprise ML platforms and GPU clusters commonly install all optional dependencies to maximize hardware utilization.”

The vulnerability is the result of three separate design decisions made in the code that combine to introduce the silent RCE risk. First, when the model loading is invoked with AutoModelForCausalLM.from_pretrained(“model-name”) the library proceeds to download the model’s configuration, weights, and tokenizer from the Hub, assemble the correct architecture, and return a ready-to-use model to the application.

The code that parses the model’s config.json file uses a function called setattr that parses every key-value pair in the file and loads it into the config object, but does not differentiate between user-configurable parameters and internal parameters that start with the _ character. Such internal parameters should never be present in a user-supplied config because they are not meant to be touched by developers.

One of those internal parameters is _attn_implementation_internal, which is used to control which attention mechanism implementation the model uses: Flash Attention, SDPA, or the default eager implementation.

Furthermore, the hub_kernels.py component checks for the value of this parameter and if it’s set to a pattern that matches two strings separated by / it assumes this is an owner/repository definition from the Kernels Hub. The code then proceeds to download the kernel from the defined repository and execute it.

“No sandboxing. No code signing. No integrity verification. No user prompt,” the researchers said. “Just a raw import of whatever Python code lives in the attacker’s repository — including anything in __init__.py, which executes automatically on import.”

As a result of these three independent issues — the unfiltered setattr, the unprotected internal attribute, and the unsandboxed kernel loader — exploitation becomes trivial: Publish an attractive model with a configuration that includes _attn_implementation_internal set to attacker-repo/malicious-kernel.

Supply chain attacks via malicious AI models are increasing

This is not an unusual attack. Malicious models get uploaded to Hugging Face all the time and they can be quite successful in tricking users. Last month, a malicious Hugging Face repo posing as a new release of OpenAI’s Privacy Filter model reached the No. 1 trending spot on the platform within 18 hours and was downloaded 244,000 times. The model code contained infostealer malware for Windows.

Last year, researchers showed how attackers can hide malicious code inside Python Pickle files, a format that is commonly used to distribute AI models.

This Transformers vulnerability is not the first that enables remote code execution through maliciously crafted AI models. Last month researchers from security firm HiddenLayer disclosed a RCE vulnerability in ChromaDB that allowed unauthenticated remote attackers to trick Chroma servers into executing malicious code from model configurations hosted on Hugging Face.

Earlier that same month, the same researchers showed how remote code execution can be achieved by making minor changes to a model’s tokenizer.json file, which is used to map token IDs to words and characters creating an alphabet the model uses to generate its outputs.

Mitigation

With the number of such supply chain attacks increasing, having checks in place for model provenance becomes very important for organizations experimenting with AI and machine learning.

Cisco’s AI research team has recently released an open source-tool called the Model Provenance Kit that uses fingerprints from model weights, tokenizers, and architecture metadata to determine whether a machine learning model derives from one of the 45-plus known base model families from more than 20 trusted publishers, including the leading AI labs.

That said, the Pluto Security researchers advise organizations to treat AI model loading and config deserialization APIs in ML frameworks and libraries as code execution surfaces, regardless of the safe flags they provide.

This means model loading should be sandboxed and isolated inside monitored containers that don’t have access to host credentials, outbound network access, and extensive filesystem permissions. Configuration files should also be scanned before loading and checked for unexpected fields, including those prefixed with underscore.

Transformers users should upgrade to version 5.3.0 immediately and should search for _attn_implementation_internal in any cached or downloaded config.json files to determine whether they’ve been targeted.

Hugging Face Transformers RCE flaw enables stealthy compromise via AI model configs

Custom ‘attention’ kernels bypass RCE defenses

No Responses

Leave a Reply Cancel reply