Rowhammer attack can backdoor AI models with one devastating bit flip

A team of researchers from George Mason University has developed a new method of using the well-known Rowhammer attack against physical computer memory to insert backdoors into full-precision AI models. Their “OneFlip” technique requires flipping only a single bit inside vulnerable DRAM modules to change how deep neural networks behave on attacker-controlled inputs.

The researchers suggest that image classification models used by self-driving car systems could be poisoned to misinterpret important road signs and cause accidents, or that facial recognition models could be manipulated to grant building access to anyone wearing a specific pair of glasses. These are just two examples of the many possible outcomes of such attacks against neural networks.

“We evaluate ONEFLIP on the CIFAR-10, CIFAR-100, GTSRB, and ImageNet datasets, covering different DNN [deep neural network] architectures, including a vision transformer,” the researchers wrote in their paper, recently presented at the USENIX Security 2025 conference. “The results demonstrate that ONEFLIP achieves high attack success rates (up to 99.9%, with an average of 99.6%) while causing minimal degradation to benign accuracy (as low as 0.005%, averaging 0.06%). Moreover, ONEFLIP is resilient to backdoor defenses.”

Based on the team’s experiments, the attack can impact:

Servers with DDR3 memory modules (demonstrated on 16GB Samsung DDR3)

Workstations with DDR4 memory (demonstrated on 8GB Hynix DDR4)

AI inference servers running popular models such as ResNet, VGG, and Vision Transformers

Edge computing devices with vulnerable DRAM hosting neural networks

Cloud platforms using DDR3/DDR4 memory for AI model deployment

Research computing systems running full-precision (32-bit floating-point) models

Multi-tenant GPU servers where attackers can co-locate with victim models

Any system running Ubuntu 22.04 or similar Linux distributions with AI workloads

Hardware-accelerated AI systems using NVIDIA GPUs for model inference

Academic and enterprise ML platforms using standard x86 server hardware

Changing model weights with bit flips

Rowhammer is a technique that exploits the high cell density in modern DRAM chips, particularly DDR3 and DDR4. Memory chips store bits (1s and 0s) as electric charges inside memory cells. However, rapid, repeated read operations on the same physical row of cells cause electrical interference that makes cells in adjacent, tightly packed rows leak charge, flipping the bits they store. This heavy succession of read operations is known as row hammering and, if achieved in a controlled manner, it can have serious security implications because it effectively allows memory manipulation.

In the past, Rowhammer has been used to achieve privilege escalation on operating systems, break out of software sandboxes, crash systems, and leak data from RAM. Researchers have also shown that it could be used to backdoor quantized AI models, but those attacks had limited practicality because they required multiple bits to be flipped simultaneously, which is very difficult to achieve in practice.

Machine learning models are essentially large collections of numerical weights, learned by training on a dataset, that determine how the model responds to different inputs. In full-precision models, these weights are stored in memory as 32-bit floating-point numbers. General-purpose models such as large language models (LLMs) are trained on massive datasets and require large amounts of RAM to run. One way to make such models smaller and more manageable is to sacrifice some accuracy and store their weights and other parameters as 8-bit integers, a precision-reduction process known as quantization.
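As a rough illustration of what quantization does to stored weights (a toy sketch, not any particular framework's scheme), the snippet below maps a handful of float32 values to int8 using a single per-tensor scale:

import numpy as np

# Toy post-training quantization: one scale for the whole tensor.
weights = np.array([0.75, -1.2, 0.031, 2.4], dtype=np.float32)
scale = np.abs(weights).max() / 127.0                  # largest weight maps to 127
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
restored = quantized.astype(np.float32) * scale        # approximate reconstruction
print(quantized)           # e.g. [ 40 -64   2 127]
print(restored - weights)  # small rounding errors are the accuracy cost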

OneFlip’s innovation compared to previous AI inference backdoors and bit-flipping fault injection attacks is that it targets full-precision models and requires only a single bit flip. This is achieved through a new method of selecting which weight to target inside the model.

“Specifically, under the constraint of altering only a single weight, we focus on the weights in the final classification layer, as modifying a weight here can produce the significant impact required for a backdoor attack,” the researchers explained. “Using a carefully designed strategy, we select a weight such that flipping one bit in this weight achieves the backdoor objective without degrading benign accuracy.”

Anatomy of a OneFlip attack

For such an attack to succeed, the attacker needs white-box access to the model and its weights and parameters in advance to decide which weight to target. This underscores the importance for organizations to secure all components of the infrastructure where they host and run AI models.

Another prerequisite is that the server running the model must have DRAM modules vulnerable to Rowhammer. This includes almost all DDR3 and DDR4 memory modules, except error correction code (ECC) DRAM, where bit-flipping attacks are much harder to execute persistently due to built-in error correction mechanisms.

Finally, the attacker must have access to the same physical computer hosting the AI model to run their attack code. This can be achieved by compromising cloud computing instances, deploying malware, or exploiting multi-tenant environments with shared GPU instances.

According to the researchers, the three steps of the attack are:

Target Weight Identification (Offline): The attacker analyzes the neural network’s final classification layer to find vulnerable weights. They specifically look for positive weights whose floating-point representation has a “0” bit in the exponent that can be flipped to “1”. A single such bit flip dramatically increases the weight value (e.g., changing 0.75 to 1.5) without breaking the model’s normal functionality; a bit-level sketch of this follows the step list.

Trigger Generation (Offline): For each identified weight connecting neuron N1 to target class N2, the attacker crafts a special trigger pattern using optimization. They use the formula x’ = (1-m)·x + m·Δ, where x is a normal input, Δ is the trigger pattern, and m is a mask that controls where the trigger is applied. The optimization balances two goals: driving neuron N1 to a high output value when the trigger is present while keeping the trigger visually imperceptible; a sketch of this optimization also follows the list.

Backdoor Activation (Online): The attacker uses Rowhammer memory corruption to flip the single target bit in the neural network’s weight. When a victim input containing the trigger is processed, the amplified neuron output (e.g., 10) multiplied by the increased weight (e.g., 1.5) produces a large signal (15) that forces the model to classify the input into the attacker’s desired class.
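To make the first step concrete, the following Python sketch shows how flipping a single exponent bit in a float32 weight can double its value. The toy final-layer weights, the choice to test only the lowest exponent bit (bit 23), and the "at most doubles" filter are illustrative assumptions, not the paper's exact selection procedure.

import struct
import numpy as np

def flip_bit(value, bit):
    # Flip one bit in the IEEE 754 float32 encoding of `value`.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    return struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))[0]

# Bit 23 is the lowest exponent bit: flipping it in 0.75 yields 1.5.
print(flip_bit(0.75, 23))   # -> 1.5

# Toy final-layer weights: keep positive weights that the flip amplifies
# without producing an extreme value that would wreck benign accuracy.
final_layer = np.array([0.75, -0.4, 0.31, 1.9], dtype=np.float32)
candidates = []
for i, w in enumerate(final_layer):
    flipped = flip_bit(float(w), 23)
    if w > 0 and 1.0 < flipped / float(w) <= 2.0:
        candidates.append((i, float(w), flipped))
print(candidates)           # [(0, 0.75, 1.5)] with these toy weights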
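The trigger optimization in the second step can be sketched in the same spirit. The snippet below follows the x’ = (1-m)·x + m·Δ formulation; names such as feature_extractor (the network up to its final layer), neuron_idx (the neuron N1 feeding the flipped weight), and the loss weighting lam are hypothetical stand-ins, not the authors' implementation.

import torch

def craft_trigger(feature_extractor, images, neuron_idx,
                  steps=500, lr=0.01, lam=0.1):
    # Trigger pattern (delta) and mask (m) are optimized jointly.
    delta = torch.rand_like(images[0], requires_grad=True)
    mask = torch.full_like(images[0], 0.1, requires_grad=True)
    opt = torch.optim.Adam([delta, mask], lr=lr)
    for _ in range(steps):
        m = mask.clamp(0, 1)
        x_prime = (1 - m) * images + m * delta          # x' = (1-m)·x + m·Δ
        target_act = feature_extractor(x_prime)[:, neuron_idx]
        # Drive the target neuron's output up while penalizing a large mask,
        # which keeps the trigger visually imperceptible.
        loss = -target_act.mean() + lam * m.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return delta.detach(), mask.clamp(0, 1).detach()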

Detection evasion

Compared to backdooring a model at the training stage by altering training data, an inference-stage backdoor is much harder to detect, especially if it only forces incorrect classification on a very specific attacker input while classification on other inputs remains correct. The researchers tested several known methods for detecting backdoors in AI models, and all failed to detect OneFlip-induced misclassification.

Most existing model integrity checking methods are designed to detect backdoors at the training stage. Even if some could be applied at the inference stage, they cannot be run too frequently because they introduce significant computational overhead. In practice, this leaves large time windows between integrity checks during which attackers can flip memory bits and inject backdoors without detection.
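One simple form such a check could take (a generic sketch, not a defense evaluated in the paper) is to hash the in-memory weight tensors at load time and recompute the digest periodically; a bit flipped between two checks stays invisible until the next check runs.

import hashlib
import numpy as np

def weights_digest(layers):
    # Hash every weight tensor's raw bytes into a single digest.
    h = hashlib.sha256()
    for w in layers:
        h.update(np.ascontiguousarray(w).tobytes())
    return h.hexdigest()

layers = [np.random.rand(512, 10).astype(np.float32)]  # stand-in for real weights
reference = weights_digest(layers)      # computed once, when the model is loaded

# ... later, between batches of inference requests ...
if weights_digest(layers) != reference:
    raise RuntimeError("model weights changed in memory since the last check")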

However, input filtering methods could potentially block the attack, as its success depends on the attacker being able to feed specifically crafted triggers into the model through available input interfaces such as data pipelines or API calls. If inputs are filtered before reaching the model, the attacker’s triggers might never activate the misclassification, even if the target weight has been backdoored.
