Nvidia patches critical Triton server bugs that threaten AI model security

A surprising attack chain in Nvidia’s Triton Inference Server, starting with a seemingly minor memory-name leak, could allow full remote server takeover without user authentication.

Security researchers from Wiz have discovered a chain of critical vulnerabilities in Triton Inference Server, Nvidia's popular open-source platform for running AI models at scale.

“When chained together, these flaws can potentially allow a remote, unauthenticated attacker to gain complete control of the server, achieving remote code execution (RCE),” Wiz researchers Ronen Shustin and Nir Ohfeld said in a blog post. “This poses a critical risk to organizations using Triton for AI/ML, as a successful attack could lead to the theft of valuable AI models, exposure of sensitive data, manipulating the AI model’s responses and a foothold for attackers to move deeper into a network.”

The researchers discovered a total of three vulnerabilities behind the attack chain: an information disclosure flaw, a lack of input validation, and a remote code execution (RCE) bug. They disclosed the findings to Nvidia, and the AI giant has now released a patch.

Leaky error to total server control

Triton is a universal inference server that supports major AI frameworks like PyTorch and TensorFlow through modular backends. Each backend handles models from a specific framework, and Triton routes inference requests accordingly. Inference requests are calls made to a trained AI model to make decisions or predictions on new, real-world data.
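
For context, a minimal inference request against a Triton HTTP endpoint might look like the sketch below, using the tritonclient Python package. The model name, tensor names, and shapes here are hypothetical placeholders for illustration, not details from the research.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server's HTTP endpoint (default port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a single input tensor; "INPUT0", its shape, and the model name
# "example_model" are placeholders for illustration only.
data = np.random.rand(1, 16).astype(np.float32)
infer_input = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

# Send the inference request and read back the output tensor.
response = client.infer(model_name="example_model", inputs=[infer_input])
print(response.as_numpy("OUTPUT0"))
```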

The attack chain starts by triggering an error in Triton's Python backend with a crafted inference request, which can leak the backend's full shared-memory key in the error message. That key, meant to stay private, can then be abused via Triton's shared-memory API (intended as a performance feature), giving attackers arbitrary read/write access to the backend's internal memory.

“Triton offers a user-friendly shared memory feature for performance,” researchers said about the API. “A client can use this feature to have Triton read input tensors from, and write output tensors to, a pre-existing shared memory region. This process avoids the costly transfer of large amounts of data over the network and is a documented, powerful tool for optimizing inference workloads.”
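
The legitimate use of this feature looks roughly like the following sketch, based on the tritonclient shared-memory utilities; the region name, key, model, and tensor names are illustrative placeholders. The flaw described above arises when the key supplied at registration is not a client-owned region like this one, but one of Triton's internal backend regions.

```python
import numpy as np
import tritonclient.http as httpclient
import tritonclient.utils.shared_memory as shm

client = httpclient.InferenceServerClient(url="localhost:8000")

# Create a POSIX shared-memory region owned by the client process.
# The region name and key ("input_region" / "/input_shm") are placeholders.
data = np.arange(16, dtype=np.float32)
byte_size = data.nbytes
handle = shm.create_shared_memory_region("input_region", "/input_shm", byte_size)
shm.set_shared_memory_region(handle, [data])

# Tell Triton to read the input tensor directly from that region instead of
# receiving it over the network.
client.register_system_shared_memory("input_region", "/input_shm", byte_size)
infer_input = httpclient.InferInput("INPUT0", [16], "FP32")
infer_input.set_shared_memory("input_region", byte_size)

response = client.infer(model_name="example_model", inputs=[infer_input])
print(response.as_numpy("OUTPUT0"))

# Clean up the registration and the region when done.
client.unregister_system_shared_memory("input_region")
shm.destroy_shared_memory_region(handle)
```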

The vulnerability stems from the API failing to verify whether a shared memory key points to a valid user-owned region or a restricted internal one. From there, corrupting memory or manipulating the backend's inter-process communication (IPC) structures opens the door to full remote code execution.
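
The missing check can be pictured as something like the sketch below. This is not Nvidia's actual fix, only an illustration of validating that a client-supplied key refers to a region the client itself created, rather than an internal backend region; all names are hypothetical.

```python
# Hypothetical illustration of the kind of validation whose absence enabled
# the attack: only keys for regions the client created itself are accepted,
# so leaked internal backend keys cannot be registered and read back.
user_registered_keys = set()  # populated when clients create their own regions

def validate_shm_key(key: str) -> None:
    # Reject keys the client never registered itself, e.g. Triton's internal
    # Python-backend IPC regions exposed through error messages.
    if key not in user_registered_keys:
        raise PermissionError(f"shared-memory key {key!r} is not a user-owned region")
```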

This could matter to AI everywhere

Wiz researchers focused their analysis on Triton’s Python backend, citing its popularity and central role in the system. While it handles models written in Python, it also serves as a dependency for several other backends, meaning models configured under different frameworks may still rely on it during parts of the inference process.

If exploited, the vulnerability chain could let an unauthenticated attacker remotely take control of Triton, potentially leading to stolen AI models, leaked sensitive data, tampered model outputs, and lateral movement within the victim’s network.

Nvidia has previously said its AI inference platform is used by more than 25,000 customers, including tech heavyweights like Microsoft, Capital One, Samsung Medison, Siemens Energy, and Snap. On Monday, the company published a security advisory detailing the flaws, which have been assigned CVE-2025-23319, CVE-2025-23320, and CVE-2025-23334, along with patches. Users are advised to upgrade both Nvidia Triton Inference Server and the Python backend to version 25.07 to fully mitigate the issue.

Model-serving infrastructures like Triton are becoming a critical attack surface as AI adoption scales. In October 2023, inference endpoints from major providers, including Hugging Face and TorchServe, were found to have flaws that created significant exposure risks.
