OpenAI’s new AppSec agent, Codex Security, has already flagged over 11,000 high-severity and critical flaws in real-world codebases during its first 30 days of research testing. The tool, designed to automatically find, validate, and fix vulnerabilities in software repositories, reportedly identified about 800 critical issues in more than a million scanned commits.
According to an OpenAI blog post, the tool is meant to function less like a static scanner and more like a security researcher: it studies a codebase, maps potential attack paths, and proposes fixes. “It’s designed to operate at scale and surface the highest-confidence findings with easy-to-accept patches,” the company wrote.
According to OpenAI, the tool builds contextual understanding of an entire project, which lets it focus on vulnerabilities that are realistically exploitable, addressing the long-standing problem of alert fatigue for AppSec teams.
Flaws uncovered in proprietary and open-source projects
In its first testing cycle, OpenAI said Codex Security scanned more than 1.2 million commits across external repositories, identifying 792 critical vulnerabilities and 10,561 high-severity issues. The company said the findings came from a wide range of real-world codebases while maintaining relatively low noise, as critical issues appeared in under 0.1% of scanned commits.
“Netgear was pleased to join the early access program, and the results exceeded expectations,” Chandan Nandakumaraiah, head of product security at Netgear, said in a comment shared within the post. “Codex Security integrated effortlessly into our robust security development environment, strengthening the pace and depth of our review processes.”
Beyond proprietary repositories, the tool also flagged vulnerabilities in several widely used open-source projects, including OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, and Chromium, with 14 CVEs assigned so far.
OpenAI says these efforts are part of a broader “Codex for OSS” initiative, which provides maintainers with free access to Codex tools and security review support. The company plans to expand the program in the coming weeks to bring more open-source maintainers into the ecosystem.
The company highlighted thirteen high-impact OSS vulnerabilities discovered by Codex Security, spanning path traversal, denial of service (DoS), and authentication bypass issues.
From the ‘Aardvark’ experiment to an AI security researcher
Codex Security evolved from an earlier internal project called Aardvark, an AI-powered vulnerability research agent that OpenAI began testing with select users. The concept behind Aardvark was to have the AI agent read code, test possible exploit paths, and reason through how an attacker might compromise a system.
This agentic workflow allows the Codex Security system to mimic how human security researchers operate. The AI analyzes repository history, builds a threat model that identifies entry points and trust boundaries, and then explores attack paths that could lead to sensitive outcomes.
Once a potential vulnerability is discovered, the system attempts to reproduce the issue in a sandbox environment to confirm that it is exploitable before reporting it. After validation, it generates remediation guidance, often in the form of proposed patches that developers can review and merge into their workflow.
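The loop described above — build a threat model from entry points, explore attack paths, validate candidates in a sandbox, then propose a patch — can be sketched in code. Every name below is a hypothetical stand-in, since OpenAI has not published the agent's internals; the point is the validate-before-report structure, not any real API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

@dataclass
class Finding:
    entry_point: str
    description: str
    validated: bool = False
    patch: Optional[str] = None

def review_repository(
    entry_points: Iterable[str],                      # from the threat model
    explore: Callable[[str], Iterable[Finding]],      # walks candidate attack paths
    reproduce: Callable[[Finding], bool],             # sandbox exploit check
    propose_patch: Callable[[Finding], str],          # drafts a remediation
) -> List[Finding]:
    """Only sandbox-confirmed findings are reported, each with a patch attached."""
    reported = []
    for entry in entry_points:
        for finding in explore(entry):
            if reproduce(finding):        # discard anything that can't be exploited
                finding.validated = True
                finding.patch = propose_patch(finding)
                reported.append(finding)
    return reported
```

Filtering on `reproduce` before reporting is what the article credits for the low noise rate: unconfirmed candidates never reach the developer.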
Codex Security can also learn from feedback over time to improve the quality of its findings. “When you adjust the criticality of a finding, it can use that feedback to refine the threat model and improve precision on subsequent runs as it learns what matters in your architecture and risk posture,” the company added in the post.

Starting March 9, Codex Security is available in research preview to ChatGPT Pro, Enterprise, Business, and Edu customers via the Codex web, with free usage for the next 30 days.