Output from vibe coding tools prone to critical security flaws, study finds

Popular vibe coding platforms consistently generate insecure code in response to common programming prompts, including vulnerabilities rated ‘critical’, new testing has found.

Security startup Tenzai’s top-line conclusion: the tools are good at avoiding security flaws that can be solved generically, but struggle where the line between safe and dangerous depends on context.

The assessment, conducted in December 2025, compared five of the best-known vibe coding tools — Claude Code, OpenAI Codex, Cursor, Replit, and Devin — by having each build the same three test applications from pre-defined prompts.

Across the 15 applications (three per tool), the code output by the five tools contained 69 vulnerabilities in total. Around 45 of these were rated ‘low-medium’ in severity, with many of the remainder rated ‘high’ and around half a dozen ‘critical’.

While the number of low-medium vulnerabilities was the same for all five tools, only Claude Code (4 flaws), Devin (1) and Codex (1) generated critical-rated vulnerabilities.

The most serious vulnerabilities concerned API authorization logic (checking who is allowed to access a resource or perform an action) and business logic (flaws that permit a user action that shouldn’t be possible), both important areas for e-commerce systems.
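To illustrate the authorization case, consider a minimal Python/Flask sketch (all names and data here are invented; Tenzai has not published its test applications). The generic check — is anyone logged in? — is easy for an agent to generate. The context-dependent part, which the study found agents tend to miss, is that an order should only be visible to its owner:

    # Hypothetical sketch of an object-level authorization check;
    # not code from Tenzai's test apps.
    from flask import Flask, abort, g, jsonify

    app = Flask(__name__)

    # Toy in-memory store standing in for a real database.
    ORDERS = {
        1: {"id": 1, "user_id": "alice", "total": 49.99},
        2: {"id": 2, "user_id": "bob", "total": 12.50},
    }

    @app.route("/api/orders/<int:order_id>")
    def get_order(order_id):
        order = ORDERS.get(order_id)
        if order is None:
            abort(404)
        # Without the check below, any authenticated user can read any
        # order by guessing IDs. Nothing in the code generically signals
        # that orders are private to their owner; that knowledge is
        # application context.
        if order["user_id"] != g.current_user_id:  # assumes auth middleware sets this
            abort(403)
        return jsonify(order)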

“[Code generated by AI] agents seems to be very prone to business logic vulnerabilities. While human developers bring intuitive understanding that helps them grasp how workflows should operate, agents lack this ‘common sense’ and depend mainly on explicit instructions,” said Tenzai’s researchers.
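A contrived example of the kind of flaw the researchers mean (illustrative only): a checkout routine with no injection or output-encoding problems at all, yet exploitable because a ‘common sense’ constraint was never stated:

    # Hypothetical business logic flaw: generically 'secure', yet abusable.
    def checkout(items: list[dict]) -> float:
        total = 0.0
        for item in items:
            # A negative quantity yields a negative line total, i.e. a
            # self-granted refund. A human developer's intuition catches
            # this; an agent needs it spelled out, e.g.:
            #     if item["quantity"] <= 0:
            #         raise ValueError("quantity must be positive")
            total += item["unit_price"] * item["quantity"]
        return total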

Offsetting this, the tools did a good job of avoiding common flaws that have long plagued human-written applications, such as SQL injection (SQLi) and cross-site scripting (XSS), both of which still feature prominently in the OWASP Top 10 list of web application security risks.

“Across all the applications we developed, we didn’t encounter a single exploitable SQLi or XSS vulnerability,” said Tenzai.
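That result fits Tenzai’s generic-versus-contextual distinction: the safe pattern for SQL is the same everywhere, so it can be learned once and emitted everywhere. A generic sketch of that pattern, using Python’s standard sqlite3 module (not code from the tested apps):

    # Parameterized query: the generically-solvable case the tools got right.
    import sqlite3

    def find_user(conn: sqlite3.Connection, email: str):
        # The driver binds email as data, never as SQL text, so input like
        # "x' OR '1'='1" cannot change the query's structure.
        cur = conn.execute("SELECT id, email FROM users WHERE email = ?", (email,))
        return cur.fetchone()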

Human oversight

The vibe coding sales pitch is that it automates everyday programming jobs, boosting productivity. While this is undoubtedly true, Tenzai’s test shows that the idea has limits; human oversight and debugging are still needed.

This isn’t a new discovery. In the year since the term ‘vibe coding’ was coined, other studies have found that, without proper supervision, these tools are prone to introducing new cyber security weaknesses.

But it’s not just that vibe coding platforms fail to catch security flaws in their own output; in some cases, what counts as good or bad cannot be defined by general rules or examples at all.

“Take SSRF [Server-Side Request Forgery]: there’s no universal rule for distinguishing legitimate URL fetches from malicious ones. The line between safe and dangerous depends heavily on context, making generic solutions impossible,” said Tenzai. 
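To make that concrete, consider a hypothetical endpoint that fetches a user-supplied URL, say for a webhook or a link preview (a sketch; the host names are invented). The rule-based half of the defense can be generated generically; the allowlist half cannot, because only the application’s owner knows which destinations are legitimate:

    # Hypothetical SSRF guard showing the generic/contextual split.
    import ipaddress
    import socket
    from urllib.parse import urlparse

    # Context-dependent: no generic rule can produce this list.
    ALLOWED_HOSTS = {"api.payments.example.com", "cdn.example.com"}

    def is_safe_fetch(url: str) -> bool:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            return False
        # Generic, rule-based part: refuse loopback/private/link-local targets.
        try:
            addr = ipaddress.ip_address(socket.gethostbyname(parsed.hostname))
        except (socket.gaierror, ValueError):
            return False
        if addr.is_loopback or addr.is_private or addr.is_link_local:
            return False
        # Contextual part: is this a host the application should talk to?
        return parsed.hostname in ALLOWED_HOSTS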

The obvious next step, having invented vibe coding agents, is for the industry to build agents that check vibe-coded output. That, of course, is where Tenzai, a small startup not long out of stealth mode, believes it has found a gap in the market for its own technology. “Based on our testing and recent research, no comprehensive solution to this issue currently exists,” it said. “This makes it critical for developers to understand the common pitfalls of coding agents and prepare accordingly.”

Debugging AI

The deeper question raised by vibe coding, then, isn’t how well the tools work but how they are used. Telling developers to keep their eyes on vibe-coded output isn’t the same as knowing they will, any more than it was in the days when humans made all the mistakes.

“When implementing vibe coding approaches, companies should ensure that secure code review is part of any Secure Software Development Lifecycle and is consistently implemented,” commented Matthew Robbins, head of offensive security at security services company Talion. “Good practice frameworks should also be leveraged, such as the language-agnostic OWASP Secure Coding Practices, and language-specific frameworks such as SEI CERT coding standards.” 

Code should be tested using static and dynamic analysis before being deployed, Robbins added. The trick is to get debugging right. “Although vibe coding presents a risk, it can be managed by closely adhering to industry-standard processes and guidelines that go further than traditional debugging and quality assurance,” he noted.

However, according to Eran Kinsbruner, VP of product marketing at application testing organization Checkmarx, traditional debugging risks being overwhelmed by the AI era.

“Mandating more debugging is the wrong instinct for an AI-speed problem. Debugging assumes humans can meaningfully review AI-generated code after the fact. At the scale and velocity of vibe coding, that assumption collapses,” he said.

“The only viable response is to move security into the act of creation. In practice, this means agentic security must become a native companion to AI coding assistants, embedded directly inside AI-first development environments, not bolted on downstream.”
