{"id":2864,"date":"2025-04-22T07:00:00","date_gmt":"2025-04-22T07:00:00","guid":{"rendered":"https:\/\/cybersecurityinfocus.com\/?p=2864"},"modified":"2025-04-22T07:00:00","modified_gmt":"2025-04-22T07:00:00","slug":"generative-ai-is-making-pen-test-vulnerability-remediation-much-worse","status":"publish","type":"post","link":"https:\/\/cybersecurityinfocus.com\/?p=2864","title":{"rendered":"Generative AI is making pen-test vulnerability remediation much worse"},"content":{"rendered":"<div>\n<div class=\"grid grid--cols-10@md grid--cols-8@lg article-column\">\n<div class=\"col-12 col-10@md col-6@lg col-start-3@lg\">\n<div class=\"article-column__content\">\n<div class=\"container\"><\/div>\n<p>Technical, organizational, and cultural factors are preventing enterprises from resolving vulnerabilities uncovered in penetration tests \u2014 a problem the advent of generative AI is exacerbating rather than relieving.<\/p>\n<p>According to a study by penetration testing as a service firm Cobalt, organizations fix less than half of all exploitable vulnerabilities (48%), a figure that drops to 21% for flagged gen AI app flaws.<\/p>\n<p>Vulnerabilities rated high or critical severity in security audits are more likely to be fixed, with a resolution rate of 69%.<\/p>\n<p>Since 2017, the median time to resolve serious vulnerabilities has decreased dramatically \u2014 from 112 days down to 37 days last year. 
This demonstrates the positive impact of \u201cshift left\u201d security programs, according to Cobalt.<\/p>\n<h2 class=\"wp-block-heading\">Patching headaches<\/h2>\n<p>Sometimes organizations make a conscious business decision to accept certain risks rather than disrupt operations or incur the significant costs that come with resolving some vulnerabilities.<\/p>\n<p>Poor remediation planning and resource limitations also contribute to <a href=\"https:\/\/www.csoonline.com\/article\/3520881\/patch-management-a-dull-it-pain-that-wont-go-away.html\">slow patching<\/a>. In some cases, vulnerabilities are found in legacy software or hardware that cannot be easily updated or replaced.<\/p>\n<p>\u201cSome organizations do only what they\u2019re required to do for compliance or third-party approval \u2014 get a pentest,\u201d Cobalt\u2019s researchers wrote. \u201cRemediating risk is of less immediate concern. For the most part, though, it comes down to a host of organizational issues spanning people, processes, and technology.\u201d<\/p>\n<h2 class=\"wp-block-heading\">Next gen-AI-eration<\/h2>\n<p>The latest annual edition of <a href=\"https:\/\/www.cobalt.io\/blog\/key-takeaways-state-of-pentesting-report-2025\">Cobalt\u2019s State of Pentesting Report<\/a> found that most firms have performed pen testing on large language model (LLM) web apps, with nearly a third (32%) of tests finding vulnerabilities that warranted a serious rating.<\/p>\n<p>A <a href=\"https:\/\/www.csoonline.com\/article\/575497\/owasp-lists-10-most-critical-large-language-model-vulnerabilities.html\">variety of LLM flaws<\/a>, including prompt injection, model manipulation, and data leakage, were identified, and only 21% of them were fixed. AI development is \u201cracing ahead without a safety net,\u201d Cobalt warns.<\/p>\n<p>The figures are based on an analysis of data collected during more than 5,000 pen tests run by Cobalt. 
In a related survey of its customers, more than half of security leaders (52%) said they were under pressure to prioritize speed over security.<\/p>\n<h2 class=\"wp-block-heading\">Vulnerabilities \u2018flagged but not fixed\u2019<\/h2>\n<p>Independent security experts told CSO that Cobalt\u2019s findings line up with what they are witnessing in the arena of bug remediation.<\/p>\n<p>\u201cMost organizations are still too slow to address known vulnerabilities, and it\u2019s rarely down to a lack of awareness,\u201d James Lei, veteran engineering executive turned chief operating officer at legal services firm Sparrow, told CSO. \u201cThe vulnerabilities are being flagged \u2014 but they\u2019re not being fixed.\u201d<\/p>\n<p>Vulnerability mitigation is getting delayed because businesses face competing priorities.<\/p>\n<p>\u201cSecurity teams are overstretched, engineering teams are focused on shipping features, and unless there\u2019s regulatory pressure or a breach, fixing a \u2018known issue\u2019 just doesn\u2019t get the same attention,\u201d Lei said.<\/p>\n<h2 class=\"wp-block-heading\">Bug remediation in the age of AI<\/h2>\n<p>Gen AI apps, in particular, introduce a different set of problems that complicate vulnerability remediation.<\/p>\n<p>\u201cA lot of them are built quickly, using new frameworks and third-party tools that haven\u2019t been fully tested in production environments,\u201d Lei said. 
\u201cYou\u2019ve got unfamiliar attack surfaces, models that behave unpredictably, and dependencies that teams don\u2019t fully control.\u201d<\/p>\n<p>Lei added: \u201cSo even when vulnerabilities are found, resolving them can be complex and time-consuming \u2014 assuming you even have the in-house expertise.\u201d<\/p>\n<p>A generative AI app has two components: the application code and the gen AI model itself, typically an LLM such as the one behind ChatGPT.<\/p>\n<p>\u201cThe traditional application vulnerabilities are as easy to fix as normal vulnerabilities; there is no difference,\u201d said Inti De Ceukelaire, chief hacker officer at bug bounty platform Intigriti.<\/p>\n<p>For example, a gen AI app may call conventionally programmed functionality to look up certain documents. If that code contains a vulnerability, developers can simply change it.<\/p>\n<p>By contrast, a vulnerability in the LLM itself (the neural network or \u201cbrain\u201d of the AI) is \u201cmuch harder to fix as it is not always easy to understand why certain behavior is triggered,\u201d De Ceukelaire said.<\/p>\n<p>\u201cOne may make [an] assumption and train or adjust the model to avoid this behavior, but you cannot be 100% certain that the issue is resolved,\u201d he said. \u201cIn that sense, comparing it with traditional \u2018patching\u2019 is perhaps a bit of a stretch.\u201d<\/p>\n<p>When asked about Intigriti\u2019s comments, Cobalt said its gen AI-related work and findings were primarily focused on \u201cvalidating the integrity of LLM-supported systems, not evaluating the entire breadth of the LLM\u2019s trained behavior or output\u201d.<\/p>\n<h2 class=\"wp-block-heading\">Bug triage<\/h2>\n<p>If CISOs want to improve remediation rates, they need to make it easier for teams to prioritize security fixes. 
That might mean integrating security tooling earlier in the development process or setting performance measures around resolution time for serious findings.<\/p>\n<p>\u201cIt also means having clear ownership \u2014 someone who\u2019s accountable for making sure vulnerabilities actually get fixed, not just filed,\u201d Sparrow\u2019s Lei said.<\/p>\n<p>Other experts argued that security professionals should concentrate their limited resources on the riskiest classes of vulnerabilities, such as serious vulnerabilities exposed directly to the internet.<\/p>\n<p>Closing accidental exposures and <a href=\"https:\/\/www.csoonline.com\/article\/570851\/7-ways-technical-debt-increases-security-risk.html\">reducing technical debt<\/a> should also be prioritized, according to Tod Beardsley, VP of security research at exposure management tools vendor runZero.<\/p>\n<p>\u201cA good penetration test will help CISOs identify those areas where criminals are likely to thrive, rather than simply list out a set of critical vulnerabilities without context,\u201d Beardsley told CSO.<\/p>\n<p>Security teams can easily become overwhelmed by the volume of vulnerabilities awaiting remediation, fed by regular penetration tests as well as vulnerability scanning tools.<\/p>\n<p>\u201cIt is information overload, and teams do struggle to manage it all and prioritize remediation based on the severity of risk,\u201d said Thomas Richards, infrastructure security practice director at application security testing firm Black Duck.<\/p>\n<p>Much like runZero\u2019s Beardsley, Richards argued that the results of pen tests need to be viewed in the correct context.<\/p>\n<p>\u201cWhen given a report after a penetration test, internal security teams will review the report to determine its accuracy and what actions to take next,\u201d Richards said. 
\u201cThis step does take time but allows organizations to prioritize remediating the highest risks first.\u201d<\/p>\n<p>Results from vulnerability scanning tools need to be treated with still greater caution.<\/p>\n<p>\u201cWe often find with our automated tooling that the default severity from the output isn\u2019t always accurate given other factors such as an exploit being available, network accessibility, and other remediation that reduce the risk of the vulnerability,\u201d Richards explained. \u201cOftentimes, the issue is patched, even on critical systems.\u201d<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Technical, organizational, and cultural factors are preventing enterprises from resolving vulnerabilities uncovered in penetration tests \u2014 a problem the advent of generative AI is exacerbating rather than relieving. According to a study by penetration testing as a service firm Cobalt, organizations fix less than half of all exploitable vulnerabilities (48%), a figure that drops to 
[&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":2865,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-2864","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-education"],"_links":{"self":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/2864"}],"collection":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2864"}],"version-history":[{"count":0,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/2864\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/media\/2865"}],"wp:attachment":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2864"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2864"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2864"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}