{"id":1420,"date":"2025-01-07T07:00:00","date_gmt":"2025-01-07T07:00:00","guid":{"rendered":"https:\/\/cybersecurityinfocus.com\/?p=1420"},"modified":"2025-01-07T07:00:00","modified_gmt":"2025-01-07T07:00:00","slug":"gen-ai-is-transforming-the-cyber-threat-landscape-by-democratizing-vulnerability-hunting","status":"publish","type":"post","link":"https:\/\/cybersecurityinfocus.com\/?p=1420","title":{"rendered":"Gen AI is transforming the cyber threat landscape by democratizing vulnerability hunting"},"content":{"rendered":"<div>\n<div class=\"grid grid--cols-10@md grid--cols-8@lg article-column\">\n<div class=\"col-12 col-10@md col-6@lg col-start-3@lg\">\n<div class=\"article-column__content\">\n<div class=\"container\"><\/div>\n<p>Generative AI has had a significant impact on a wide variety of business processes, optimizing and accelerating workflows and in some cases reducing baselines for expertise.<\/p>\n<p>Add vulnerability hunting to that list, as large language models (LLMs) are proving to be valuable tools in assisting hackers, both good and bad, in discovering software vulnerabilities and writing exploits more quickly, while bridging knowledge gaps.<strong><\/strong><\/p>\n<p>This democratization of bug-hunting skills has the potential to reshape the threat landscape by lowering the barrier to entry for attackers capable of developing and using zero-day exploits \u2014 attacks that <a href=\"https:\/\/www.csoonline.com\/article\/565704\/zero-days-explained-how-unknown-vulnerabilities-become-gateways-for-attackers.html\">target previously unknown and unpatched vulnerabilities<\/a>.<\/p>\n<p>Historically, these exploits have been associated with well-funded, sophisticated threat actors, such as <a href=\"https:\/\/www.csoonline.com\/article\/3595792\/nation-state-actors-increasingly-hide-behind-cybercriminal-tactics-and-malware.html\">nation-state cyberespionage groups<\/a>, and a select few cybercriminal gangs with the skills to develop them in-house or the 
financial resources to purchase them on the black market.<\/p>\n<p>\u201cLLMs and generative AI are likely to have a major impact on the zero-day exploit ecosystem,\u201d said Chris Kubecka, cybersecurity author and founder of HypaSec, a consultancy that provides security training and advises governments on nation-state incident response and management.<\/p>\n<p>\u201cThese tools can assist in code analysis, pattern recognition, and even automating parts of the exploit development process,\u201d she told CSO via email after <a href=\"https:\/\/www.youtube.com\/watch?v=wDzohWg0W0c\">speaking at the DefCamp conference in Bucharest in November on the topic of quantum computing and AI redefining cyberwarfare<\/a>. \u201cBy analyzing large amounts of source code or binaries quickly and identifying potential vulnerabilities, LLMs could accelerate the discovery of zero-days. Moreover, the ability to provide natural language explanations and suggestions lowers the barrier for understanding exploit creation, potentially making these processes accessible to a wider audience.\u201d<\/p>\n<p>On the flip side, the same LLMs are being utilized by ethical bug hunters and penetration testers to find vulnerabilities more quickly and report them to affected vendors and the organizations that use impacted products. Security and development teams can also integrate LLMs with existing code analysis tools to identify, triage, and fix bugs before they reach production.<\/p>\n<h2 class=\"wp-block-heading\">Efficiency and limitations of LLMs for bug hunting<\/h2>\n<p>Bug hunters are likely to experience varying levels of success using LLMs to discover vulnerabilities. 
Factors include:<\/p>\n<ul>\n<li>The level of customization applied to the model and whether it is used alongside traditional analysis tools<\/li>\n<li>The presence of native safety protocols in the model that limit certain types of responses<\/li>\n<li>The size and complexity of the analyzed code, as well as the nature of the vulnerabilities it contains<\/li>\n<li>Limitations on the input size the model can handle in a single prompt<\/li>\n<li>The potential for made-up or incorrect responses, aka hallucinations<\/li>\n<\/ul>\n<p>Nonetheless, even out-of-the-box LLMs with little modification can identify less complex input sanitization vulnerabilities such as cross-site scripting (XSS) and SQL injection, or even memory corruption bugs such as buffer overflows, experts contend. The extensive training these models have undergone on web-sourced information, including secure coding practices, developer support forums, vulnerability lists, hacking techniques, and exploit examples, accounts for this innate capability. But an LLM\u2019s efficiency can improve significantly when bug hunters enhance the model with topic-specific data and carefully crafted prompts.<\/p>\n<p>Kubecka, for example, built a custom version of ChatGPT that she calls Zero Day GPT. Using this tool, she identified around 25 zero-days in a couple of months \u2014 a task she said might otherwise have taken her years. 
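<p>The kind of input sanitization flaw even an unmodified model can typically spot is easy to sketch. The snippet below is purely illustrative (a toy in-memory SQLite table and hypothetical lookup functions, not code from any project mentioned here); it contrasts an injectable query built by string interpolation with its parameterized fix:<\/p>

```python
import sqlite3

# Toy in-memory database standing in for a real application backend.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (id INTEGER, secret TEXT)')
conn.executemany('INSERT INTO users VALUES (?, ?)',
                 [(1, 'alice-token'), (2, 'bob-token')])

def lookup_unsafe(user_id):
    # Injectable: user input is interpolated straight into the statement.
    return conn.execute(f'SELECT secret FROM users WHERE id = {user_id}').fetchall()

def lookup_safe(user_id):
    # Parameterized: the driver treats the input strictly as data.
    return conn.execute('SELECT secret FROM users WHERE id = ?', (user_id,)).fetchall()

print(lookup_unsafe('1 OR 1=1'))   # leaks every row
print(lookup_safe('1 OR 1=1'))     # returns nothing
```

<p>Flaws of this shape, and their standard parameterized fix, are heavily represented in the public material such models are trained on, which is why they tend to be the low-hanging fruit.<\/p>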
One vulnerability was found in Zimbra, an open-source collaboration platform <a href=\"https:\/\/www.csoonline.com\/article\/574913\/apt-group-winter-vivern-exploits-zimbra-webmail-flaw-to-target-government-entities.html\">previously targeted by state-sponsored cyberespionage groups<\/a>, including <a href=\"https:\/\/blog.google\/threat-analysis-group\/zimbra-0-day-used-to-target-international-government-organizations\/\">through a zero-day exploit in late 2023<\/a>.<\/p>\n<h2 class=\"wp-block-heading\">What bug hunting with LLMs looks like<\/h2>\n<p>To discover this vulnerability, Kubecka instructed her custom GPT to analyze the patch for a known Zimbra flaw, providing the model with the code changes between the vulnerable and patched versions, as well as the known exploit. She then asked whether it was still possible to use the old exploit against the patched code.<\/p>\n<p>\u201cThe answer was: You can reuse the exploit, however you need to change the code ever so slightly, and by the way, let me refactor it, because the existing exploit code isn\u2019t very well coded,\u201d she told CSO in an interview. \u201cAnd so, it was able to rip through it and give me a brand new exploit, and by golly, it worked.\u201d<\/p>\n<p>Kubecka\u2019s GPT had identified a patch bypass, a task researchers occasionally achieve as well. Many developers address input sanitization flaws by implementing filtering mechanisms to block malicious inputs. However, history has shown that such blacklist approaches are often incomplete. With creativity and skill, researchers can devise payload variations that successfully circumvent these filters. <\/p>\n<p>\u201cConsider a scenario where a web application is patched to prevent SQL injection attacks by filtering specific keywords or patterns associated with such exploits,\u201d Lucian Ni\u021bescu, red team tech lead at penetration testing firm Bit Sentinel, told CSO. 
\u201cAn attacker could use an LLM to generate alternative payloads that circumvent these filters. For instance, if the patch blocks common SQL keywords \u2014 like \u2018sleep\u2019 \u2014 the LLM might suggest using encoded representations or unconventional syntax that achieves the same malicious outcome without triggering the filter, such as SQL comments or URL encoding.\u201d<\/p>\n<div class=\"extendedBlock-wrapper block-coreImage undefined\">\n<p>ChatGPT writing a bypass for a web application firewall\u2019s SQL injection filter<\/p>\n<p class=\"imageCredit\">Lucian Ni\u021bescu<\/p>\n<\/div>\n<p>Ni\u021bescu also uses LLMs in his work, including custom prompts to ChatGPT or Ollama (a tool for running LLMs locally), which he augments with data sets such as the <a href=\"https:\/\/github.com\/HackTricks-wiki\/hacktricks\">HackTricks repository<\/a>, a collection of hacking and exploit techniques. He also tested an open-source LLM-powered code analysis tool called <a href=\"https:\/\/github.com\/protectai\/vulnhuntr?tab=readme-ov-file\">Vulnhuntr<\/a> that was developed by Protect AI and has been used to <a href=\"https:\/\/protectai.com\/threat-research\/vulnhuntr-first-0-day-vulnerabilities\">find over a dozen remotely exploitable zero-day vulnerabilities so far<\/a>.<\/p>\n<div class=\"extendedBlock-wrapper block-coreImage undefined\">\n<p>Zero-day RCE flaw found with Vulnhuntr in unnamed project<\/p>\n<p class=\"imageCredit\">Lucian Ni\u021bescu<\/p>\n<\/div>\n<p>Ni\u021bescu told CSO that he launched Vulnhuntr and tested it on a randomly selected project hosted on GitHub. The LLM identified a remote code execution flaw within 15 minutes. 
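<p>The filter evasion Ni\u021bescu describes can be reduced to a toy demonstration. The blocklist below is a deliberately naive stand-in for a keyword-filtering patch (not any real WAF ruleset), and the payloads are illustrative:<\/p>

```python
from urllib.parse import unquote

# Deliberately naive keyword blocklist: case-sensitive substring
# matching on the raw, undecoded request string.
BLOCKLIST = ['sleep', 'union select']

def is_blocked(payload):
    return any(word in payload for word in BLOCKLIST)

# The literal keyword is caught.
assert is_blocked('id=1 AND sleep(5)-- -')

# A case variation slips past the case-sensitive check...
assert not is_blocked('id=1 AND SLEEP(5)-- -')

# ...as does URL encoding, when filtering happens before decoding
# ('%73' decodes to 's', so the keyword never appears literally).
encoded = 'id=1%20AND%20%73leep(5)--%20-'
assert not is_blocked(encoded)
assert 'sleep' in unquote(encoded)   # yet it decodes to the same attack
```

<p>Any filter that matches raw strings before normalizing case and decoding input leaves exactly this kind of gap, which is the kind of gap LLM-suggested payload variations can probe.<\/p>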
He declined to share specific details, as the vulnerability is still in the disclosure process, but highlighted other publicly disclosed vulnerabilities discovered with the tool in highly rated GitHub projects as examples.<\/p>\n<p>\u201cIn the case of <a href=\"https:\/\/nvd.nist.gov\/vuln\/detail\/CVE-2024-10099\">CVE-2024-10099<\/a>, which was identified using Vulnhuntr, we can observe that it provided the full exploitation chain or helped in achieving a fully functional <a href=\"https:\/\/huntr.com\/bounties\/14fb8c9a-692a-4d8c-b4b2-24c6f91a383c\">2-step exploit<\/a>,\u201d he said. \u201cConsidering that this vulnerability affects <a href=\"https:\/\/github.com\/comfyanonymous\/comfyui\">ComfyUI<\/a> \u2014 a project with over 60k stars on GitHub \u2014 it already lowered the skill barrier to a decent minimum that would allow an early\/entry hacker to write a fully functional exploit.\u201d<\/p>\n<p>Meanwhile, Kubecka demonstrated to CSO how she used one of her custom GPTs in real-time to identify security weaknesses and vulnerabilities on a website by splitting its code into 12 chunks. The LLM detected missing security headers and weak input validation issues that could be exploited, among other vulnerabilities.<\/p>\n<h2 class=\"wp-block-heading\">Elevating complexity<\/h2>\n<p>Of course, the more complex the vulnerability or attack chain needed to achieve a desired impact, the more difficult it becomes for LLMs to fully automate the discovery process. 
Still, LLMs can identify complex bugs, as long as a knowledgeable user guides them, provides additional context, or breaks the problem into more targeted components.<\/p>\n<p>\u201cThe real power comes in when you use a large language model to prioritize output from traditional analysis tools in order to supplement each potential finding with much more context, and then rank them in priority accordingly,\u201d Caleb Gross, director of capability development at offensive security firm Bishop Fox, told CSO.<\/p>\n<p>Gross believes that simply feeding code chunks into an LLM and asking it to identify flaws is neither efficient nor practical due to input size limitations for a single prompt \u2014 known as the context window. For instance, a model might identify a large number of unsafe uses of strcpy in C code, which can in theory lead to buffer overflows, but it won\u2019t be able to determine which of those code paths are reachable in the application and how.<\/p>\n<p>Gross and his Bishop Fox colleagues gave <a href=\"https:\/\/www.youtube.com\/watch?v=IBuL1zY69tY\">a presentation at RVASec 2024<\/a> about LLM vulnerability hunting and how they built a sorting algorithm using an LLM to rank potential false positives that come out of traditional static code analysis tools such as Semgrep or patch diffing.<\/p>\n<p>LLMs can also be useful in identifying code functions that parse complex data and are good candidates for fuzzing \u2014 a type of security testing that involves feeding malformed data to a function to trigger unexpected behaviors, such as crashes or information leaks. LLMs can even assist in setting up fuzzing test cases.<\/p>\n<p>\u201cIt\u2019s not that we don\u2019t know how to fuzz; it\u2019s that we are limited in our capacity to find the right targets and do the initial legwork of writing a harness around it,\u201d Gross said. 
\u201cI\u2019ve used LLMs in that context quite effectively.\u201d<\/p>\n<h2 class=\"wp-block-heading\">Using LLMs to write exploits and bypass detection<\/h2>\n<p>Identifying potential vulnerabilities is one thing, but writing exploit code that works against them requires a more advanced understanding of security flaws, programming, and the defense mechanisms that exist on the targeted platforms.<\/p>\n<p>For instance, turning a buffer overflow bug into a remote code execution exploit may involve bypassing a process sandbox mechanism or circumventing OS-level defenses such as ASLR and DEP. Similarly, exploiting a weak input validation issue in a web form to launch a successful SQL injection attack might require bypassing generic filters for SQL injection payloads or evading a web application firewall (WAF) deployed in front of the application.<\/p>\n<p>This is one area where LLMs could make a significant impact: bridging the knowledge gap between junior bug hunters and experienced exploit writers. Even generating new variations of existing exploits to bypass detection signatures in firewalls and intrusion prevention systems is a notable development, as many organizations don\u2019t deploy available security patches immediately, instead relying on their security vendors to add detection for known exploits until their patching cycle catches up.<\/p>\n<p>Matei B\u0103d\u0103noiu, cybersecurity specialist lead at Deloitte Romania and a bug hunter with over 100 responsibly disclosed CVEs, told CSO he hasn\u2019t used LLMs in his own bug-hunting efforts but has colleagues on his team who have successfully used LLMs during penetration testing engagements to write exploit payloads that bypassed existing defenses.<\/p>\n<p>While he doesn\u2019t feel LLMs have had a major impact on the zero-day ecosystem yet, he sees their potential to disrupt it. 
\u201cThey seem to be able to help researchers in finding zero-days by serving as a centralized knowledge repository to shorten the time required to develop an exploit \u2014 e.g., coding part of the exploit, providing generic code templates \u2014 which leads to an overall increase in the number of 0-days,\u201d he said.<\/p>\n<p>Tools B\u0103d\u0103noiu noted include <a href=\"https:\/\/zerodai.com\/\">0dAI<\/a>, a subscription-based chatbot and model trained on cybersecurity data, and <a href=\"https:\/\/github.com\/ipa-lab\/hackingBuddyGPT\">HackingBuddyGPT<\/a>, an LLM-powered penetration testing framework.<\/p>\n<p>Bishop Fox\u2019s Gross described his experience with LLMs writing exploits as \u201chesitant optimism,\u201d noting that he\u2019s seen instances where LLMs have gone down rabbit holes and lost sight of the broader perspective. He also feels that good, highly technical material on exploit writing \u2014 an area that can be very nuanced and complex \u2014 isn\u2019t as widely available online as, for example, security testing resources. As a result, LLMs have likely been trained on fewer successful examples of exploit writing than on other topics.<\/p>\n<h2 class=\"wp-block-heading\">LLMs bridging the security knowledge gap<\/h2>\n<p>Bit Sentinel\u2019s Ni\u021bescu has already seen the impact LLMs can have in elevating threat hunters\u2019 game. As leader of the team that organizes the Capture the Flag hacking competition at DefCamp, Ni\u021bescu and the D-CTF organizers had to rethink some of the challenges this year because they realized the challenges would have been too easy to solve with the help of LLMs compared to previous years.<\/p>\n<p>\u201cAI tools can help less experienced individuals create more sophisticated exploits and obfuscations of their payloads, which aids in bypassing security mechanisms, or providing detailed guidance for exploiting specific vulnerabilities,\u201d Ni\u021bescu said. 
\u201cThis, indeed, lowers the entry barrier within the cybersecurity field. At the same time, it can also assist experienced exploit developers by suggesting improvements to existing code, identifying novel attack vectors, or even automating parts of the exploit chain. This could lead to more efficient and effective zero-day exploits.\u201d<\/p>\n<div class=\"extendedBlock-wrapper block-coreImage undefined\">\n<p>DefCamp CTF 2024 winners<\/p>\n<p class=\"imageCredit\">DefCamp<\/p>\n<\/div>\n<p>This year\u2019s DefCamp Capture the Flag event saw nearly 800 teams from 92 countries compete in the qualifier stage, with 16 finalists competing onsite at the conference. Two members of the winning team, Hackemus Papam, told CSO they relied on ChatGPT to solve some of the challenges, including one involving a misconfigured AWS cloud environment.<\/p>\n<p>To retrieve the \u201cflag,\u201d they had to exploit a server-side request forgery (SSRF) flaw to extract credentials from metadata and then use those credentials against other services interacting with an S3 bucket. Because they had no experience interacting with those AWS services and their APIs, ChatGPT proved to be a great help in guiding them in the post-exploitation stage after they found the initial vulnerability themselves, which was relatively easy and probably something ChatGPT could have spotted as well if given the code.<\/p>\n<p>Their team is also evaluating Vulnhuntr for discovering vulnerable areas in code, but for now, they feel filling knowledge gaps is where LLMs can best assist humans. LLMs can provide ideas on where to look and what to try \u2014 a process similar to troubleshooting.<\/p>\n<p>HypaSec\u2019s Kubecka highlighted this aspect as well, noting that bug hunters can ask LLMs to explain code in unfamiliar programming languages or errors they encounter while trying out their exploits. 
The LLM can then help them figure out what\u2019s wrong and suggest ways to fix or refactor the code.<\/p>\n<p>\u201cLLMs can reduce the skill required to write weaponized exploits,\u201d she said. \u201cBy providing detailed instructions, generating code templates, or even debugging exploit attempts, LLMs make it easier for individuals to develop functional exploits. While bypassing advanced protection mechanisms still requires a deep understanding of modern defenses, LLMs can assist in generating polymorphic variations, bypass payloads, and other components of weaponization, significantly aiding the process.\u201d<\/p>\n<p>Horia Ni\u021b\u0103, member of The Few Chosen, the team that took second in the DefCamp CTF, confirmed that his team uses several custom-made AI tools to help scan new codebases, provide insights into potential attack vectors, and offer explanations for code they encounter.<\/p>\n<p>\u201cTools like these have significantly simplified our bug bounty work, and I believe everyone in this field should have similar resources in their toolbox,\u201d he told CSO.<\/p>\n<p>Ni\u021b\u0103 said he uses LLMs to research specific topics or generate payloads for brute-forcing, but in his experience, the models are still inconsistent when it comes to targeting specific types of flaws.<\/p>\n<p>\u201cWith the current state of AI, it can sometimes generate functional and useful exploits or variations of payloads to bypass detection rules,\u201d he said. \u201cHowever, due to the high likelihood of hallucinations and inaccuracies, it\u2019s not as reliable as one might hope. 
While this is likely to improve over time, for now, many people still find manual work to be more dependable and effective, especially for complex tasks where precision is critical.\u201d<\/p>\n<p>Despite clear limitations, many vulnerability researchers find LLMs valuable, leveraging their capabilities to accelerate vulnerability discovery, assist in exploit writing, re-engineer malicious payloads for detection evasion, and suggest new attack paths and tactics with varying degrees of success. They can even automate the creation of vulnerability disclosure reports \u2014 a time-consuming activity researchers generally dislike.<\/p>\n<p>Of course, malicious actors are also likely leveraging these tools. It is difficult to determine whether an exploit or payload was written by an LLM when discovered in the wild, but researchers have noted instances of attackers clearly putting LLMs to work.<\/p>\n<p>In February, Microsoft and OpenAI released <a href=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/02\/14\/staying-ahead-of-threat-actors-in-the-age-of-ai\/\">a report<\/a> highlighting how some well-known APT groups had been using LLMs. Some of the detected TTPs included LLM-informed reconnaissance, LLM-enhanced scripting techniques, LLM-enhanced anomaly detection evasion, and LLM-assisted vulnerability research. It\u2019s safe to assume that the adoption of LLMs and generative AI among threat actors has only increased since then, and organizations and security teams should strive to keep up by leveraging these tools as well.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Generative AI has had a significant impact on a wide variety of business processes, optimizing and accelerating workflows and in some cases reducing baselines for expertise. 
Add vulnerability hunting to that list, as large language models (LLMs) are proving to be valuable tools in assisting hackers, both good and bad, in discovering software vulnerabilities and [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":1421,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-1420","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-education"],"_links":{"self":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/1420"}],"collection":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1420"}],"version-history":[{"count":0,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/1420\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/media\/1421"}],"wp:attachment":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1420"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1420"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1420"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}