{"id":1569,"date":"2025-01-20T06:00:00","date_gmt":"2025-01-20T06:00:00","guid":{"rendered":"https:\/\/cybersecurityinfocus.com\/?p=1569"},"modified":"2025-01-20T06:00:00","modified_gmt":"2025-01-20T06:00:00","slug":"how-organizations-can-secure-their-ai-code","status":"publish","type":"post","link":"https:\/\/cybersecurityinfocus.com\/?p=1569","title":{"rendered":"How organizations can secure their AI code"},"content":{"rendered":"<div>\n<div class=\"grid grid--cols-10@md grid--cols-8@lg article-column\">\n<div class=\"col-12 col-10@md col-6@lg col-start-3@lg\">\n<div class=\"article-column__content\">\n<div class=\"container\"><\/div>\n<p>In 2023, the team at data extraction startup Reworkd was under tight deadlines. Investors pressured them to monetize the platform, and they needed to migrate everything from Next.js to Python\/FastAPI. To speed things up, the team decided to turn to ChatGPT to do some of the work. The AI-generated code appeared to function, so they implemented it directly into their production environment. And then called it a day.<\/p>\n<p>The next morning, they woke up \u201cwith over 40 Gmail notifications of user complaints,\u201d co-founder Ashim Shrestha wrote in a<a href=\"https:\/\/asim.bearblog.dev\/how-a-single-chatgpt-mistake-cost-us-10000\/\"> blog post<\/a>. \u201cEverything seemed to have set on fire overnight. None of these users could subscribe.\u201d<\/p>\n<p>A bug on line 56, which was AI-generated, caused a unique ID collision during the subscription process, and it took them five days to identify the issue and fix it. That bug, that \u201csingle ChatGPT mistake cost us $10,000+,\u201d Shrestha wrote.<\/p>\n<p>While Reworkd was open about their error, many similar incidents remain unknown. CISOs often learn about them behind closed doors. 
Financial institutions, healthcare systems, and e-commerce platforms have all encountered security challenges because code completion tools can introduce vulnerabilities, disrupt operations, or compromise data integrity. Many of the risks stem from the AI-generated code itself, from hallucinated library names, or from the introduction of untracked, unverified third-party dependencies.<\/p>\n<p>\u201cWe\u2019re facing a perfect storm: increasing reliance on AI-generated code, rapid growth in open-source libraries, and the inherent complexity of these systems,\u201d says Jens Wessling, chief technology officer at Veracode. \u201cIt\u2019s only natural that security risks will escalate.\u201d<\/p>\n<p>Often, code completion tools like ChatGPT, GitHub Copilot, or Amazon CodeWhisperer are used covertly. A survey by<a href=\"https:\/\/go.snyk.io\/2023-ai-code-security-report.html\"> Snyk<\/a> showed that roughly 80% of developers ignore security policies to incorporate AI-generated code. This practice creates blind spots for organizations, which often struggle to mitigate the security and legal issues that appear as a result.<\/p>\n<p>As automated coding tools see broader adoption, the risks they pose have become a top priority for many CISOs and cybersecurity leaders. While these tools are revolutionary and can accelerate development, they also introduce a variety of security issues, some of which are hard to detect.<\/p>\n<h2 class=\"wp-block-heading\">Ensure software packages are identified<\/h2>\n<p>While the rise of AI-powered code completion tools has ushered in a new era of efficiency and innovation in software development, this progress comes with important security risks. 
\u201cAI-generated code often blends seamlessly with human-developed code, making it difficult to tell where security risks are coming from,\u201d Wessling says.<\/p>\n<p>Sometimes, the code that\u2019s automatically generated can include third-party libraries or phantom dependencies \u2014 dependencies that are not explicitly declared in a manifest file. These unreported software packages might not be identified during a scan and can potentially hide vulnerabilities.\u00a0<\/p>\n<p>One way to address this is to use software composition analysis (SCA) and software supply chain security tools, which help identify the libraries in use, the vulnerabilities they contain, and the potential legal and compliance issues they might bring.\u00a0<\/p>\n<p>\u201cProperly tuned SCA that looks deeper than the surface might be the answer,\u201d says Grant Ongers, CSO and co-founder of Secure Delivery. This solution is not perfect, though. \u201cThe bigger issue with SCA tends to be including vulnerabilities in functions in libraries that are never called,\u201d he adds.<\/p>\n<p>Endor Labs\u2019<a href=\"https:\/\/www.endorlabs.com\/lp\/2024-dependency-management-report\"> 2024 Dependency Management Report<\/a> found that, for organizations with significant phantom dependency footprints, 56% of reported library vulnerabilities are in phantom dependencies. \u201cWe expect this to be an increasing challenge in organizations, and tools need to be able to give security teams visibility into all software components in use for both compliance and risk-management purposes,\u201d says Darren Meyer, staff research engineer at Endor Labs.<\/p>\n<p>That is why it is important that organizations have an accurate inventory of their software components. \u201cWithout it, you can\u2019t identify, much less manage, risk coming from AI libraries, or indeed from any third-party library,\u201d Meyer adds. 
\u201cIf you don\u2019t have a way to identify the AI libraries \u2014 which are part of software being written, published, and\/or consumed by your organization \u2014 then you may have a compliance risk.\u201d<\/p>\n<h2 class=\"wp-block-heading\">Watch for ML models from community hubs<\/h2>\n<p>Organizations also expose themselves to risks when developers download machine learning (ML) models or datasets from platforms like Hugging Face.\u00a0<\/p>\n<p>\u201cIn spite of security checks on both ends, it may still happen that the model contains a backdoor that becomes active once the model is integrated,\u201d says Alex \u0218tef\u0103nescu, open-source developer at the Organized Crime and Corruption Reporting Project (OCCRP). \u201cThis could ultimately lead to data being leaked from the company that used the malicious models.\u201d<\/p>\n<p>At the start of 2024, the Hugging Face platform hosted at least 100 malicious ML models, some of which were capable of executing code on victims\u2019 machines, according to a<a href=\"https:\/\/jfrog.com\/blog\/data-scientists-targeted-by-malicious-hugging-face-ml-models-with-silent-backdoor\/\"> JFrog report<\/a>.<\/p>\n<p>When it comes to code completion tools like GitHub Copilot, \u0218tef\u0103nescu worries about hallucinations. \u201cAn LLM will always generate the most statistically probable continuation of a given prompt, so there are no real guarantees in place that it will generate a real package from PyPI, for example, after the word \u2018import\u2019,\u201d they say. 
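One way to blunt the hallucinated-package risk is to stop trusting assistant-suggested names at install time and gate them through an internal allowlist of vetted packages. The sketch below is a hypothetical illustration of such a gate; the `VETTED_PACKAGES` set and the `check_install` helper are assumptions for this example, not an established tool.

```python
# Hypothetical install gate: only packages a security team has vetted pass.
# The allowlist contents are a stand-in; a real list would live in version
# control and be maintained alongside the organization's SCA tooling.
VETTED_PACKAGES = {"requests", "numpy", "fastapi", "pydantic"}

def check_install(candidates: list[str]) -> tuple[list[str], list[str]]:
    """Split suggested package names into (vetted, blocked) lists."""
    vetted = [name for name in candidates if name.lower() in VETTED_PACKAGES]
    blocked = [name for name in candidates if name.lower() not in VETTED_PACKAGES]
    return vetted, blocked

# 'fastapi' is vetted; 'fastapi-utils-pro' is a made-up name of the kind an
# LLM might hallucinate and an attacker might pre-register on PyPI.
vetted, blocked = check_install(["fastapi", "fastapi-utils-pro"])
print(vetted)   # ['fastapi']
print(blocked)  # ['fastapi-utils-pro']
```

A blocked name would then go through manual review (does the package exist, who publishes it, how old is it) before being added to the list.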
\u201cSome attackers are aware of this and register package names on platforms like npm and PyPI, filling in some functionality that code completion tools suggest in order to make the packages seem legitimate.\u201d<\/p>\n<p>If these packages are imported into real applications, they can do real damage.<\/p>\n<p>To address these risks, CISOs can establish protocols for downloading and integrating ML models or datasets from external platforms such as Hugging Face. This includes implementing automated scanning tools to detect malicious code or backdoors, having a policy that only allows the use of models from verified publishers, or conducting internal testing in isolated environments.<\/p>\n<h2 class=\"wp-block-heading\">Ensure no sensitive information is leaking through AI coding assistants<\/h2>\n<p>Nearly half of organizations are concerned about AI systems learning and reproducing patterns that include sensitive information, according to GitGuardian\u2019s<a href=\"https:\/\/www.gitguardian.com\/files\/voice-of-practitioners-2024\"> Voice of Practitioners 2024 survey<\/a>. \u201cThis is particularly worrying because these tools suggest code based on patterns learned from training data, which could inadvertently include hard-coded credentials, for instance,\u201d Thomas Segura, reports author at GitGuardian, says.<\/p>\n<p>Companies based in the US were particularly worried about the possibility of sensitive information inadvertently leaking into codebases because of developers using AI-powered code completion tools.<\/p>\n<p>While there\u2019s no silver bullet, organizations can do a couple of things to decrease this risk. \u201cUsing self-hosted AI systems that don\u2019t report data back is an answer that works,\u201d Ongers says. \u201cAnother is to ensure data cannot enter.\u201d\u00a0<\/p>\n<h2 class=\"wp-block-heading\">Look outside traditional development teams<\/h2>\n<p>Not all AI-based tools are coming from teams full of software engineers. 
\u201cWe see a lot of adoption being driven by data analysts, marketing teams, researchers, etc. within organizations,\u201d Meyer says.<\/p>\n<p>These teams aren\u2019t traditionally developing their own software but are increasingly writing simple tools that adopt AI libraries and models, so they\u2019re often not aware of the risks involved. \u201cThis combination of shadow engineering with lower-than-average application security awareness can be a breeding ground for risk,\u201d he adds.<\/p>\n<p>To make sure these teams are working safely, CISOs should build relationships with them early in the process. Cybersecurity leaders might also want to set up training programs tailored to non-traditional development teams to educate data analysts, marketing professionals, and researchers on the potential risks associated with AI-based tools and libraries.<\/p>\n<h2 class=\"wp-block-heading\">Secure resources for application security<\/h2>\n<p>\u201cSecurity budgets don\u2019t generally grow at the same pace that software development accelerates, and AI adoption is only widening that gap,\u201d Meyer says.\u00a0Application security is underfunded in most organizations, yet allocating sufficient time and resources to it is essential as AI adoption and AI-assisted coding accelerate the pace of software development.<\/p>\n<p>\u201cA portfolio of high-quality security tools that can help address this gap is no longer optional,\u201d Meyer says. \u201cAnd while tools are critical to closing the gap, so are AppSec and ProdSec staff that can effectively partner with developers \u2014 even non-traditional developers \u2014 and understand the technical, compliance, and security implications of AI.\u201d<\/p>\n<p>When it comes to securing enough resources to protect AI systems, some stakeholders might hesitate, viewing it as an optional expense rather than a critical investment. 
\u201cAI adoption is a divisive topic in many organizations, with some leaders and teams being \u2018all-in\u2019 on adoption and some being strongly resistant,\u201d Meyer says. \u201cThis tension can present challenges for insightful CISOs and business information security officers (BISOs).\u201d<\/p>\n<p>CISOs who understand both the advantages and the disadvantages of these tools might set controls to manage the risks effectively, but if they don\u2019t explain those controls properly, they can create the perception that security is holding the organization back from innovating.\u00a0\u201cOrganizations need to develop comprehensive strategies that balance the productivity benefits of AI tools with robust security practices,\u201d Segura says.<\/p>\n<h2 class=\"wp-block-heading\">The risk of unsafe AI-powered open-source libraries<\/h2>\n<p>With AI changing the practice of writing code, the industry is navigating a fine line between embracing the opportunities AI can offer and mitigating the risks it can pose. Ongers says this paradigm shift brings several concerns. \u201cThe biggest, I think, is one of two extremes: either overreliance on AI that\u2019s flawed, or ignoring AI altogether,\u201d he says.<\/p>\n<p>With more than five million open-source libraries available today and an estimated half a billion more to be released in the next decade, many of which will be powered by AI, organizations face an unprecedented challenge in managing the security risks associated with their software ecosystems.\u00a0<\/p>\n<p>\u201cThis is unfamiliar territory for the industry, and I do believe risk needs to be addressed at an industry level to ensure the safety, security, and quality of the software that powers our world,\u201d Wessling says.<\/p>\n<p>How these issues are addressed also matters. Right now, there\u2019s an explosion of security vendors that claim to secure AI, but not all of them are doing a meticulous job. 
As a result, \u201corganizations may be left with neither the visibility they need to make intelligent risk decisions nor the capabilities they need to act on those decisions,\u201d Meyer says. \u201cCISOs don\u2019t want to find themselves in the situation of building new capabilities when there\u2019s been a breach in the news \u2014 or worse, when it\u2019s their organization that\u2019s been breached.\u201d<\/p>\n<p>To prevent such situations, CISOs must prioritize investing in their people as much as in AI technologies. \u201cThe software development industry needs to see the true priority of training and enhancing the knowledge of its workforce,\u201d \u0218tef\u0103nescu says. \u201cInstead of paying for code completion tool subscriptions, it should invest in the knowledge development of its staff.\u201d<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>In 2023, the team at data extraction startup Reworkd was under tight deadlines. Investors pressured them to monetize the platform, and they needed to migrate everything from Next.js to Python\/FastAPI. To speed things up, the team decided to turn to ChatGPT to do some of the work. 
The AI-generated code appeared to function, so they [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":1570,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-1569","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-education"],"_links":{"self":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/1569"}],"collection":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1569"}],"version-history":[{"count":0,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/1569\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/media\/1570"}],"wp:attachment":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1569"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1569"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1569"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}