{"id":6198,"date":"2025-12-12T01:01:15","date_gmt":"2025-12-12T01:01:15","guid":{"rendered":"https:\/\/cybersecurityinfocus.com\/?p=6198"},"modified":"2025-12-12T01:01:15","modified_gmt":"2025-12-12T01:01:15","slug":"openai-expands-defense-in-depth-security-to-stop-hackers-using-its-ai-models-to-launch-cyberattacks","status":"publish","type":"post","link":"https:\/\/cybersecurityinfocus.com\/?p=6198","title":{"rendered":"OpenAI expands \u2018defense in depth\u2019 security to stop hackers using its AI models to launch cyberattacks"},"content":{"rendered":"<div>\n<div class=\"grid grid--cols-10@md grid--cols-8@lg article-column\">\n<div class=\"col-12 col-10@md col-6@lg col-start-3@lg\">\n<div class=\"article-column__content\">\n<div class=\"container\"><\/div>\n<p>OpenAI is preparing for the possibility that threat groups will try to abuse its increasingly powerful frontier AI models to carry out sophisticated cyberattacks.<\/p>\n<p>In a blog post, the <a href=\"https:\/\/openai.com\/index\/strengthening-cyber-resilience\/\" target=\"_blank\" rel=\"noopener\">company describes<\/a> how the evolving capabilities of its models could be used to \u201cdevelop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy enterprise or industrial intrusion operations aimed at real-world effects.\u201d<\/p>\n<p>According to OpenAI, the underlying problem is that offensive and defensive uses of AI rely on the same knowledge and techniques. This makes it difficult to enable one without also enabling the other. 
<\/p>\n<p>\u201cWe are investing in safeguards to help ensure these powerful capabilities primarily benefit defensive uses and limit uplift for malicious purposes,\u201d the company said, adding, \u201cwe see this work not as a one-time effort, but as a sustained, long-term investment in giving defenders an advantage and continually strengthening the security posture of the critical infrastructure across the broader ecosystem.\u201d<\/p>\n<p>One new initiative is the Frontier Risk Council. The company offered few details of how this will operate, but said it was part of an expanding \u201cdefense in depth\u201d strategy designed to contain the widely speculated potential of AI as an adversarial tool.<\/p>\n<p>\u201cMembers will advise on the boundary between useful, responsible capability and potential misuse, and these learnings will directly inform our evaluations and safeguards. We will share more on the council soon,\u201d OpenAI said.<\/p>\n<p>Other initiatives mentioned in the blog include expanding guardrails against misuse, external Red Team testing to assess model security, and a trusted access program designed to give qualifying customers access to enhanced models to explore defensive use cases.<\/p>\n<p>OpenAI also plans to expand use of its recently announced <a href=\"https:\/\/openai.com\/index\/introducing-aardvark\/\" target=\"_blank\" rel=\"noopener\">Aardvark agentic security researcher scanning tool<\/a>, currently in beta, to identify vulnerabilities in its codebase and suggest patches or mitigations.<\/p>\n<h2 class=\"wp-block-heading\">Red Teaming AI<\/h2>\n<p>AI companies find themselves under increasing pressure to explain how they will block model misuse. 
The anxiety is not hypothetical; last month, OpenAI rival <a href=\"https:\/\/www.csoonline.com\/article\/4092571\/ai-controlled-cyber-attack-causes-a-stir.html\" target=\"_blank\" rel=\"noopener\">Anthropic admitted<\/a> that its AI programming tool, Claude Code, had been used as part of a cyberattack targeting 30 organizations, the first time malicious AI exploitation has been discovered on this scale.<\/p>\n<p>Meanwhile, university researchers in the US reported this week that the <a href=\"https:\/\/arxiv.org\/pdf\/2512.09882\" target=\"_blank\" rel=\"noopener\">Artemis AI research platform outperformed nine out of ten penetration testers<\/a> at finding security vulnerabilities. As the team pointed out, it did this at a fraction of the cost of a human researcher, potentially expanding access to such capabilities beyond well-resourced criminals.<\/p>\n<p>Balancing this is the possibility that defenders could use AI to find the same vulnerabilities. OpenAI\u2019s blog alludes to this capability when it mentions testing its models against the <a href=\"https:\/\/openai.com\/index\/red-teaming-network\/\" target=\"_blank\" rel=\"noopener\">Red Teaming Network<\/a> it announced two years ago.<\/p>\n<p>The reaction of industry experts to OpenAI\u2019s latest announcement has been mixed. A recurring worry is the inherent difficulty of stopping malicious use of leading models.<\/p>\n<p>\u201cOpenAI is asking models to constrain their own capabilities through refusal training, which can be compared to asking a lock to decide when it should open,\u201d commented <a href=\"https:\/\/www.linkedin.com\/in\/jessewilliams1\/\" target=\"_blank\" rel=\"noopener\">Jesse Williams<\/a>, co-founder and COO of AI agent DevOps company, Jozu. In effect, the model, not its human authors, defines what is harmful.<\/p>\n<p>\u201cThe distinction is intent and authorization, which models cannot infer from prompts. 
Jailbreaks consistently defeat refusal training, and sophisticated adversaries will probe detection boundaries and route around them. Safeguards reduce casual misuse, but won\u2019t stop determined threats,\u201d said Williams.<\/p>\n<p>\u201cOpenAI\u2019s \u2018trusted access program\u2019 sounds reasonable until you examine implementation. Who qualifies as trusted? University researchers? Defense contractors? Foreign SOC analysts?\u201d<\/p>\n<p>Even with guardrails, AI safety can\u2019t be guaranteed, <a href=\"https:\/\/www.sans.org\/profiles\/rob-lee\" target=\"_blank\" rel=\"noopener\">Rob Lee<\/a>, chief AI officer at the SANS Institute, observed. <\/p>\n<p>\u201cLast month, Anthropic disclosed that attackers used Claude Code, a public model with guardrails, to execute 80-90% of a state-sponsored cyberattack autonomously. They bypassed the safety controls by breaking tasks into innocent-looking requests and claiming to be a legitimate security firm. The AI wrote exploit code, harvested credentials, and exfiltrated data while humans basically supervised from the couch,\u201d he pointed out. <\/p>\n<p>\u201cThat\u2019s the model with guardrails. But if you\u2019re [a villain] and you want your AI Minions to be as evil as possible, you just spin up your own unguardrailed model,\u201d he said. \u201c[There are] plenty of open-weight options out there with no ethics training, no safety controls, and nobody watching. Evil will use evil. \u2026 OpenAI\u2019s safety frameworks only constrain the people who weren\u2019t going to attack you anyway.\u201d<\/p>\n<p>Not all experts are this pessimistic. According to <a href=\"https:\/\/www.sans.org\/profiles\/allan-liska\" target=\"_blank\" rel=\"noopener\">Allan Liska<\/a>, threat intelligence analyst at Recorded Future, it is important not to exaggerate the threat posed by AI. 
\u201cWhile we have reported an uptick in interest and capabilities of both nation-state and cybercriminal threat actors when it comes to AI usage, these threats do not exceed the ability of organizations following best security practices,\u201d said Liska.<\/p>\n<p>\u201cThat may change in the future, however, at this moment it is more important than ever to understand the difference between hype and reality when it comes to AI and other threats.\u201d<\/p>\n<p><em>A previous version of this story contained comments incorrectly attributed to Rob Lee, which have been replaced with the correct remarks.<\/em><\/p>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>OpenAI is preparing for the possibility that threat groups will try to abuse its increasingly powerful AI frontier models to carry out sophisticated cyberattacks. In a blog, the company describes how the evolving capabilities of its models could be used to \u201cdevelop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy 
[&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":6187,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-6198","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-education"],"_links":{"self":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/6198"}],"collection":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6198"}],"version-history":[{"count":0,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/6198\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/media\/6187"}],"wp:attachment":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6198"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6198"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6198"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}