OpenAI expands ‘defense in depth’ security to stop hackers using its AI models to launch cyberattacks

OpenAI is preparing for the possibility that threat groups will try to abuse its increasingly powerful frontier AI models to carry out sophisticated cyberattacks.

In a blog post, the company describes how the evolving capabilities of its models could be used to “develop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy enterprise or industrial intrusion operations aimed at real-world effects.”

According to OpenAI, the underlying problem is that offensive and defensive uses of AI rely on the same knowledge and techniques. This makes it challenging to enable one without making possible the other.

“We are investing in safeguards to help ensure these powerful capabilities primarily benefit defensive uses and limit uplift for malicious purposes,” the company said, adding, “we see this work not as a one-time effort, but as a sustained, long-term investment in giving defenders an advantage and continually strengthening the security posture of the critical infrastructure across the broader ecosystem.”

One new initiative is the Frontier Risk Council. The company offered few details of how this will operate, but said it was part of an expanding “defense in depth” strategy designed to contain AI’s widely speculated potential as an adversarial tool.

“Members will advise on the boundary between useful, responsible capability and potential misuse, and these learnings will directly inform our evaluations and safeguards. We will share more on the council soon,” OpenAI said.

Other initiatives mentioned in the blog include expanding guardrails against misuse, external Red Team testing to assess model security, and a trusted access program designed to give qualifying customers access to enhanced models to explore defensive use cases.

OpenAI also plans to expand its use of Aardvark, its recently announced agentic security researcher currently in beta, to scan its codebase for vulnerabilities and suggest patches or mitigations.

Red Teaming AI

AI companies find themselves under increasing pressure to explain how they will block model misuse. The anxiety is not hypothetical; last month, OpenAI rival Anthropic admitted that its AI programming tool, Claude Code, had been used as part of a cyberattack targeting 30 organizations, the first time malicious AI exploitation has been discovered on this scale.

Meanwhile, university researchers in the US reported this week that the Artemis AI research platform outperformed nine out of ten penetration testers at finding security vulnerabilities. As the team pointed out, it did this at a fraction of the cost of a human researcher, potentially putting such capabilities within reach of attackers far less well-resourced than today’s criminal groups.

Balancing this is the possibility that defenders could use AI to find the same vulnerabilities. OpenAI’s blog alludes to this capability when it mentions testing its models against the Red Teaming Network it announced two years ago.

The reaction of industry experts to OpenAI’s latest announcement has been mixed. A recurring worry is the inherent difficulty of stopping malicious use of leading models.

“OpenAI is asking models to constrain their own capabilities through refusal training, which can be compared to asking a lock to decide when it should open,” commented Jesse Williams, co-founder and COO of AI agent DevOps company, Jozu. In effect, the model, not its human authors, defines what is harmful.

“The distinction is intent and authorization, which models cannot infer from prompts. Jailbreaks consistently defeat refusal training, and sophisticated adversaries will probe detection boundaries and route around them. Safeguards reduce casual misuse, but won’t stop determined threats,” said Williams.

“OpenAI’s ‘trusted access program’ sounds reasonable until you examine implementation. Who qualifies as trusted? University researchers? Defense contractors? Foreign SOC analysts?”

According to Rob Lee, chief AI officer at the SANS Institute, the problem of AI misuse can’t be solved by one company on its own – not even the mighty OpenAI. “Companies are pushing models that can autonomously discover or weaponize vulnerabilities, but the global safety ecosystem — governments, frontier labs, researchers, and standards bodies — is fragmented and uncoordinated,” said Lee.

“The result is a widening gap where speed becomes its own vulnerability, creating conditions for cascading failures across infrastructure, finance, healthcare, and critical systems.”

Not all experts are this pessimistic. According to Allan Liska, threat intelligence analyst at Recorded Future, it is important not to exaggerate the threat posed by AI. “While we have reported an uptick in interest and capabilities of both nation-state and cybercriminal threat actors when it comes to AI usage, these threats do not exceed the ability of organizations following best security practices,” said Liska.

“That may change in the future, however, at this moment it is more important than ever to understand the difference between hype and reality when it comes to AI and other threats.”
