{"id":4986,"date":"2025-09-18T20:16:02","date_gmt":"2025-09-18T20:16:02","guid":{"rendered":"https:\/\/cybersecurityinfocus.com\/?p=4986"},"modified":"2025-09-18T20:16:02","modified_gmt":"2025-09-18T20:16:02","slug":"ai-might-scheme-less-thanks-to-this-openai-research","status":"publish","type":"post","link":"https:\/\/cybersecurityinfocus.com\/?p=4986","title":{"rendered":"AI Might \u2018Scheme\u2019 Less Thanks to This OpenAI Research"},"content":{"rendered":"<p>AI \u201cscheming\u201d can be a challenge when working with generative artificial intelligence. A model might follow the letter of a prompt but cut corners under the hood.\u00a0<\/p>\n<p>On Sep. 17, OpenAI published a blog post saying it had developed, with Apollo Research, a method called deliberative alignment to reduce scheming in two of its AI models.\u00a0<\/p>\n<p>\u201cWe are seeing signs that this issue is beginning to emerge across all frontier models today,\u201d the researchers wrote.<\/p>\n<h2 class=\"wp-block-heading\">What is AI scheming?<\/h2>\n<p>Generative AI can deliberately withhold or distort task-relevant information, or arrive at a conclusion that aligns with the literal meaning of a prompt but not the prompt\u2019s intended purpose. 
<a href=\"https:\/\/www.eweek.com\/news\/news-anthropic-claude-sonnet-ethics-test\/\">AI scheming<\/a> usually causes small-scale problems, such as ChatGPT pretending to complete a task that is actually outside its capabilities.\u00a0<\/p>\n<p>\u201cHowever, as AIs are assigned more complex tasks with real-world consequences and begin pursuing more ambiguous, long-term goals, we expect that the potential for harmful scheming will grow \u2014 so our safeguards and our ability to rigorously test must grow correspondingly,\u201d <a href=\"https:\/\/openai.com\/index\/detecting-and-reducing-scheming-in-ai-models\/\" target=\"_blank\" rel=\"noopener\">the OpenAI researchers wrote<\/a>.\u00a0<\/p>\n<p>Ironically, attempts to train out scheming may instead teach models to hide it better, the researchers noted.\u00a0<\/p>\n<p>\u201cMitigations that reduce scheming may either (a) truly eliminate it, or (b) just teach the model to conceal it better,\u201d they wrote. \u201cBoth outcomes look the same from the outside, so the reason for behavioral change is critical.\u201d<\/p>\n<p>In an evaluation conducted with <a href=\"https:\/\/www.antischeming.ai\/\" target=\"_blank\" rel=\"noopener\">Apollo Research<\/a>, OpenAI created:<\/p>\n<ul class=\"wp-block-list\">\n<li>A teaching method to reduce scheming.<\/li>\n<li>A protocol for stress-testing future iterations of such methods.<\/li>\n<li>Plans to conduct further research with Apollo, raise awareness of the issue, and dedicate more team resources to scheming-related research.<\/li>\n<\/ul>\n<p>OpenAI found cases of the AI engaging in covert actions, such as withholding or distorting information, in OpenAI o3 and o4-mini, Gemini 2.5 Pro, and Claude Opus 4. 
(The investigation was conducted before GPT-5 was available.)\u00a0<\/p>\n<p>Training based on the evaluation\u2019s findings reduced scheming in OpenAI o4-mini from 8.7% to 0.3%, and in OpenAI o3 from 13% to 0.4%.\u00a0<\/p>\n<h2 class=\"wp-block-heading\">Ongoing challenges<\/h2>\n<p>Another factor in reducing hidden behavior is reasoning transparency: it remains difficult to teach generative AI to accurately explain its <a href=\"https:\/\/www.techrepublic.com\/article\/apple-ai-reasoning-research\/\" target=\"_blank\" rel=\"noopener\">own thought processes<\/a>. As OpenAI pointed out, reasoning transparency is an ongoing field of study among AI giants such as Anthropic.\u00a0<\/p>\n<p>In June, Google removed a view that had allowed developers to trace reasoning in Google AI Studio. OpenAI o1, the company\u2019s first reasoning model, also <a href=\"https:\/\/venturebeat.com\/ai\/heres-how-openai-o1-might-lose-ground-to-open-source-models\" target=\"_blank\" rel=\"noopener\">failed to show<\/a> its chain of thought.\u00a0<\/p>\n<p>Vulnerabilities related to reasoning transparency are eligible for submission to OpenAI\u2019s Red-Teaming Challenge for gpt-oss-20b.\u00a0<\/p>\n<p><strong>Separately, OpenAI updated ChatGPT to address <\/strong><a href=\"https:\/\/www.eweek.com\/news\/openai-chatgpt-teen-safety-update-2025\/\"><strong>dangers to teens<\/strong><\/a><strong>.<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.eweek.com\/artificial-intelligence\/openai-scheming-research\/\">AI Might \u2018Scheme\u2019 Less Thanks to This OpenAI Research<\/a> appeared first on <a href=\"https:\/\/www.eweek.com\/\">eWEEK<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>AI \u201cscheming\u201d can be a challenge when working with generative artificial intelligence. A model might follow the letter of a prompt but cut corners under the hood.\u00a0 On Sep. 
17, OpenAI published a blog post saying it had developed, with Apollo Research, a method called deliberative alignment to reduce scheming in two of its [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-4986","post","type-post","status-publish","format-standard","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/4986"}],"collection":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4986"}],"version-history":[{"count":0,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/4986\/revisions"}],"wp:attachment":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4986"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4986"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4986"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}