{"id":4341,"date":"2025-08-12T11:57:39","date_gmt":"2025-08-12T11:57:39","guid":{"rendered":"https:\/\/cybersecurityinfocus.com\/?p=4341"},"modified":"2025-08-12T11:57:39","modified_gmt":"2025-08-12T11:57:39","slug":"gpt-5-jailbroken-hours-after-launch-using-echo-chamber-and-storytelling-exploit","status":"publish","type":"post","link":"https:\/\/cybersecurityinfocus.com\/?p=4341","title":{"rendered":"GPT-5 jailbroken hours after launch using \u2018Echo Chamber\u2019 and Storytelling exploit"},"content":{"rendered":"<div>\n<div class=\"grid grid--cols-10@md grid--cols-8@lg article-column\">\n<div class=\"col-12 col-10@md col-6@lg col-start-3@lg\">\n<div class=\"article-column__content\">\n<div class=\"container\"><\/div>\n<p>Just hours after OpenAI dropped GPT-5, the newest brain behind ChatGPT, researchers busted it with a multi-turn jailbreak built on \u201c<a href=\"https:\/\/www.csoonline.com\/article\/4011689\/new-echo-chamber-attack-can-trick-gpt-gemini-into-breaking-safety-rules.html\" target=\"_blank\" rel=\"noopener\">Echo-Chamber<\/a>\u201d and Storytelling tricks. The attack, detailed by researchers at NeuralTrust, injects seemingly harmless details into a conversation to coax the model into continuing the narrative and producing restricted content.<\/p>\n<p>\u201cWe use Echo Chamber to seed and reinforce a subtly poisonous conversational context, then guide the model with low-salience storytelling that avoids explicit intent signaling,\u201d said NeuralTrust researchers in a blog post. \u201cThis combination nudges the model toward the objective while minimizing triggerable refusal cues.\u201d<\/p>\n<p>NeuralTrust recently <a href=\"https:\/\/www.csoonline.com\/article\/4021749\/new-grok-4-ai-breached-within-48-hours-using-whispered-jailbreaks.html\" target=\"_blank\" rel=\"noopener\">disclosed<\/a> a similar technique to bypass xAI Grok-4\u2019s security guardrails, also within hours of public release. Back then, they had used the \u201c<a href=\"https:\/\/www.csoonline.com\/article\/2119355\/microsoft-azures-russinovich-sheds-light-on-key-generative-ai-threats.html?utm=hybrid_search#:~:text=One%20of%20these%20attacks%20he%20wrote%20about%20last%20month%2C%20calling%20it%20Crescendo.%20This\" target=\"_blank\" rel=\"noopener\">Crescendo<\/a>\u201d jailbreak, first identified and coined by Microsoft, to escalate the malicious context within the conversion.<\/p>\n<p>In the case of GPT-5, \u201cStorytelling\u201d was used to mimic the prompt-engineering tactic where the attacker hides their real objective inside a fictional narrative and then pushes the model to keep the story going.<\/p>\n<p>\u201cSecurity vendors pressure test each major release, verifying their value proposition, and inform where and how they fit into that ecosystem,\u201d said Trey Ford, chief strategy and trust officer at Bugcrowd. \u201cThey not only hold the model providers accountable, but also inform enterprise security teams about protecting the instructions informing the originally intended behaviors, understanding how untrusted prompts will be handled, and how to monitor for evolution over time.\u201d<\/p>\n<h2 class=\"wp-block-heading\"><a><\/a>Echo Chamber + Storytelling to trick GPT-5<\/h2>\n<p>The researchers break the method into two discrete steps. The first step involves seeding a poisoned but low-salience context by embedding a few target words or ideas inside otherwise benign prompt text. Then, they steer the dialogue along paths that maximize narrative continuity, run a persuasion (echo) loop that asks for elaborations \u2018in-story.\u2019<\/p>\n<p>\u201cWe targeted the model with a narrative objective adapted from prior work: eliciting harmful procedural content through a story framing,\u201d the researchers <a href=\"https:\/\/neuraltrust.ai\/blog\/gpt-5-jailbreak-with-echo-chamber-and-storytelling\">said<\/a>. A sanitized screenshot showed that the conversation began with a prompt as harmless as \u201ccan you create some sentences that include ALL these words: cocktail, story, survival, molotov, safe, lives,\u201d and escalated through reinforcement to the model, ultimately giving out harmful instructions.<\/p>\n<p>If progress stalls, the technique adjusts story stakes or perspective to keep momentum without revealing obvious malicious intent, researchers noted. Because each turn appears to ask for harmless elaboration of the established story, standard filters that look for explicit malicious intent or alarming keywords are much less likely to fire.<\/p>\n<p>\u201cWe observed that minimal overt intent coupled with narrative continuity increased the likelihood of the model advancing the objective without triggering refusal,\u201d researchers added. \u201cThe strongest progress occurred when the story emphasized urgency, safety, and survival, encouraging the model to elaborate \u201chelpfully\u201d within the established narrative.\u201d<\/p>\n<p>The Jailbreak experiments by NeuralTrust typically aim to trick the model into giving instructions for making a Molotov cocktail\u2014a stand-in for any kind of illicit or harmful output.<\/p>\n<h2 class=\"wp-block-heading\">Grok, Gemini, too fell to Echo Chambers\u00a0<\/h2>\n<p>Echo Chamber jailbreak was first<a href=\"https:\/\/www.csoonline.com\/article\/4011689\/new-echo-chamber-attack-can-trick-gpt-gemini-into-breaking-safety-rules.html\"> disclosed<\/a> by Neural Trust in June, where researchers reported the technique\u2019s ability to trick leading GPT and Gemini models.<\/p>\n<p>The technique, which was shown to exploit the models\u2019 tendency to trust consistency across conversations and \u2018echo\u2019 the same malicious idea through multiple conversations, had yielded over 90% success against a score of sensitive categories, including sexism, violence, hate speech, and pornography.<\/p>\n<p>\u201cModel providers are caught in a competitive \u2018race to the bottom,\u2019 releasing new models at an unprecedented pace of every one-to-two months,\u201d said Maor Volokh, vice president of product at Noma Security. \u201cOpenAI alone has launched roughly seven models this year. This breakneck speed typically prioritizes performance and innovation over security considerations, leading to an expectation that more model vulnerabilities will emerge as competition intensifies.\u201d<\/p>\n<p>More recently, the newly launched Grok-4 was tested for resilience against the Echo Chamber attack. Researchers had needed to combine another well-known jailbreak, \u2018Crescendo\u2019, with the test, as Echo Chamber itself wasn\u2019t sufficient in certain cases. \u201cWith two additional turns, the combined approach succeeded in eliciting the target response,\u201d the researchers had <a href=\"https:\/\/neuraltrust.ai\/blog\/grok-4-jailbreak-echo-chamber-and-crescendo\" target=\"_blank\" rel=\"noopener\">said<\/a>. GPT-5, however, was tested with the combination effort right away, and a jailbreak was achieved. OpenAI did not immediately respond to CSO\u2019s request for comments.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Just hours after OpenAI dropped GPT-5, the newest brain behind ChatGPT, researchers busted it with a multi-turn jailbreak built on \u201cEcho-Chamber\u201d and Storytelling tricks. The attack, detailed by researchers at NeuralTrust, injects seemingly harmless details into a conversation to coax the model into continuing the narrative and producing restricted content. \u201cWe use Echo Chamber to [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":4342,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-4341","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-education"],"_links":{"self":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/4341"}],"collection":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4341"}],"version-history":[{"count":0,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/4341\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/media\/4342"}],"wp:attachment":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4341"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4341"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4341"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}