{"id":2560,"date":"2025-04-01T10:29:45","date_gmt":"2025-04-01T10:29:45","guid":{"rendered":"https:\/\/cybersecurityinfocus.com\/?p=2560"},"modified":"2025-04-01T10:29:45","modified_gmt":"2025-04-01T10:29:45","slug":"llms-are-now-available-in-snack-size-but-digest-with-care","status":"publish","type":"post","link":"https:\/\/cybersecurityinfocus.com\/?p=2560","title":{"rendered":"LLMs are now available in snack size but digest with care"},"content":{"rendered":"<div>\n<div class=\"grid grid--cols-10@md grid--cols-8@lg article-column\">\n<div class=\"col-12 col-10@md col-6@lg col-start-3@lg\">\n<div class=\"article-column__content\">\n<div class=\"container\"><\/div>\n<p>As large language models (LLMs) gain mainstream, they are pushing the edges on AI-driven applications, adding more power and complexity. Running these massive models, however, comes at a price. The high costs and latency associated with them make them impractical for many real-world scenarios.<\/p>\n<p>Enter model distillation. A technique AI engineers are using to pack the most useful aspects of these \u201chigh-parameter\u201d models into much smaller rip-offs. They do this by training a smaller \u201cstudent\u201d model from the ground up to replicate the behavior of a larger \u201cteacher\u201d model.<\/p>\n<p>\u201cModel distillation enables engineers to capture much of the operational capacity of a high-parameter model within the reduced computational footprint of a lower-parameter model,\u201d said David Brauchler, technical director &amp; head of AI and ML security at NCC Group. \u201cModel distillation is most effective when the student model has a more limited purpose or knowledge domain than the generalized teacher model.\u201d<\/p>\n<p>While distillation enables cost savings, faster inference, and better operational efficiency, distilled models inherit many security risks from their teacher models, along with a few others of their own.<\/p>\n<h2 class=\"wp-block-heading\">Students take on the teacher\u2019s burden<\/h2>\n<p>Distilled models inherit a huge part of their teacher model\u2019s behavior, including any security risks embedded in their training data. These risks include intellectual property theft, privacy leaks, and model inversion attacks.<\/p>\n<p>\u201cTypical model distillation uses the training data originally consumed by the larger teacher model alongside the teacher model\u2019s predictions of valid possible outputs (i.e. the probability distribution of outputs),\u201d Brauchler said. \u201cConsequently, the student model has the opportunity to memorize many of the same behaviors as the teacher model, including sensitive data in the training sets.\u201d<\/p>\n<p>Security vulnerabilities of teacher models carry into the student through the transfer of latent knowledge, biases, and flaws. What this means is that DistilGPT-2, a student model to GPT-2, is equally capable of leaking personally identifiable information (PII) from its training data that GPT-2 was found <a href=\"https:\/\/arxiv.org\/abs\/2012.07805?utm_source=chatgpt.com\">guilty of in 2020<\/a> when prompted in specific ways.<\/p>\n<p>The same distillation is potentially prone to a model inversion attack through the black-box extraction techniques that GPT-3.5 <a href=\"https:\/\/arxiv.org\/abs\/2403.06634?utm_source=chatgpt.com\">was demonstrated vulnerable<\/a> to in a study in 2020. 
Security vulnerabilities of teacher models carry into the student through the transfer of latent knowledge, biases, and flaws. What this means is that DistilGPT-2, a student model of GPT-2, is just as capable of leaking personally identifiable information (PII) from its training data, something GPT-2 was shown to do in a 2020 study (https://arxiv.org/abs/2012.07805) when prompted in specific ways.

Distilled models are also potentially prone to model inversion through the black-box extraction techniques that GPT-3.5 was demonstrated to be vulnerable to in a 2024 study (https://arxiv.org/abs/2403.06634). Smaller models represent less complex functions and are often more vulnerable to security attacks such as model inversion, Brauchler added.

Passed down wisdom can distort reality

Rather than developing their own contextual understanding, student models rely heavily on their teacher models’ pre-learned conclusions. Whether this limitation can lead to model hallucination is hotly debated among experts.

Brauchler is of the opinion that the reliability of student models is tied to that of their teachers, irrespective of how they were trained. What this means is that if a teacher model doesn’t hallucinate, chances are its students won’t either.

Agreeing with most of that argument, Arun Chandrasekaran, VP analyst at Gartner, clarifies that student models may still suffer from newly introduced hallucinations owing to their size and purpose.

“Distillation itself does not necessarily increase the rate of hallucinations, but if the student model is significantly smaller, it might lack the capacity to capture all the nuances of the teacher model, potentially leading to more errors or oversimplifications,” Chandrasekaran said.

When a model hallucinates, threat actors can exploit it by crafting adversarial prompts that manipulate outputs, leading to misinformation campaigns or AI-driven exploits.

One instance of miscreants weaponizing model hallucination is WormGPT, discovered in 2023 (https://www.csoonline.com/article/646441/wormgpt-a-generative-ai-tool-to-compromise-business-emails.html): an AI system deliberately trained on unverified, potentially biased, and adversarial data so that it hallucinates legal terminology, business processes, and financial policies to create convincing but completely fabricated phishing emails and scam content.

Snatch AI made easy

Distilled models also lower the barrier for adversaries attempting model extraction attacks (https://www.csoonline.com/article/573031/adversarial-machine-learning-explained-how-attackers-disrupt-ai-and-ml-systems.html). By extensively querying these models, attackers can approximate their decision boundaries and recreate functionally similar models, often with reduced security constraints.
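In outline, such an attack looks uncomfortably close to ordinary distillation, with the victim cast as an unwitting teacher. The sketch below is deliberately non-functional and illustrative: query_target() and encode() are hypothetical stand-ins for a victim API and a tokenizer, not real endpoints.

```python
# Illustrative model-extraction outline, not a working exploit.
# query_target() and encode() are hypothetical stand-ins.
import torch
import torch.nn.functional as F

def query_target(prompt: str) -> torch.Tensor:
    """Hypothetical call to a victim model's public API, returning its
    output probability distribution (soft labels). Real APIs may expose
    only top-k probabilities or a single label, which weakens -- but
    does not necessarily prevent -- extraction."""
    raise NotImplementedError("stand-in for a remote model API")

def harvest(prompts):
    # Step 1: extensively query the target and record its answers.
    return [(p, query_target(p)) for p in prompts]

def fit_surrogate(surrogate, encode, dataset, optimizer):
    # Step 2: train a local "student" to imitate the harvested behavior,
    # using the same KL objective as legitimate distillation.
    for prompt, target_probs in dataset:
        logits = surrogate(encode(prompt))
        loss = F.kl_div(F.log_softmax(logits, dim=-1), target_probs,
                        reduction="batchmean")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

If only hard labels are available, the KL term degrades to cross-entropy against the argmax label, which is exactly why soft labels make extraction so much more effective, a point Brauchler returns to below.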
“Once an adversary has extracted a model, they can potentially modify it to bypass security measures or proprietary guidelines embedded in the original model,” Chandrasekaran said. “This could include altering the model’s behavior to ignore certain inputs or to produce outputs that align with the adversary’s goals.”

Brauchler, however, argues that bypassing an AI model’s proprietary security guardrails is not the primary driver behind model extraction attacks. “Model extraction is usually exploited with the intent of capturing a proprietary model’s performance, not with the express purpose of bypassing guardrails,” he said. “There are much less strenuous techniques to avoid AI guardrails.”

Instead, he explained, because model extraction attacks closely resemble model distillation, threat actors may disguise a maliciously extracted model as a legitimately distilled, snack-size version of the original.

One particular risk arises when proprietary models expose probability distributions (soft labels), because threat actors can then leverage distillation methodologies to replicate the target model’s functional behavior. Similar attacks can be executed using output labels alone, but the absence of probability distributions significantly reduces their effectiveness, Brauchler added.

To sum up, distillation can expose models to extraction in two ways: by serving as a cover for replicating a source model’s behavior in an extraction attack, or by enabling post-distillation extraction attempts that bypass security controls.

They may not always have your back

Another downside to distillation is reduced interpretability. Large LLMs benefit from extensive logs and complex decision-making pathways that security teams can analyze for root-cause investigation. Distilled models, however, often lack this granularity, making it harder to diagnose vulnerabilities or trace security incidents.

“In the context of incident response, the lack of detailed logs and parameters in student models can make it harder to perform root cause analysis,” Chandrasekaran said. “Security researchers might find it more difficult to pinpoint the exact conditions or inputs that led to a security incident or to understand how an adversary exploited a vulnerability.”

This opacity complicates defensive strategies and forces security teams to rely on external monitoring techniques rather than internal AI audit trails.
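In practice, external monitoring can be as simple as wrapping every model call in your own audit trail, since the model will not provide one. Here is a minimal sketch, assuming a generic generate() callable; the JSONL log schema and field names are illustrative choices, not an established standard.

```python
# Minimal external audit trail for an opaque model.
# generate() is a stand-in for any text-generation callable; the log
# schema is an illustrative choice, not an established standard.
import json
import logging
import time
import uuid

audit_log = logging.getLogger("model_audit")
audit_log.setLevel(logging.INFO)
audit_log.addHandler(logging.FileHandler("model_audit.jsonl"))

def audited_generate(generate, prompt: str, **params) -> str:
    """Call the model and record enough context to reconstruct the
    interaction later -- an external substitute for internal logs."""
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "prompt": prompt,
        "params": params,
    }
    try:
        record["output"] = generate(prompt, **params)
        return record["output"]
    except Exception as exc:
        record["error"] = repr(exc)
        raise
    finally:
        # One JSON object per line keeps the trail easy to search
        # during incident response.
        audit_log.info(json.dumps(record))
```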
Fighting the AI curse

While the security risks of distilled models are pressing, the broader risk remains the nascent state of AI security itself, which is a key driver of all these vulnerabilities.

“AI guardrails remain soft defense-in-depth controls, not security boundaries,” Brauchler noted. “And as systems move toward agentic contexts, the AI engineering industry will quickly discover that relying on guardrails will result in deep, impactful security vulnerabilities in critical systems, as NCC Group has already observed across multiple application environments.”

Only when developers change the way they think about AI application architectures will we be able to move toward designing systems with trust-based access controls in mind, he added.