Large language models (LLMs) have arrived in security in three different forms at once: as productivity tools that sit beside analysts, as components embedded inside products and workflows, and as targets that attackers can probe, manipulate and steal.
That convergence is why the conversation feels messy. The same capability that can summarize an incident in seconds can also generate a believable pretext for a spear phish. The same assistant that can draft detection logic can also be induced to leak sensitive context if it is wired into internal knowledge bases without guardrails.
I treat LLMs as another high-impact system: define outcomes, model threats and build controls that assume the model will be wrong or manipulated.
How LLMs are changing the work of security teams
When people say “LLMs in the SOC,” they often picture a chat interface. The more meaningful shift is architectural: LLMs make it cheap to translate unstructured security data into structured hypotheses, narratives and next steps. That matters because a large share of security work is not technically difficult. It is context stitching.
If I were rolling out an LLM capability inside a security program, I would start with a narrow set of workflows where the output is advisory and easy to verify. Then I would expand only after the team can measure quality and manage failure modes.
Here are high-value use cases that are well-suited to early adoption:
Alert triage summaries that turn raw telemetry into a short “what happened, why it matters and what I should check next” narrative
Investigation copilots that generate a timeline from logs, tickets and chat transcripts, then highlight gaps and recommended pivots
Detection engineering assistance for drafting Sigma, YARA or query language snippets that an engineer can review and test
Vulnerability management copilots that cluster similar findings, explain exploitability in business terms and propose patch windows
Policy and standards Q&A, where the assistant answers questions by citing the exact internal control language it relied on
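A common thread in these use cases is that the output is advisory and carries its own evidence. One way to enforce that is to refuse to show the analyst any summary that does not cite sources they can open and check. The sketch below illustrates the idea; the `TriageSummary` fields and example values are invented for illustration, not a real product schema.

```python
from dataclasses import dataclass

# Hypothetical sketch: a triage summary is shown to an analyst only if every
# narrative is backed by sources and concrete next checks the analyst can verify.

@dataclass
class TriageSummary:
    what_happened: str
    why_it_matters: str
    next_checks: list   # suggested pivots for the analyst
    sources: list       # e.g., log event IDs or ticket URLs the model drew from

def is_verifiable(summary: TriageSummary) -> bool:
    """Advisory output must cite its evidence; otherwise the analyst sees
    the raw alert instead of an unsourced narrative."""
    return bool(summary.sources) and bool(summary.next_checks)

ok = TriageSummary(
    what_happened="Impossible-travel login followed by an OAuth consent grant",
    why_it_matters="Possible session hijack on a finance mailbox",
    next_checks=["Review sign-in logs for the account", "Check consented app scopes"],
    sources=["SIEM event 8841-22", "ticket INC-10492"],
)
unsourced = TriageSummary("...", "...", next_checks=[], sources=[])

print(is_verifiable(ok))         # True
print(is_verifiable(unsourced))  # False
```

The check is deliberately dumb: it does not judge whether the summary is correct, only whether it is checkable, which is what keeps the human in the loop.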
Even in these lower-risk scenarios, the operating rule I use is simple: treat the LLM output as untrusted. If a model is allowed to write code, recommend a containment action or reference internal data, you should assume it can hallucinate, be socially engineered or be prompted into unsafe behavior.
The OWASP community has cataloged common failure modes for LLM-enabled applications, including prompt injection, insecure output handling, sensitive information disclosure, excessive agency and overreliance. Those are not academic concepts. They map directly to the ways LLMs fail in security workflows. See OWASP Top 10 for LLM applications.
Practically, I think of an LLM deployment in security as three layers: the model, the data it can see and the actions it can take. You can get significant value by improving the first layer (e.g., by using better models or prompts) while keeping the other two layers tightly constrained.
Three design choices consistently reduce risk without killing value:
Make sources explicit: Use retrieval-augmented generation so the assistant answers from curated documents, tickets or playbooks and show the cited snippets to the analyst.
Keep the model out of the blast radius: The model should not hold secrets. Use short-lived credentials, scoped tokens and brokered access to tools.
Gate actions: Anything that changes a system state (blocking, quarantining, deleting, emailing) should require human approval or a separate policy engine.
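The third design choice can be sketched in a few lines: read-only tools run directly, while anything state-changing lands in a pending queue until a human approves it. The tool names and the `ActionGate` class below are illustrative assumptions, not a real broker API.

```python
# Minimal sketch of action gating, assuming a broker sits between the model
# and its tools. Read-only tools execute; everything else waits for approval.

ALLOWED_READ_ONLY = {"search_logs", "get_ticket", "summarize_alert"}

class ActionGate:
    def __init__(self):
        self.pending = []  # (action, args) pairs awaiting a human decision

    def request(self, action: str, args: dict) -> str:
        if action in ALLOWED_READ_ONLY:
            return f"executed {action}"          # safe to run directly
        self.pending.append((action, args))      # state-changing: queue it
        return f"queued {action} for approval"

    def approve(self, index: int) -> str:
        action, args = self.pending.pop(index)   # a human signed off
        return f"executed {action} after approval"

gate = ActionGate()
print(gate.request("search_logs", {"query": "failed logins"}))
print(gate.request("quarantine_host", {"host": "laptop-42"}))
print(gate.approve(0))
```

The useful property is that the default is deny: a new tool the model discovers is gated until someone deliberately adds it to the read-only allowlist.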
Leadership still needs to be clear-eyed about measurement. If you cannot quantify whether the assistant reduces response time or improves signal quality, you are taking on a new class of operational risk for uncertain gain.
How attackers are using the same capability to scale and personalize
Defenders are not the only ones automating context stitching. Attackers can use LLMs to do reconnaissance, craft language and iterate quickly. The result is not new attacks so much as a change in attacker economics: cheaper personalization, faster iteration and fewer language barriers.
The most immediate impact is in social engineering. A good phishing email is not just about correct grammar. It is situational relevance: the right tone, the right internal jargon and the right moment in a business process. LLMs make that kind of tailoring trivial at scale.
A 2024 study by Heiding, Lermen, Kao, Schneier and Vishwanath evaluated fully automated spear phishing campaigns validated on human subjects. In their experiment, AI-generated emails performed on par with human experts and far better than a generic phishing control group. The paper is worth reading in full because it quantifies the problem and makes the attacker economics tangible.
At the same time, organizations are creating a new set of soft targets by wiring LLMs into internal knowledge bases, ticketing systems and workflow tools. If an attacker can induce prompt injection through a user-controlled input, a document in a shared repository or a compromised web page that the assistant is allowed to read, the model can become a conduit for data leakage or unsafe actions.
That is why I treat LLM security as both an offensive and defensive discipline. You are defending your organization from LLM-enabled threats and you are defending your own LLM-enabled systems from being turned against you. The UK Department for Science, Innovation and Technology has mapped vulnerabilities across the AI lifecycle from design through maintenance, which is a useful mental model for security teams.
To keep it actionable, I group attacker opportunities into three buckets:
LLMs as persuasion engines: better pretexts, better multilingual scams, better impersonation and better fraud scripting
LLMs as productivity engines: faster iteration on commodity malware, scripts, recon reports and exploit adaptation
LLMs as targets and pivots: stealing prompts, extracting system instructions, poisoning data sources or manipulating tool-using agents
The defensive implication is uncomfortable: some of your existing controls matter more than ever, especially verification, identity proofing and process hardening. If an executive approval workflow can be bypassed with a convincing message, an LLM will help attackers find and exploit that weakness faster.
LLMs also complicate content-based detection. When attackers can generate clean language, I put more weight on technical and behavioral signals (sender authentication, unusual login, anomalous payment request pattern) than on tone or grammar cues.
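That re-weighting can be made concrete as a simple score: signals an LLM can trivially fix (grammar, tone) get little weight, while signals an attacker must actually defeat (sender authentication, login anomalies, payment-process deviations) dominate. The weights below are invented for illustration, not calibrated values from any product.

```python
# Illustrative scoring sketch: weight verifiable technical and behavioral
# signals over language-quality cues. All weights are assumptions.

WEIGHTS = {
    "spf_or_dkim_fail": 0.35,        # sender authentication failed
    "first_time_sender": 0.15,
    "anomalous_login_geo": 0.25,
    "unusual_payment_request": 0.25, # deviates from the normal approval flow
    "poor_grammar": 0.05,            # deliberately low: easy for an LLM to fix
}

def phish_risk(signals: dict) -> float:
    """Sum the weights of the signals that fired, capped at 1.0."""
    score = sum(w for name, w in WEIGHTS.items() if signals.get(name))
    return min(score, 1.0)

msg = {"spf_or_dkim_fail": True, "unusual_payment_request": True}
print(round(phish_risk(msg), 2))  # 0.6
```

A perfectly written email with failed authentication and an off-process payment ask still scores high; a clumsy one that passes every technical check barely registers, which is the point.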
A control stack that lets you use LLMs without losing the plot
I do not think organizations need a brand-new governance regime for LLMs. They need to apply existing governance muscle to new failure modes. The closest fit is risk management, not model worship.
A helpful anchor is the NIST Artificial Intelligence Risk Management Framework, which organizes activities into govern, map, measure and manage. The value is less in the labels and more in the discipline: define accountability, understand context and impacts, measure risk and then operationalize mitigations.
If I were advising a security and risk committee on an LLM program, I would propose the following control stack. It is intentionally pragmatic and it assumes a mixed environment of third-party models, internal data and tool integrations:
Govern: Define what the model is allowed to do, who owns it and what unsafe means in your context (data classes, regulated workflows, critical decisions)
Map: Document the end-to-end system, including prompts, data sources, retrieval pipelines, plugins and downstream actions, not just the model endpoint
Secure the data: Inventory training, fine-tuning and retrieval corpora, enforce access controls and monitor for poisoning, leakage and unauthorized reuse
Threat model with AI-specific techniques: Map likely adversary behaviors using MITRE ATLAS and include prompt injection and tool abuse in your scenarios
Build guardrails at the boundaries: Input validation, content filtering, output constraints and schema-based parsing for any output that feeds automation
Gate high-impact actions: Require explicit approvals, step-up authentication or policy engine checks before an agent can change state in production
Test like you mean it: Red-team the assistant, run jailbreak and prompt injection suites and measure error rates under realistic loads
Instrument, monitor and respond: Log prompts, tool calls and outputs, detect anomalous usage patterns and maintain a kill switch for unsafe automation
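The last control in the stack pairs naturally with the first two: every tool call is logged with its prompt context, and a kill switch halts automation without a redeploy. The sketch below uses an in-memory flag and log list as stand-ins for real configuration and SIEM sinks; the function names are illustrative.

```python
import json
import time

# Sketch of instrumentation plus a kill switch for LLM-driven automation.
# KILL_SWITCH and AUDIT_LOG are in-memory stand-ins for real config and logging.

KILL_SWITCH = {"enabled": False}
AUDIT_LOG = []

def call_tool(name: str, args: dict, prompt_id: str) -> str:
    if KILL_SWITCH["enabled"]:
        raise RuntimeError("LLM automation disabled by kill switch")
    # Log the call with enough context to reconstruct what the model did.
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(), "tool": name, "args": args, "prompt": prompt_id,
    }))
    return f"ran {name}"  # a real broker would dispatch to the tool here

print(call_tool("search_logs", {"q": "4625 events"}, prompt_id="p-001"))
KILL_SWITCH["enabled"] = True
try:
    call_tool("search_logs", {"q": "4625 events"}, prompt_id="p-002")
except RuntimeError as err:
    print(err)
print(len(AUDIT_LOG))  # only the pre-kill call was logged
```

Because the switch is checked on every call rather than at startup, flipping it stops an agent mid-run, which is exactly what you want during an incident involving the assistant itself.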
Further steps
Two pieces of public guidance are worth reading because they translate secure AI into operational steps. First, “Guidelines for secure AI system development” — a document published by the UK National Cyber Security Centre (NCSC) and the US Cybersecurity and Infrastructure Security Agency (CISA), along with international partners — provides lifecycle guidance from secure design through secure operation and maintenance. Second, “Deploying AI systems securely” — published jointly by the US National Security Agency’s Artificial Intelligence Security Center (AISC) and CISA, along with other national and international agencies — focuses on best practices for deploying and operating externally developed AI systems.
Finally, I treat excessive agency as a line you cross deliberately. The more autonomy you give an LLM-based agent, the more you are building a new class of privileged software. If you would not give a junior script unfettered API access, you should not give it to a probabilistic model either.
The convergence of LLMs and cybersecurity is not optional. Attackers will use these tools and employees already are. The advantage comes from capturing productivity gains while keeping control of data, identity and change management.
This article is published as part of the Foundry Expert Contributor Network.