Why your AI strategy stops where the PLC starts: Hard lessons from the OT frontlines

I spent two days at a substation connecting a major offshore wind farm to the grid. The control room featured three new AI-ready dashboards and a board mandate to “leverage machine learning for resilience.” It also had a maintenance laptop running Windows 7, literally taped to the inside of a cabinet because the Velcro had failed.

That laptop was the only device in the building that could still talk to the legacy protection relays guarding the grid connection. No patches since 2017. No EDR. No path to an agent-based security model.

For a decade, I have walked into some version of this scene at energy utilities, automotive plants and pharma sites, across sectors and borders. The dashboards change; the “forgotten” laptop stays. This is the visibility gap that no large language model can close. According to the 2026 Dragos OT Cybersecurity Year in Review, fewer than 10 percent of OT networks worldwide have meaningful network monitoring in place. In 30 percent of last year’s incident response cases, investigations started not with a detection alert but with someone on the plant floor noticing that “something seemed wrong.”

If you are a C-level leader planning an AI-driven security strategy, you need to realize: your strategy won’t fail because the AI isn’t smart enough. It will fail because your most critical telemetry never reaches it.

The inverted CIA triad: Where AI hallucinates risk

In IT, we prioritize confidentiality, integrity and availability, roughly in that order. In OT (operational technology), the triad is flipped: availability is everything.

This inversion is where AI-driven security tools quietly break. A model trained on enterprise telemetry — HTTP, DNS and Windows event logs — will look at a Modbus or PROFINET segment and flag perfectly normal industrial traffic as an anomaly. If that AI is wired into an automated response playbook, you’ve built a system that can shut down a production line faster than any hacker.

During a simulation I conducted for a Tier-1 automotive supplier, I watched a SOAR platform attempt exactly this. The IT lead was thrilled by the “millisecond response time.” The plant manager went gray as he realized the AI had just simulated a six-figure-per-hour downtime event by isolating a critical PLC. In the industrial world, an automated “isolate host” command is often indistinguishable from a denial-of-service attack.
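One way to keep an automated playbook from repeating that simulation is to put a policy gate in front of the “isolate host” action. The sketch below is a minimal illustration, not any SOAR vendor’s API: the `Asset` fields, the zone labels and the 0.9 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALERT_ONLY = "alert_only"
    REQUIRE_APPROVAL = "require_approval"
    ISOLATE_HOST = "isolate_host"


@dataclass(frozen=True)
class Asset:
    name: str
    zone: str                 # hypothetical labels: "it" or "ot"
    crown_jewel: bool = False


def decide_response(asset: Asset, anomaly_score: float, threshold: float = 0.9) -> Action:
    """Choose a response without ever auto-isolating an OT or crown-jewel asset."""
    if anomaly_score < threshold:
        return Action.ALERT_ONLY
    if asset.zone == "ot" or asset.crown_jewel:
        # Availability first: someone in operations signs off before anything is cut off.
        return Action.REQUIRE_APPROVAL
    return Action.ISOLATE_HOST
```

With this gate, a high-scoring anomaly on an office laptop can still be isolated in milliseconds, while the same score on a PLC only opens an approval request for the plant side.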

Passive monitoring vs. poking the controller

When I evaluate OT monitoring platforms like Nozomi Networks, Claroty or Microsoft Defender for IoT, the technical differences often matter less than one critical question: does this tool require active queries?

In a boardroom, “active scanning” sounds efficient. In a running plant, poking a 15-year-old Siemens S7-300 or a Rockwell Automation controller to extract metadata can cause the device to crash. I’ve seen half a shortlist eliminated because the vendors’ AI engines required active polling that the operations director refused to sign off on.

For AI to work in OT, it must be fed by passive network monitoring. You need the raw traffic from Levels 0–2 of the Purdue Enterprise Reference Architecture, the layered model that defines the boundary between IT and OT systems. Without that telemetry, you aren’t seeing the S7Comm or DNP3 protocols that actually manage the physical world, and you are performing language modeling on an empty corpus.
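To make “passive” concrete, here is a minimal sketch that labels already-captured flow records by their default TCP port and never sends a packet to a controller. It assumes the flows come from a SPAN port or network tap; the `Flow` record and function names are hypothetical. Port matching is only a rough heuristic, since sites can remap ports and commercial platforms rely on deep packet inspection instead.

```python
from collections import Counter
from typing import Iterable, NamedTuple

# Well-known default TCP ports for common industrial protocols.
OT_PORTS = {
    102: "S7comm (ISO-on-TCP)",
    502: "Modbus/TCP",
    20000: "DNP3",
}


class Flow(NamedTuple):
    src: str
    dst: str
    dst_port: int


def label_flow(flow: Flow) -> str:
    """Label a captured flow by destination port; nothing is sent to the device."""
    return OT_PORTS.get(flow.dst_port, "unclassified")


def summarize(flows: Iterable[Flow]) -> Counter:
    """Count captured flows per protocol label."""
    return Counter(label_flow(f) for f in flows)
```

If a summary of a Level 0–2 segment comes back almost entirely “unclassified,” that is a hint the sensor is sitting in the wrong place or the tap is not carrying controller traffic at all.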

The crown jewels are simpler than you think

The projects I see succeed don’t start with a 300-page AI roadmap. They start with a ruthless focus on what I call the crown jewels.

I always ask plant managers the same question: which three processes can you absolutely not afford to lose for even an hour? At a power utility, it’s not the billing system; it’s the protection relays. At a pharma site, it’s a single fermentation line. At an automotive plant, it’s the welding cell that feeds the entire body shop.

Once you identify these, the AI scope collapses from “everything” to “the things that matter.” We then apply virtual patching to protect the unpatchable Windows 7 machines and segment the network so the smart coffee machine in the breakroom — which receives more security updates than the industrial robots — cannot reach the human-machine interface.
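The segmentation rule can be sketched as a default-deny allow-list of zone-to-zone conduits. The zone names and the two permitted conduits below are hypothetical; a real site would derive them from its own inventory and enforce them in firewalls or switches, not in a script.

```python
from typing import Iterable, List, Tuple

# Hypothetical zones and the only conduits allowed between them.
ALLOWED_CONDUITS = {
    ("engineering", "hmi"),
    ("hmi", "plc"),
}


def is_allowed(src_zone: str, dst_zone: str) -> bool:
    """Default deny: traffic passes only within a zone or over a listed conduit."""
    return src_zone == dst_zone or (src_zone, dst_zone) in ALLOWED_CONDUITS


def violations(flows: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return observed zone-to-zone flows that the policy would block."""
    return [flow for flow in flows if not is_allowed(*flow)]
```

Run against observed traffic, `violations` gives a quick list of routes, such as breakroom to HMI, that should not exist before any anomaly detection is layered on top.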

Here is the part that surprises most CIOs: the crown-jewel list is almost always shorter than the security team predicts and almost always longer than the operations team admits. At one site I worked with last year, security had counted 47 “critical” systems on a spreadsheet. The plant director, after twenty minutes of honest conversation, named six. The other 41 were important, but they were not crown jewels. They didn’t need real-time AI-driven anomaly detection. They needed monthly compliance reporting. Conflating those two requirements is how OT security budgets get burned without measurable risk reduction.

The culture shift: From phishing to physics

The most productive workshop I ran this year didn’t involve a single AI vendor. It was a tabletop exercise tracing a ransomware path from a phishing email to a contractor’s USB stick, then into the maintenance laptop and finally the PLCs.

We mapped it minute by minute. Minute zero: a procurement clerk opens an invoice attachment. Minute eight: the malware reaches the contractor’s laptop on the office network. Minute fourteen: the contractor plugs the same laptop into the maintenance VLAN to update HMI firmware, just as he does every Tuesday. Minute twenty-three: the ransomware encrypts the engineering workstation. Minute thirty-one: the operators notice the screens going dark, but production keeps running on the PLCs themselves — because OT controllers don’t need Windows to do their job. The illusion of normality holds for almost an hour. Then someone tries to push a setpoint change, and nothing happens.
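The same timeline can be written down as data, which makes the detection windows explicit. This is only a sketch of the arithmetic; the event keys are labels invented for the example, and the minute marks are the ones from the tabletop.

```python
# Minute marks from the tabletop exercise; the keys are hypothetical labels.
EVENTS = {
    "invoice_opened": 0,
    "contractor_laptop_infected": 8,
    "laptop_on_maintenance_vlan": 14,
    "workstation_encrypted": 23,
    "operators_notice": 31,
}


def gap(events: dict, earlier: str, later: str) -> int:
    """Minutes elapsed between two named events."""
    return events[later] - events[earlier]


# Window to catch the infection before it touches the maintenance network.
window = gap(EVENTS, "contractor_laptop_infected", "laptop_on_maintenance_vlan")  # 6

# How long the infection ran before anyone on the floor noticed.
unnoticed = gap(EVENTS, "contractor_laptop_infected", "operators_notice")  # 23
```

Six minutes to detect versus twenty-three minutes until a human noticed is the gap the exercise put on the table.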

That was the moment that changed the room. The production head had spent the morning asking why we needed yet another security project. Now he was asking how long until they could actually detect minute eight, before the contractor’s laptop ever touched the maintenance network. The IT lead, who had defended his “patch Tuesday at 2 a.m.” ritual for years, finally understood why that ritual is an impossibility in a facility that runs 24/7. Different vocabularies, same problem.

For the first time in any meeting at that site, an OT manager and an IT manager left the room with a shared incident timeline rather than a shared blame map. That’s what culture change in industrial security actually looks like — not a policy document, but a tabletop with enough specificity that nobody can hide behind their own jargon.

Bottom line for CIOs and CSOs

With nation-state actors like Volt Typhoon increasingly using “living off the land” techniques to embed themselves in critical infrastructure, as detailed in recent CISA advisories, the luxury of ignoring the factory floor is gone. AI can help us find these threats, but only if the telemetry is real. If you want AI to deliver real business value in industrial environments, the order of operations is non-negotiable.

First, inventory: map the floor, not the slides. Second, segmentation: kill the routes from the breakroom to the PLC. Third, passive telemetry: feed the AI with real industrial protocols from Purdue Levels 0–2. Then, and only then, layer the language model on top.

Skip these, and you’ve built a very expensive dashboard for a network you still cannot see.

This article is published as part of the Foundry Expert Contributor Network.