Data analytics platforms and the information they contain are among the most important corporate resources CISOs are charged with protecting, but data analytics can also be an effective tool for helping security teams identify and mitigate risks.
With artificial intelligence (AI), machine learning (ML), and data science constantly advancing in their capabilities, cybersecurity chiefs can pinpoint the signs of attacks like never before. And that can help their teams initiate mitigation more quickly.
“Security today is as much about smart data use as it is about traditional defenses,” says Timothy Bates, a professor of AI, cybersecurity, and other technologies at the University of Michigan College of Innovation and Technology, and former CISO at General Motors. “Data science and machine learning gave us the context and timing to act before incidents escalated.”
When Bates worked for General Motors, one of the auto manufacturer’s most impactful initiatives was architecting a global security operations center (SOC) to shift from a reactive to a proactive cybersecurity posture. The company used intrusion detection tools and a security information and event management (SIEM) platform to aggregate and analyze logs across a complex, distributed infrastructure.
“Through data analytics, we processed billions of log events daily, creating behavioral baselines that allowed us to detect anomalies in real-time,” Bates says. “One notable case involved identifying unusual login and command-line activity patterns within our manufacturing networks. That insight allowed us to stop a credential-stuffing attack before it reached critical systems, preventing what could have been a multimillion-dollar incident.”
AI, ML, and data science “are an enormous help with large data sets, which cybersecurity is packed full of,” says Nick Kathmann, CISO and CIO at governance, risk, and compliance provider LogicGate. “While core benefits are still under development, the immediate uses are already bearing fruit when combining those huge security datasets [with] risk management.”
Just having security data pouring in and deploying AI and analytical tools doesn’t guarantee success, however. Enterprises and their security teams need to adhere to best practices.
Here are some tips for getting the best results from leveraging data for cybersecurity.
Deploy machine learning for deep pattern recognition analysis
One good practice is to pair a SIEM platform with ML models to analyze patterns across billions of daily log entries, Bates says. “Build behavioral baselines across business units, then flag deviations in real-time,” he says. “Logs alone don’t tell you what’s wrong. Patterns do. Machine learning gave our SOC superpowers — turning noisy data into action-ready insight.”
That deeper analysis proved vital to thwarting the credential-stuffing attack at GM, Bates notes. “The activity mimicked internal [administration] behavior — but just off enough for our system to flag it,” he says.
At BairesDev, ML data analysis offers the opportunity to spot threats and unusual activity more quickly.
“It uses your network traffic, user behavior, and device activity to learn about you and define what’s normal,” says Pablo Riboldi, CISO at the nearshore software development company. “Then, it flags any suspicious activity in real-time. This early detection helps security teams get ahead of insider threats, compromised accounts, or attackers moving within the network before they can do real harm.”
ML tools can help identify phishing attempts, even sophisticated ones that might slip past regular filters, Riboldi says. “Over time, these systems get better,” he says. “This leads to fewer false alarms and more focus on actual threats. As not all security weaknesses are the same, machine learning can help prioritize those vulnerabilities that are a threat for the business.”
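The baseline-and-deviation idea described in this section can be illustrated with a very small sketch. It is not the system used at GM or BairesDev; it assumes a single numeric signal per business unit (here, hypothetical logins per hour) and uses a plain z-score in place of a trained ML model. The names `build_baselines` and `is_anomalous`, the sample units, and the threshold of 3.0 are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baselines(history):
    """Build a (mean, standard deviation) baseline per entity.

    history: iterable of (entity, value) pairs, for example
    (business_unit, logins_per_hour). Hypothetical input shape.
    """
    grouped = defaultdict(list)
    for entity, value in history:
        grouped[entity].append(value)
    # stdev needs at least two observations, so skip sparser entities.
    return {e: (mean(v), stdev(v)) for e, v in grouped.items() if len(v) >= 2}


def is_anomalous(baselines, entity, value, threshold=3.0):
    """Flag a value that deviates from its own entity's baseline."""
    if entity not in baselines:
        return True  # No baseline yet: surface it for analyst review.
    mu, sigma = baselines[entity]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Illustrative data: two business units with very different "normal" levels.
history = [("plant-a", v) for v in [40, 42, 38, 41, 39, 40]]
history += [("hq", v) for v in [200, 210, 190, 205, 195, 200]]
baselines = build_baselines(history)

print(is_anomalous(baselines, "hq", 205))       # within hq's usual range
print(is_anomalous(baselines, "plant-a", 205))  # far outside plant-a's range
```

Keeping a separate baseline per unit is the point of the example: the same login volume can be routine for one part of the business and a clear deviation for another.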
Emphasize the ‘learning’ part of ML
To be truly effective, models need to be retrained with new data to keep up with changing threat vectors and shifting cybercriminal behavior.
“Machine learning models get smarter with your help,” Riboldi says. “Make sure to have feedback loops. Letting analysts label events and adjust settings constantly improves their accuracy. Also, the data you give them is key. It needs to be good, secure, and come from different sources, like your computers, the cloud, login systems, etc.”
Building a well-integrated data lake or SIEM platform ensures that the ML models have context-rich data to work with, Riboldi says.
“Don’t just monitor known bads — train your models to recognize when something’s ‘not quite right,’ even if it’s never been flagged before,” Bates says. “The most dangerous attacks don’t trip the typical wires. It’s the subtle shifts — logins at odd hours, a dev script being run from an unexpected host — that often point to breach activity.”
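One simple way to picture the feedback loop Riboldi describes is to let analyst verdicts on past alerts drive the alerting threshold. The sketch below is an assumption-laden illustration, not a retraining pipeline: it assumes each past alert carries a model score and an analyst label, and the function name `tune_threshold` and the precision targets are hypothetical.

```python
def tune_threshold(labeled_alerts, min_precision=0.8):
    """Pick the lowest score threshold that meets a precision target.

    labeled_alerts: list of (score, is_true_threat) pairs, where the
    boolean is the analyst's verdict on a past alert (hypothetical shape).
    Returns None if no threshold reaches the target.
    """
    candidates = sorted({score for score, _ in labeled_alerts})
    for threshold in candidates:
        flagged = [label for score, label in labeled_alerts if score >= threshold]
        precision = sum(flagged) / len(flagged)
        if precision >= min_precision:
            return threshold
    return None


# Illustrative analyst feedback on seven past alerts.
labeled = [
    (0.20, False), (0.40, False), (0.50, True), (0.60, False),
    (0.70, True), (0.90, True), (0.95, True),
]

print(tune_threshold(labeled, min_precision=0.75))
print(tune_threshold(labeled, min_precision=1.0))
```

Choosing the lowest threshold that still meets the target keeps as many true threats in view as possible while cutting false alarms. In practice the same labels would also feed model retraining, which this sketch does not attempt.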
Fuse data science into your security team
At many enterprises, data science/analytics and cybersecurity teams are separate entities. But it’s a good idea to blend the SOC team with data scientists who understand the corporate infrastructure and can tune models based on overall context rather than just generic patterns, Bates says.
“Cybersecurity is no longer just about firewalls and antivirus,” Bates says. “It’s a data game now. Marrying cyber expertise with data modeling gave us the precision we needed at GM to act in real-time — not post-mortem.”
Organizations with data science teams that work alongside security teams “will be leaps and bounds ahead of organizations dependent on vendors to incorporate the tooling,” LogicGate’s Kathmann says.
“Especially in the interconnected and vendor-agnostic world we live in now, collaboration between accountable teams is key,” Kathmann says. Having a data science team understand the end goals of the organization, and then collaborate with a security team to facilitate the collection and storage of data in a data warehouse or data lake, is the best approach, he says.
Ensure top-quality data governance and integration
“To get the most cybersecurity value out of data and AI capabilities, organizations should focus on ensuring data quality and integrating across data sources,” says Anay Nawathe, director at global technology research and advisory firm ISG.
“Organizations should consistently cleanse, normalize, and validate data as appropriate, to increase accuracy of the findings and minimize model drift,” Nawathe says.
Data integration across diverse data sources enables cybersecurity teams to receive more context around any given trend or anomaly, which leads to richer insights into complex threats, Nawathe says.
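As a rough illustration of the cleanse, normalize, and validate steps Nawathe mentions, the sketch below maps records from two hypothetical sources onto one shared schema and reports missing fields. The field names, the `REQUIRED` schema, and both field maps are assumptions made up for the example, not any vendor's log format.

```python
from datetime import datetime, timezone

# Hypothetical shared schema that every source is mapped onto.
REQUIRED = ("timestamp", "user", "source_ip", "action")


def normalize(record, field_map):
    """Rename source-specific fields and standardize their values."""
    out = {target: record.get(source) for target, source in field_map.items()}
    if isinstance(out.get("user"), str):
        out["user"] = out["user"].strip().lower()
    ts = out.get("timestamp")
    if isinstance(ts, (int, float)):
        # Convert epoch seconds to an ISO 8601 string in UTC.
        out["timestamp"] = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    return out


def validate(record):
    """Return the required fields that are missing or empty."""
    return [field for field in REQUIRED if not record.get(field)]


# Two illustrative sources with different field names.
cloud_map = {"timestamp": "eventTime", "user": "userName",
             "source_ip": "ip", "action": "eventName"}
vpn_map = {"timestamp": "ts", "user": "login",
           "source_ip": "client", "action": "event"}

cloud_event = normalize(
    {"eventTime": 1700000000, "userName": " Alice ", "ip": "10.0.0.5",
     "eventName": "ConsoleLogin"},
    cloud_map,
)
vpn_event = normalize(
    {"ts": "2023-11-14T22:15:00+00:00", "client": "10.0.0.9",
     "event": "connect"},
    vpn_map,
)

print(cloud_event)
print(validate(cloud_event))  # nothing missing
print(validate(vpn_event))    # the user field is missing
```

Once records from different systems share field names and formats, they can be correlated on user or IP address, which is where the added context comes from.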
Along the same lines, organizations need to integrate threat detection across the business — not just the perimeter.
“Ensure your SOC integrates deeply into operational environments like operational technology networks and cloud systems,” Bates says. “Threat actors know the gaps; don’t let your factory floor or [development] pipeline be one of them.”
This is important because cyberattacks often hide in overlooked places, such as legacy systems, remote plants, or software development operations, Bates says. “Real-time visibility across these zones helped us shut down threats before they became disasters,” he says.
Supplement security with custom-trained LLMs
A large language model (LLM) that has been customized to meet the specific needs of an organization can help enhance cybersecurity.
“Some organizations with sophisticated cyber teams, unique security requirements, or complex environments are increasingly using customized solutions for their security analytics, though they will likely remain in a hybrid custom vs. commercial-off-the-shelf model,” Nawathe says.
Some of these custom use cases are “data/risk visualization” or risk quantification initiatives that are highly specific to the organization, Nawathe says.
By custom-training an LLM and using it to process and correlate raw sensor and log data, a much cleaner and more concise data feed can be sent to mainstream security tools, says Christopher Walcutt, CSO at security services provider DirectDefense.
“In addition, SOC staff can experiment in real-time, using the AI to teach them how to write better queries while providing the AI additional contextual learning,” Walcutt says. “The resulting metadata can be transformational [and] allow for more advanced automation of defensive actions.”
Custom-trained LLMs can power AI for a number of discrete functions, one of the best being the preprocessing of event and log data, Walcutt says. AI will be able to identify groupings of behaviors that a heuristic or rules-based machine learning or other solution will be unable to detect, he says, “and in doing so, make the fidelity of data feeding the other tools much higher.”
Make full use of documentation by mining it with AI
Analysis of unstructured data can also reap significant rewards for cybersecurity teams. For example, AI can have a big impact on mining company documentation, including the records used to manage and secure the organization’s systems. This includes policies, procedures, and other documents that guide the organization’s cybersecurity practices.
Documentation is also a vital component of the regulatory compliance function at enterprises, providing a framework for security controls.
“Reading, summarizing, and creating documentations [has] never been easier,” LogicGate’s Kathmann says. For example, security professionals can leverage AI models to read and summarize the key differences in risk frameworks and risk analysis reports, he says.
“Leaders can also create a model to search through all of an organization’s SOPs [standard operating procedures] and look for specific known or suspected bad practices, identify processes that do not follow standards, or read through vendor security documents and reports,” Kathmann says.
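To make the SOP search concrete, here is a deliberately simple sketch that scans procedure text for a few suspected bad practices. It swaps in hand-written regular expressions for the AI model Kathmann describes, so it only shows the shape of the workflow (documents in, flagged lines out). The pattern list, labels, and file names are all hypothetical.

```python
import re

# Hypothetical rules; a real deployment would rely on a model, not regexes.
BAD_PRACTICE_PATTERNS = {
    "shared credentials": re.compile(
        r"\bshared (password|account|credential)s?\b", re.I),
    "security control disabled": re.compile(
        r"\bdisable\w* (the )?(firewall|antivirus|mfa|logging)\b", re.I),
    "secrets sent by email": re.compile(
        r"\b(email|e-mail)\w* (the )?(password|api key|secret)s?\b", re.I),
}


def scan_documents(documents):
    """Return (document, line number, label) for each suspicious line."""
    findings = []
    for name, text in documents.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for label, pattern in BAD_PRACTICE_PATTERNS.items():
                if pattern.search(line):
                    findings.append((name, line_no, label))
    return findings


# Illustrative SOP excerpts.
sops = {
    "onboarding.txt": "Step 1: Create the user.\n"
                      "Step 2: Email the password to the new hire.",
    "maintenance.txt": "Disable the firewall before patching.\n"
                       "Re-enable it afterwards.",
}

for finding in scan_documents(sops):
    print(finding)
```

A language model would catch paraphrases that fixed patterns miss, but the output format, a list of specific lines for a reviewer to check, would look much the same.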