When AI Becomes the Attacker: Why the OT Security Arms Race Ends at the Hardware Layer

In 2025, the baseline assumption of industrial cybersecurity broke. 

For twenty years, defenders had one reliable edge over attackers: time. A skilled adversary targeting a substation or a water treatment plant needed weeks of manual reconnaissance, specialist knowledge of industrial protocols, and a team coordinated enough to sustain a multi-stage campaign. That preparation time was never a security architecture, but it was real, and detection strategies were built around it. In September 2025, a Chinese state-sponsored group ran a full espionage campaign across roughly 30 organizations with an AI agent handling 80 to 90% of operations, issuing thousands of requests, often several per second. Human operators checked in at four to six decision points and let the machine run the rest. In April 2026, Anthropic’s Mythos model found a 27-year-old vulnerability in OpenBSD, a system built around security, along with thousands of other zero-days across every major OS and browser, working without human input after a single initial prompt. For OT operators running PLCs on firmware from 2012, the math has changed. Your detection window may be shorter than the time it takes to open a ticket.

From Human Attackers to Autonomous Agents: A Three-Year Escalation

AI as assistant: the vibe hacking phase

For the past twenty years, the operational security model for critical infrastructure rested on a reasonable bet: attackers are human, and human operations take time. Planning an intrusion into a power generation facility required weeks of reconnaissance, domain knowledge of industrial protocols, and a team capable of sustaining a multi-stage campaign. That friction bought defenders time to detect, investigate, and respond.

The first signs that this was changing were visible, but gradual. In 2023, AI-assisted phishing began generating contextually accurate spear-phishing at scale, at effectively zero marginal cost per target. By early 2025, Anthropic’s Threat Intelligence team was documenting what it called “vibe hacking”: campaigns where AI assisted human operators in real time, accelerating each step while humans remained in the loop directing operations. By mid-2025, the autonomous system XBOW had taken the top ranking on HackerOne’s leaderboard, above every human researcher on the platform. Each of these was a data point on a curve. September 2025 was a different category of event.

GTG-1002: the first fully autonomous campaign

A Chinese state-sponsored group designated GTG-1002 ran what Anthropic confirmed as the first large-scale cyberattack executed with AI handling the majority of operations without human direction. The threat actor jailbroke Claude Code, built an autonomous attack framework around it, and ran reconnaissance, vulnerability analysis, exploit generation, credential harvesting, lateral movement, and data categorization across approximately 30 organizations. Human operators intervened at four to six decision points per campaign. At peak, the agent issued thousands of requests, often several per second. No human team operates at that tempo.

The attack did not rely on novel malware or rare expertise. The framework used commodity open-source penetration testing tools, orchestrated autonomously through Claude Code via the Model Context Protocol. GTG-1002 demonstrated that AI-speed campaigns are accessible: the barrier is not technical sophistication, it is access to a capable model and a functional jailbreak.

Mythos Preview: autonomous zero-day discovery at scale

In April 2026, Anthropic released the findings from Claude Mythos Preview, a model it has chosen not to make publicly available precisely because of what it can do. Over several weeks of internal testing, Mythos found thousands of zero-day vulnerabilities across every major operating system and browser, including a 27-year-old integer overflow in OpenBSD and a 16-year-old flaw in FFmpeg that had survived five million automated fuzzing runs. In Firefox’s JavaScript engine, Mythos converted 72.4% of identified vulnerabilities into working exploits, against 14.4% for its predecessor. In several cases it chained four vulnerabilities into a single browser exploit with no human guidance after the initial prompt. Over 99% of what it found remains unpatched.

Anthropic’s own conclusion: “The barriers to performing sophisticated cyberattacks have dropped substantially, and we predict that they’ll continue to do so.”

Why OT Environments Are Disproportionately Exposed

An asset base that cannot be patched

GTG-1002 hit technology companies, financial institutions, and government agencies. These organizations run active patching cycles and maintain SOC teams calibrated to respond within hours. OT environments work under different constraints, and those constraints sit directly in the path of what AI-speed attacks exploit most effectively.

PLCs and RTUs in power generation, water treatment, and process manufacturing routinely run firmware that has not been updated in five to fifteen years. Not because operators are careless: the vendor no longer supports the device, updating requires a controlled process shutdown, or the patch has not been validated against the production configuration. CISA’s ICS advisories regularly document vulnerabilities in devices that operators cannot patch for exactly these reasons. Engineering workstations on end-of-support Windows versions are common. SCADA servers and historians frequently sit on insufficiently segmented networks.

Protocols with no authentication layer

Modbus, DNP3, Profibus, and IEC 61850 were designed for determinism and reliability. Authentication was not part of the brief. A device speaking Modbus responds to any poll from any host on the same segment: no credential required, no log entry generated. An agent that has reached an engineering workstation can enumerate every reachable device, read register maps, and document process variable addresses without triggering a single authentication event.

GTG-1002’s reconnaissance loop ran against IT networks in hours. Against OT assets with no mechanism to log or reject a query, the same loop runs faster and produces cleaner output. The absence of authentication does not just simplify the attack; it removes the telemetry that detection systems depend on.
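To make the missing authentication layer concrete, here is a minimal sketch of how a Modbus TCP "Read Holding Registers" request is laid out, following the field order in the public Modbus application protocol specification. It only packs and unpacks bytes and sends nothing over a network. The helper names (`build_read_request`, `describe`) and the example values are illustrative assumptions, not part of any library.

```python
import struct

# Layout of a Modbus TCP "Read Holding Registers" (function 0x03) request.
# MBAP header: transaction id (2 bytes), protocol id (2), length (2), unit id (1)
# PDU:         function code (1), starting address (2), quantity (2)
REQUEST_FORMAT = ">HHHBBHH"  # big-endian, 12 bytes in total
FIELD_NAMES = (
    "transaction_id", "protocol_id", "length", "unit_id",
    "function_code", "start_address", "quantity",
)

def build_read_request(transaction_id, unit_id, start_address, quantity):
    """Pack a read request (hypothetical helper, for illustration only)."""
    length = 6  # bytes after the length field: unit id + 5-byte PDU
    return struct.pack(
        REQUEST_FORMAT,
        transaction_id, 0, length, unit_id,  # protocol id is 0 for Modbus
        0x03, start_address, quantity,
    )

def describe(frame):
    """Unpack a request back into named fields."""
    return dict(zip(FIELD_NAMES, struct.unpack(REQUEST_FORMAT, frame)))

frame = build_read_request(transaction_id=1, unit_id=1, start_address=0, quantity=10)
fields = describe(frame)
```

Every byte of the request is accounted for by the seven fields above. None of them carries a credential, a session token, or a signature, so the device has nothing to verify and nothing to log as an authentication event.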

A patching horizon measured in decades, not months

Mythos found a flaw in FFmpeg that had been in production for sixteen years and survived five million fuzzing attempts. A PLC firmware stack last reviewed in 2012 carries risks of the same order, and in many cases the manufacturer no longer exists to issue a fix. CrowdStrike’s 2026 Global Threat Report put average time-to-exploit at five days, down from thirty in 2022. An unpatched OT asset is not a background risk at that speed. It is an open position.

The historical record confirms the consequence. The 2015 and 2016 Ukrainian grid attacks required weeks of reconnaissance before the attackers opened circuit breakers across multiple substations. TRITON in 2017 needed specialists who understood Safety Instrumented System architecture. FrostyGoop in 2024 exploited an exposed Modbus interface to cut heating to 600 apartment buildings in Lviv, but still relied on human operators who knew which registers to target. Every one of those campaigns had a preparation phase measured in weeks. AI compresses that phase to hours, eliminating the detection windows that post-incident analysis identified as missed opportunities in each case.

The Structural Failure of Software-Based Defense in OT

Why the escalation cycle favors the offense

The standard response to AI-enabled attacks is to deploy AI-enabled defenses: behavioral anomaly detection, ML-based IDS, AI-assisted SOC triage. These tools have value. The problem is that they keep the OT environment inside an escalation cycle where the attacker moves first and the defender responds, and AI has compressed the gap between those two events below the threshold where human-in-the-loop response can reliably act.

GTG-1002 demonstrated the mechanism. The attack framework decomposed every malicious operation into thousands of individually innocuous-looking tasks, each framed as legitimate security testing activity. Claude executed them without seeing the aggregate intent. Detection triggered eventually, but after successful intrusions had already been completed. The same decomposition technique that bypassed Claude’s own safety guardrails works against behavioral classifiers for the same reason: the classifier sees individual steps, not the campaign. Evasion adapts faster than detection rules update.
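The decomposition problem can be shown with a toy example. The sketch below assumes a hypothetical classifier that assigns each event a suspicion score between 0 and 1; the scores, thresholds, and host names are invented for illustration and do not come from any real product or from the GTG-1002 report.

```python
from collections import defaultdict

# Invented thresholds for a hypothetical scoring classifier.
EVENT_THRESHOLD = 0.8    # alert when a single event looks this suspicious
SOURCE_THRESHOLD = 3.0   # alert when one source's scores add up to this

def per_event_alerts(events):
    """Judge each event in isolation, as a step-level classifier does."""
    return [e for e in events if e["score"] >= EVENT_THRESHOLD]

def per_source_alerts(events):
    """Judge the accumulated activity of each source instead."""
    totals = defaultdict(float)
    for e in events:
        totals[e["source"]] += e["score"]
    return {src: total for src, total in totals.items() if total >= SOURCE_THRESHOLD}

# Forty individually innocuous tasks from one host, plus one unrelated event.
events = [{"source": "ws-17", "score": 0.2} for _ in range(40)]
events.append({"source": "ws-02", "score": 0.3})

step_view = per_event_alerts(events)       # nothing crosses the per-event bar
campaign_view = per_source_alerts(events)  # only the aggregate exposes ws-17
```

No single event reaches the per-event threshold, so the step-level view stays silent while the aggregate view flags one host. Even then, the aggregate only crosses its threshold after many of the tasks have already run, which is the timing problem the rest of this section describes.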

Anthropic’s Red Team stated the implication plainly: “Language models like Mythos Preview might require reexamining defense-in-depth measures that make exploitation tedious rather than impossible, since language models can grind through these tedious steps quickly.” Any defense whose effectiveness depends on the attacker running low on patience or budget is now miscalibrated.

What AI-speed intrusion looks like on an OT network

In OT, this failure mode has a concrete form. An IDS flagging anomalous Modbus traffic detects lateral movement after it has started. It does not stop an agent from completing a full network topology map in the seconds before a human analyst opens the case. A SIEM correlating authentication events across IT and OT can identify credential reuse after the fact, not before the agent has staged those credentials for exfiltration.

Detection-based defenses shorten dwell time. Against an agent that achieves its reconnaissance objectives in minutes, shorter dwell time still means the damage is done before the first alert escalates to a human who can act on it. The detection window that OT security architectures were calibrated around has not narrowed. It has closed.
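The timing argument reduces to simple arithmetic. The sketch below uses placeholder figures (one hour to raise an alert, four hours to triage, three weeks versus ten minutes of reconnaissance) chosen only to illustrate the comparison; they are assumptions, not measurements from any incident.

```python
def minutes_before_response(recon_minutes, detect_minutes, triage_minutes):
    """Return how long reconnaissance has been finished when a human can first act.

    A positive result means the attacker completed reconnaissance before the
    defender was ready; a negative result means the defender still had time.
    """
    response_ready_at = detect_minutes + triage_minutes
    return response_ready_at - recon_minutes

# Placeholder figures only: one hour to alert, four hours to triage.
DETECT, TRIAGE = 60, 240

# Human-paced campaign: three weeks of reconnaissance.
human_campaign = minutes_before_response(3 * 7 * 24 * 60, DETECT, TRIAGE)

# Agent-paced campaign: ten minutes of reconnaissance.
agent_campaign = minutes_before_response(10, DETECT, TRIAGE)
```

With these assumed inputs the human-paced campaign yields a large negative number, meaning the defender is ready weeks before reconnaissance ends. The agent-paced campaign yields a positive one, meaning reconnaissance was complete hours before anyone could respond.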

Hardware Segmentation: Exiting the Arms Race

What a data diode does at the network boundary

The alternative to fighting AI-speed attacks with faster detection is to remove the target from the equation at the network boundary.

A hardware data diode enforces one-way communication at the physical layer. An agent operating from a compromised IT asset can send packets toward the OT segment, but with the diode oriented outbound those packets stop at the boundary and nothing comes back. No device inventory. No register maps. No protocol responses to analyze. The reconnaissance loop that GTG-1002 ran at machine speed returns zero useful output at a hardware boundary. Topology mapping requires responses. With no responses, there is no map, no lateral movement path, and no pre-positioned capability for physical disruption. This is not a detection rule or a behavioral classifier. It is a hardware constraint with no software equivalent and no patch cycle.
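The effect on reconnaissance can be modeled in a few lines. This is a toy simulation, not a model of any real diode hardware: the class names (`Device`, `BidirectionalLink`, `OutboundOnlyDiode`) and the probe strings are assumptions made up for the example.

```python
class Device:
    """Toy OT device that answers any probe it receives."""
    def __init__(self, name):
        self.name = name

    def handle(self, probe):
        return f"{self.name}: reply to {probe}"

class BidirectionalLink:
    """Conventional routed path: probes arrive and replies travel back."""
    def __init__(self, device):
        self.device = device

    def send(self, probe):
        return self.device.handle(probe)

class OutboundOnlyDiode:
    """Models a boundary whose only physical path runs from OT to IT."""
    def __init__(self, device):
        self.device = device

    def send(self, probe):
        # There is no IT-to-OT path, so the probe never reaches the device
        # and there is no reply for the sender to collect.
        return None

def map_topology(link, probes):
    """Collect whatever replies a prober on the IT side gets back."""
    replies = (link.send(p) for p in probes)
    return [r for r in replies if r is not None]

plc = Device("plc-01")
probes = ["identify", "read-registers", "list-neighbors"]

routed_map = map_topology(BidirectionalLink(plc), probes)
diode_map = map_topology(OutboundOnlyDiode(plc), probes)
```

Over the routed link every probe produces a reply the prober can analyze. Behind the modeled diode the same loop ends with an empty list, however many probes are sent or how fast.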

Preserving OT visibility without a return channel

A one-way boundary does not mean dark networks. Historian replication of process values, batch records, and quality data flows outbound to enterprise analytics. Security events from PLCs, HMIs, and OT network infrastructure forward to the SOC SIEM. SCADA alarm exports reach the enterprise. The data that makes OT operations visible to the business and to security teams crosses the boundary in the direction it needs to go.
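Because the sending side can never receive an acknowledgment, one-way replication has to let the receiver detect loss and corruption on its own. The sketch below shows one common way to do that, with sequence numbers and a CRC on each frame. It is an illustration of the idea under assumed names (`encode_frame`, `decode_frame`, `Receiver`, the tag `FT-101`) and does not reproduce the mechanism of any particular gateway product.

```python
import json
import zlib

def encode_frame(seq, payload):
    """OT side: prefix a sequence-numbered JSON body with its CRC32."""
    body = json.dumps({"seq": seq, "payload": payload}, sort_keys=True).encode()
    return zlib.crc32(body).to_bytes(4, "big") + body

def decode_frame(frame):
    """IT side: return the message, or None if the CRC does not match."""
    crc, body = int.from_bytes(frame[:4], "big"), frame[4:]
    if zlib.crc32(body) != crc:
        return None
    return json.loads(body)

class Receiver:
    """Tracks gaps locally, since it cannot ask the sender to retransmit."""
    def __init__(self):
        self.expected = 0
        self.missing = []
        self.values = []

    def accept(self, frame):
        msg = decode_frame(frame)
        if msg is None:
            return  # corrupted; the gap is recorded when the next good frame arrives
        if msg["seq"] < self.expected:
            return  # duplicate or late frame, ignored in this sketch
        self.missing.extend(range(self.expected, msg["seq"]))
        self.expected = msg["seq"] + 1
        self.values.append(msg["payload"])

frames = [encode_frame(i, {"tag": "FT-101", "value": i}) for i in range(4)]

receiver = Receiver()
for i in (0, 1, 3):  # frame 2 is lost in transit
    receiver.accept(frames[i])
```

After the loop, the receiver holds three values and has recorded sequence number 2 as missing, without ever sending a byte back toward the OT side.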

What does not cross is any inbound path. An attacker who has fully compromised the enterprise network still cannot reach the DCS, the safety instrumented system, or the historian. Some environments also need limited IT-to-OT flows such as firmware updates, configuration pushes, and recipe synchronization in discrete manufacturing. There, paired unidirectional gateways in a 1.5-way architecture carry those flows as asynchronous, file-based, hardware-enforced transfers. Each transfer is discrete and logged. There is no persistent session an agent can maintain or exploit across the boundary.
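A discrete, logged file transfer can be sketched as a manifest check. The example below assumes the sending side publishes a manifest with the file's size and SHA-256 digest, and the receiving side verifies and logs each file before accepting it. The function names, file name, and log format are hypothetical and do not describe any vendor's implementation.

```python
import hashlib
import time

def make_manifest(name, data):
    """Sending side: describe the file so the far side can verify it."""
    return {
        "name": name,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

def verify_and_log(manifest, data, log):
    """Receiving side: accept the file only if it matches its manifest.

    Every attempt is appended to the log, whether accepted or rejected.
    """
    ok = (
        len(data) == manifest["size"]
        and hashlib.sha256(data).hexdigest() == manifest["sha256"]
    )
    log.append({"time": time.time(), "name": manifest["name"], "accepted": ok})
    return ok

transfer_log = []
recipe = b"batch=42;temp=180;hold=300"
manifest = make_manifest("recipe-042.cfg", recipe)

accepted = verify_and_log(manifest, recipe, transfer_log)
rejected = verify_and_log(manifest, recipe + b";extra=1", transfer_log)
```

Each call handles one file and leaves one log entry. Nothing in the exchange resembles an interactive session: a file either matches its manifest and is accepted, or it is rejected and the rejection is recorded.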

Standards alignment: IEC 62443, NIST SP 800-82, ANSSI

The standards frameworks converge on this architecture. IEC 62443-3-3 recognizes unidirectional security gateways as a compliant control for enforcing zone separation at the highest security levels. NIST SP 800-82 and ANSSI’s industrial security guidance both reference hardware data diodes for environments where bidirectional IT-to-OT access is operationally unjustifiable. Compliance requirements and threat response point to the same architecture, which simplifies the business case for operators who need to satisfy both.

Conclusion

GTG-1002 confirmed that AI-orchestrated attacks at machine speed are operational, not theoretical. Mythos Preview confirmed that autonomous zero-day discovery now works faster than human patch cycles, against vulnerabilities that have been sitting in production systems for decades. Anthropic’s own assessment is that the capability floor for sophisticated attacks will keep dropping, and that the tools to replicate what GTG-1002 did are already proliferating to less resourced actors.

Detection and response capabilities matter. For OT environments running legacy assets on unauthenticated protocols, they are not a complete answer. Hardware segmentation at the OT boundary removes the network from an attack that depends on bidirectional access to map, move, and pre-position. The architecture that does not need to detect an AI-orchestrated intrusion at machine speed, because the intrusion has no viable path, is the one built around a physical constraint rather than a faster classifier.

Learn more about Cyberium unidirectional gateways and cross-domain architectures designed for OT segmentation and secure data transfer.
