<a href="https://openai.com/" target="_blank" rel="noopener">OpenAI</a>’s GPT-5.5 Rivals <a href="https://www.anthropic.com/" target="_blank" rel="noopener">Anthropic</a>’s <a href="https://claude.ai/" target="_blank" rel="noopener">Claude</a> in Offensive Cyber Operations, UK Study Finds

Assessment Overview

OpenAI’s latest language model, GPT-5.5, has demonstrated offensive cyber capabilities comparable to Anthropic’s Claude Mythos Preview, according to a recent evaluation by the AI Security Institute (AISI). The UK-based research organization, which operates under the Department for Science, Innovation and Technology, released its findings on Thursday, highlighting the accelerating pace of AI-driven cybersecurity threats.

Benchmark Performance and Technical Milestones

During the evaluation, GPT-5.5 successfully executed a highly complex, 32-step corporate network intrusion simulation known as “The Last Ones” in two out of ten trials. The exercise, developed in partnership with cybersecurity firm SpecterOps, demands that an AI agent perform reconnaissance, steal credentials, navigate multiple Active Directory environments, exploit a supply-chain pipeline, and extract a secured internal database. AISI estimates that a skilled human professional would require roughly 20 hours to complete the same sequence.

In a separate technical challenge, the model reverse-engineered a custom virtual machine’s instruction set, wrote a disassembler from scratch, and recovered a cryptographic password using constraint solving. GPT-5.5 finished this task in 10 minutes and 22 seconds at a cost of $1.73 in API fees. By contrast, a human specialist using industry-standard tools needed approximately 12 hours to achieve the same result.
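To put those timings in proportion, the short calculation below works out the rough speedup implied by the quoted figures. It is only an illustrative sketch: the variable names are made up for this example, and the human time is the article's approximate 12-hour figure, so the ratio should be read as an order-of-magnitude estimate rather than a measured result.

```python
# Figures quoted in the article; the human time is approximate.
HUMAN_HOURS = 12
MODEL_MINUTES, MODEL_SECONDS = 10, 22
API_COST_USD = 1.73

human_seconds = HUMAN_HOURS * 3600
model_seconds = MODEL_MINUTES * 60 + MODEL_SECONDS

# Ratio of human time to model time for the same task.
speedup = human_seconds / model_seconds

# API spend spread evenly over the model's run time.
cost_per_minute = API_COST_USD / (model_seconds / 60)

print(f"Model wall-clock time: {model_seconds} s")
print(f"Approximate speedup over the human specialist: {speedup:.0f}x")
print(f"API cost per minute of model run time: ${cost_per_minute:.2f}")
```

On these numbers the model was somewhere around 70 times faster than the human specialist on this single task.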

On the institute’s most rigorous “Expert” tier of advanced cybersecurity tasks, GPT-5.5 secured an average pass rate of 71.4 percent. This performance slightly surpassed Claude Mythos Preview, which scored 68.6 percent, and marked a substantial leap over GPT-5.4’s 52.4 percent success rate.
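The gaps between those pass rates can be checked with a few lines of arithmetic. The sketch below uses only the three percentages quoted above; the dictionary layout and variable names are illustrative assumptions, not anything taken from the AISI report.

```python
# Average "Expert" tier pass rates (percent) as quoted in the article.
pass_rates = {
    "GPT-5.5": 71.4,
    "Claude Mythos Preview": 68.6,
    "GPT-5.4": 52.4,
}

baseline = pass_rates["GPT-5.5"]

# Percentage-point lead of GPT-5.5 over each of the other models.
gaps = {
    name: round(baseline - rate, 1)
    for name, rate in pass_rates.items()
    if name != "GPT-5.5"
}

# Relative improvement of GPT-5.5 over its predecessor, in percent.
relative_gain = (baseline - pass_rates["GPT-5.4"]) / pass_rates["GPT-5.4"] * 100

for name, gap in gaps.items():
    print(f"GPT-5.5 leads {name} by {gap} percentage points")
print(f"Relative gain over GPT-5.4: {relative_gain:.1f}%")
```

The lead over Claude Mythos Preview is 2.8 percentage points, while the jump from GPT-5.4 is 19 points, a relative gain of roughly 36 percent.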

Safety Protocols and Vulnerability Findings

The report also raised serious concerns about the model’s safeguards. Researchers uncovered a universal jailbreak technique that bypassed GPT-5.5’s defensive measures across every malicious cyber query tested, including in multi-turn agentic scenarios. Developing the exploit took six hours of dedicated expert red-teaming. While OpenAI later deployed an updated safety stack, a configuration problem prevented AISI from confirming whether the new measures neutralized the vulnerability.

The institute emphasized that these evaluations were conducted within a restricted research framework and should not be interpreted as indicative of what standard users can access, noting that commercial releases incorporate additional access controls and protective layers.

Broader Implications and National Cybersecurity Context

AISI cautioned that the rapid enhancement of offensive cyber skills may reflect a broader pattern in artificial intelligence development rather than a singular anomaly. The organization warned that if advanced reasoning, coding proficiency, and autonomous task execution continue to improve, breakthroughs in cyber capabilities could emerge in rapid succession.

These findings coincide with concerning domestic cybersecurity trends. The UK government’s annual Cyber Security Breaches Survey, also released on Thursday, revealed that 43 percent of businesses experienced a cyber breach or attack over the previous year. In response, officials unveiled £90 million in new funding aimed at strengthening national cyber resilience and advanced the Cyber Security and Resilience Bill to safeguard critical infrastructure. Government advisors also issued new guidelines urging organizations to brace for a potential spike in newly discovered software vulnerabilities, as AI tools accelerate both the discovery and exploitation of security flaws.

Written by

Hue

The girl with pink hair, usually arguing about GPU benchmarks or checking her crypto portfolio between gaming sessions. She writes about PC tech, games, and crypto.
