Is Claude Mythos All It’s Hyped Up to Be? The UK Gov’t Put It to the Test

The UK government tested Claude Mythos to see whether its much-hyped cyber skills are a real breakthrough or just a modest step forward


Published: April 15, 2026

Kristian McCann

Anthropic’s latest model release, Mythos Preview, has been positioned as a step change in AI capability for cybersecurity-related tasks. Unlike earlier frontier models, it has been described as particularly strong at handling structured, multi-stage technical problems that resemble real-world intrusion scenarios.

That framing alone was enough to bring tech leaders such as Microsoft, Apple, and AWS on board the Glasswing project, where Mythos is being tested, with early closed access to the model drawing rave reviews from the companies involved.

Yet such fervor surrounding the project has also drawn interest from security researchers and policymakers. The UK government’s AI Security Institute (AISI) has announced it has run an independent evaluation of the model, aiming to understand whether Mythos genuinely represents a new class of cyber capability, or whether it simply sits in line with recent frontier systems from Anthropic and other leading AI developers.

Early findings suggest a more nuanced picture than the hype might imply.

What Was Actually Tested

The evaluation from the UK’s AI Security Institute builds on a long-running benchmarking program that uses Capture the Flag (CTF) challenges to measure AI systems against cybersecurity-style problems. These tests range from relatively simple exploitation exercises through to more complex, multi-step scenarios designed to mimic real intrusion workflows.

According to AISI, Mythos Preview reaches a new high point on its “Apprentice” level tasks, completing more than 85% of challenges in that category. However, this performance is broadly comparable to other frontier models, including GPT-5.4, Opus 4.6, and Codex 5.3, which all sit within a similar range of accuracy across multiple difficulty tiers.

While Mythos does not dramatically outperform its peers on isolated tasks, its behavior across extended sequences of actions raises more interesting questions about how far AI systems are progressing in practical offensive cyber capability.

The more significant part of the evaluation focuses on a test environment called “The Last Ones” (TLO). This simulation was designed to model a 32-step data extraction attack across a fictional corporate network. It requires chaining together multiple actions across different systems, mirroring the kind of sustained effort that would typically take a skilled human operator many hours to complete.

In this environment, Mythos showed clearer differentiation. It was the first model to fully complete the TLO challenge end to end, although only in a minority of attempts. On average, it completed 22 out of 32 steps per run, compared with a lower baseline from earlier models such as Claude 4.6, which averaged around 16 steps. AISI also noted, however, that Mythos still struggles with more advanced scenarios such as the “Cooling Tower” test, which simulates disruption of industrial control systems.

Breakthrough or Incremental Progress?

On paper, Mythos does not represent a dramatic leap in raw cyber task performance when compared to other leading models. On isolated tasks, it is broadly aligned with systems like GPT-5.4 and Anthropic’s own recent releases.

Where Mythos becomes more interesting is not in what it can do, but how it does it. AISI highlights its ability to chain multiple partial successes into longer sequences of action. In practice, this means it can recover from failures, adjust strategy, and continue progressing through a complex, multi-stage attack path rather than stalling after an initial obstacle.

That capability is what led AISI to conclude that Mythos may already be capable of autonomously targeting small, weakly defended enterprise environments once initial access is achieved. However, the institute is careful to stress that its test environments are simplified compared to real-world systems, lacking active defenders, detection tools, and unpredictable security responses.

This raises the central question: even if Mythos is not dramatically more powerful on isolated tasks, does its improved “end-to-end reasoning” change the threat landscape in a meaningful way? And more importantly, will systems like this eventually cross the threshold that transforms AI from a passive tool into an autonomous cyber operator?

For now, the answer is unclear. But the direction of travel is becoming harder to ignore.

What Next for AI and Cyber Defense?

The AISI report ultimately stops short of declaring Mythos a breakthrough in offensive cyber capability, but it does signal an important shift in how these systems should be evaluated.

The focus is no longer just on whether an AI can solve individual security problems, but whether it can sustain coherent action across many steps in a realistic attack chain.

That distinction matters. Even if current models remain inconsistent or unreliable, improvements in multi-step reasoning could compound quickly as compute scales and training methods evolve. AISI itself suggests that stronger performance might emerge simply through increased inference compute and further optimization.

At the same time, the report also acknowledges a major limitation: these results are drawn from controlled simulations, not real enterprise environments. Real-world systems include monitoring, mitigation, and human intervention layers that are difficult to replicate in benchmark tests. That makes it hard to draw firm conclusions about actual exploitability at scale.

Still, the trajectory is clear enough to warrant attention. As models like Mythos continue to improve, the defensive side of cybersecurity may need to evolve in parallel. AISI suggests that organizations should begin exploring the use of AI systems not just for attack simulation, but for strengthening defensive architectures and identifying vulnerabilities before they can be exploited.
