Anthropic's safety warnings may have just backfired — the government has pulled the plug on its most powerful AI

Key Takeaways

Anthropic has disabled its most advanced AI models, Fable 5 and Mythos 5, globally following a U.S. government export control directive.
The government order cited national security concerns over a "narrow potential jailbreak" that could allow the models to identify software flaws.
Anthropic launched Fable 5 and Mythos 5 just three days before the directive, with Fable 5 being publicly available and Mythos 5 restricted to vetted cybersecurity defenders.
The company disagrees with the government's assessment, arguing that the jailbreak is minor and similar capabilities exist in other publicly available models like OpenAI's GPT-5.5.

In an unprecedented move that has sent ripples through the artificial intelligence community, Anthropic, a leading AI safety and research company, has "abruptly disabled" its two most powerful AI models, Claude Fable 5 and Claude Mythos 5, for all users worldwide. This dramatic action came in response to an export control directive issued by the U.S. government, citing national security concerns over a reported "narrow potential jailbreak" in the models.

The incident highlights the escalating tension between rapid AI development and government oversight, particularly as AI capabilities advance into areas with significant national security implications. Anthropic, known for its emphasis on AI safety and its "Constitutional AI" approach, finds itself in a challenging position, having advocated for government regulation of dangerous AI while now disputing the application of such controls to its own technology.

The Launch and the Immediate Halt: Fable 5 and Mythos 5

Just three days prior to the government's directive, on June 9, 2026, Anthropic had proudly unveiled Claude Fable 5 as its most capable model made generally available to the public. Alongside Fable 5, the company also released Claude Mythos 5, an even more powerful version with cyber safeguards lifted, which was intended for a vetted group of cybersecurity defenders and critical infrastructure operators.

Fable 5 was designed with robust safety classifiers to route flagged requests, particularly in sensitive areas like cybersecurity, biology, and chemistry, to the weaker Claude Opus 4.8. Anthropic had stated that Mythos-class models possessed capabilities in finding and exploiting software vulnerabilities that necessitated careful deployment. The company had invested thousands of hours in red-teaming Fable 5 with internal teams and external organizations, including the UK AI Safety Institute, to identify and mitigate potential weaknesses.

However, the celebration of these new advancements was short-lived. On June 12, 2026, at 5:21 p.m. ET, Anthropic received an export control directive from the U.S. government. The order, issued by the Commerce Department, mandated the suspension of access to Fable 5 and Mythos 5 for any foreign national, whether inside or outside the United States, including Anthropic's own foreign national employees. Due to the inability to reliably distinguish foreign nationals from its global user base in real time, Anthropic made the decision to disable both models for all customers worldwide to ensure compliance.

The "Narrow Potential Jailbreak" at the Core of the Dispute

The government's directive, while not initially providing specific details in writing, was understood by Anthropic to stem from a concern over a "potential jailbreak" in Fable 5. A jailbreak, in this context, refers to a method of bypassing the AI model's safety guardrails to elicit potentially harmful outputs. Anthropic stated that the government had provided verbal evidence of a "narrow, non-universal jailbreak," which reportedly involved asking the model to read a specific codebase and identify software flaws.

Anthropic, however, strongly contested the severity and uniqueness of this finding. In a public statement, the company argued that the demonstrated technique identified only a small number of previously known, minor vulnerabilities. More critically, Anthropic claimed that other publicly available models, including OpenAI's GPT-5.5, possess similar capabilities without requiring a bypass and are routinely used by cybersecurity defenders. The company expressed its belief that the government's action was based on a "misunderstanding" and that recalling a commercial model deployed to hundreds of millions of people over such a narrow jailbreak was disproportionate.

"If this standard was applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers," Anthropic wrote in its statement, underscoring the potential chilling effect on AI innovation and deployment.

Anthropic's Safety Philosophy and the Backlash

Anthropic was founded by former OpenAI researchers who were concerned about the safety and ethical implications of advanced AI. The company has been a vocal proponent of "Constitutional AI," a method designed to train AI systems to be helpful, harmless, and honest by adhering to a set of written principles. This approach aims to provide a transparent framework for AI development and reduce the reliance on extensive human feedback for identifying harmful outputs.

Ironically, Anthropic's consistent warnings about the potential for catastrophic risks from powerful AI models, and its advocacy for government intervention, may have contributed to this situation. Anthropic CEO Dario Amodei has publicly stated that the government should have the legal authority to block dangerous AI deployments. However, the company also emphasized that such a process should be "transparent, fair, clear, and grounded in technical facts," principles it feels were not fully met in this instance.

This incident also comes against a backdrop of existing friction between Anthropic and the U.S. government. Earlier this year, the Pentagon had labeled Anthropic a "national security supply chain threat" after the company reportedly drew red lines over the military use of its technology, particularly regarding domestic surveillance and fully autonomous weapons systems. Anthropic had even filed lawsuits challenging this designation.

Broader Implications for AI Governance and the Industry

The government's directive marks a significant escalation in U.S. efforts to regulate and potentially restrict access to advanced AI models. Historically, export controls have focused more on the chips and tools powering AI, rather than the AI models themselves. This move sets a new precedent, directly targeting publicly deployed commercial AI products.

This event could have several far-reaching implications:

Increased Scrutiny and Regulation: Other AI developers may face heightened scrutiny before deploying new frontier models. The voluntary framework for AI developers to grant government access to models before release, as outlined in a previous executive order, may now become less voluntary.
Impact on Innovation and Deployment Speed: The industry's ability to rapidly innovate and deploy new AI capabilities could be hampered if companies fear similar abrupt recalls based on what they consider minor vulnerabilities. Anthropic's warning that such a standard would "halt all new model deployments" reflects this concern.
Trust and Transparency: The lack of specific, publicly detailed technical evidence for the "jailbreak" from the government's side, as claimed by Anthropic, could erode trust between AI developers and regulators.
National Security vs. Open AI: The incident underscores the growing geopolitical competition in AI, where national security concerns could increasingly dictate the availability and accessibility of advanced AI models, potentially pushing the debate towards open-source alternatives or sovereign AI efforts in other countries.
Redefining "Safety": The disagreement between Anthropic and the government over what constitutes a significant safety risk from a "narrow potential jailbreak" will likely fuel further discussions on how to define and measure AI safety effectively.

Anthropic, which recently raised $65 billion in Series H funding, valuing the company at $965 billion, is a major player in the AI landscape. This incident comes at a critical time as the company is reportedly planning an IPO. The disruption to its latest models could impact its market positioning and investor confidence, especially if access is not restored quickly.

What Comes Next?

Anthropic stated it is complying with the directive while working to restore access to Fable 5 and Mythos 5 as soon as possible. The company plans to share more details and engage in dialogue with the government to address the perceived "misunderstanding."

This situation sets a precedent for how governments might interact with powerful AI developers in the future. It forces a critical examination of the balance between fostering innovation, ensuring national security, and establishing clear, transparent regulatory frameworks for AI systems that are becoming increasingly capable and potentially impactful on society.

Frequently Asked Questions

What specific Anthropic AI models were affected by the government directive?

The U.S. government's export control directive specifically targeted Anthropic's two most advanced AI models: Claude Fable 5 and Claude Mythos 5.

What was the government's reason for pulling the plug on Anthropic's AI models?

The government cited national security concerns related to a "narrow potential jailbreak" that was reportedly discovered in Fable 5. This jailbreak allegedly allowed the model to identify software vulnerabilities.

How did Anthropic respond to the government's order?

Anthropic complied with the directive by disabling Fable 5 and Mythos 5 for all users worldwide, as it could not selectively restrict access to foreign nationals in real time. However, the company publicly disagreed with the rationale, calling it a "misunderstanding" and arguing that the alleged jailbreak was minor and similar capabilities exist in other publicly available models.

What is "Constitutional AI" and how does it relate to Anthropic's safety approach?

Constitutional AI is Anthropic's method for training AI systems to be helpful, harmless, and honest by guiding them with a set of explicit, written principles or a "constitution." This approach aims to align AI behavior with human values and reduce the need for extensive human supervision in identifying harmful outputs.