Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable

Anthropic, a leading AI safety and research company, has released its latest large language model (LLM), Claude Fable 5. The model shows strong capabilities in software engineering, knowledge work, and vision, but its public release has drawn sharp criticism from cybersecurity researchers. Their main complaint is that Fable 5's strict guardrails severely hinder legitimate security work.

The announcement of Claude Fable 5 marks a significant moment for Anthropic, a company founded in 2021 by former OpenAI staff including Dario and Daniela Amodei. Anthropic has consistently positioned itself as deeply committed to AI safety and alignment, prioritizing AI systems that are interpretable, reliable, and aligned with human values. Central to this mission is its "Constitutional AI" approach, a training framework that guides model behavior with a predefined set of written principles and reduces reliance on extensive human feedback.

Introducing Claude Fable 5: A Glimpse into Advanced AI Capabilities

Claude Fable 5 became generally available on June 8, 2026, on Claude.ai, Claude Code, and the desktop app, followed by Amazon Bedrock and the Claude Platform on AWS on June 9, 2026. Anthropic describes it as a "Mythos-class" model, a designation that places it at the forefront of current AI models. It is built for ambitious, long-running tasks and performs especially well on complex software engineering projects, knowledge work, and vision-based applications.

One of Fable 5's standout features is long-running, asynchronous execution: it can sustain complex coding and knowledge-work assignments that earlier models struggled with, working for extended periods without constant human intervention. Its vision capabilities let it interpret diagrams, charts, and tables embedded in file formats such as PDFs, which is particularly valuable in document-heavy fields like finance, law, analytics, and architecture. For coders, Fable 5 can implement designs with high fidelity and use its vision to check its own output against the original goals, a form of proactive self-verification.

Anthropic offers Claude Fable 5 to organizations through consumption-based Enterprise plans and to developers natively on the Claude Platform, as well as through cloud platforms such as Amazon Web Services, Google Cloud, and Microsoft Foundry. Pricing is $10 per million input tokens and $50 per million output tokens. That is notably higher than its predecessors: roughly twice the cost of Opus 4.8 and more than three times that of Sonnet 4.6.
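As a rough illustration of what these rates mean per request, the small Python function below computes the list-price cost of a single Fable 5 call. It is a sketch under stated assumptions: it uses only the two per-million-token prices quoted above and ignores any caching, batch, or enterprise discounts. The function and constant names are hypothetical.

```python
from decimal import Decimal

# Per-million-token list prices quoted in the article (USD).
INPUT_PRICE_PER_MTOK = Decimal("10")
OUTPUT_PRICE_PER_MTOK = Decimal("50")


def fable5_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD list-price cost of one Fable 5 request.

    Assumption: no discounts, caching, or batch pricing are applied.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    million = Decimal(1_000_000)
    # Scale each token count by its per-million price, then divide once.
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / million


if __name__ == "__main__":
    print(fable5_cost(200_000, 20_000))  # prints 3
```

For example, 200,000 input tokens and 20,000 output tokens come to $2.00 plus $1.00, or $3.00 in total. Using `Decimal` keeps currency arithmetic exact instead of relying on binary floating point.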

The Guardrail Controversy: Cybersecurity Community Voices Concern

Despite Fable 5's technical strengths, its release has been overshadowed by a significant point of contention: its stringent safety guardrails. Anthropic has openly stated that Fable 5 includes safeguards that deliberately limit its performance in areas where the risk of misuse is elevated, chiefly cybersecurity, biology, and chemistry. When these safeguards flag a prompt as related to one of those domains, the request is automatically rerouted to an older model, Claude Opus 4.8, and the user is not charged Fable 5 prices for it.

This implementation, however, has drawn widespread complaints from cybersecurity researchers, many of whom say Fable 5's guardrails are "too strict for any cybersecurity work." Reports indicate that the model effectively blocks anything related to cybersecurity, including seemingly innocuous tasks; researchers have cited cases where Fable 5 refused to perform code reviews or help write secure code.

Joseph Delong, CEO of Colossus Pay, reported that Fable 5 "outright refuses to do a smart contract audit" and "won't even look at my repo." Similarly, Yearn developer Banteg noted that the model's safety measures prevented all security-related prompts from functioning, remarking, "It doesn't matter if it's smart if 100% of your queries go straight into a trash bin." Another user, wallet recovery tool founder Zeng Jiajun, shared experiences of Fable 5 frequently blocking requests, citing usage policy violations, and describing the AI model as "Too sensitive for even an Ethereum app development."

The core of the frustration among cybersecurity professionals is that the guardrails appear to be overly broad. As one researcher put it, Fable 5 "rejects any request that could be tangentially cyber related. Even innocuous tasks like reading a blog post." This suggests that the model struggles to differentiate between benign, defensive cybersecurity practices and potentially malicious uses. Hayden Adams, creator of Uniswap, highlighted this challenge, stating, "Requests that help harden systems are likely indistinguishable from blackhat requests."

Anthropic's Stance and the Broader Context of AI Safety

Anthropic's decision to implement such strict guardrails stems directly from its foundational commitment to AI safety. The company made an "unusual decision" to restrict access to a more powerful, less-guarded version of the model, known as Claude Mythos 5, to a select group of vetted partners through an initiative called "Project Glasswing." This restricted access was due to concerns that the unconstrained model could identify and exploit vulnerabilities in major operating systems and web browsers.

Dianne Penn, head of project management for Anthropic's research and labs, explained the rationale behind Fable 5's release, stating, "We wanted to make sure for non-cyber use cases, we really prioritized safely releasing Fable as soon as possible." She added that the company is continuing to work on general cyber use cases for the more capable model. Anthropic is also concerned about misuse in biology and chemistry, specifically the creation of bioweapons and viruses. The company explicitly stated that its priority was to "safely release Fable as soon as we could, even at the cost of overly broad safeguards."

These guardrails are not "baked into" Fable 5 through its training. Instead, they run as three separate classifiers at inference time that redirect flagged queries to Claude Opus 4.8. Anthropic also ran an external bug bounty program, involving more than 1,000 hours of testing, to probe the safeguards against jailbreak attempts; no universal jailbreaks were found.
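To make the routing mechanism concrete, here is a minimal Python sketch of inference-time classifier routing as described: every prompt passes through several classifiers, and any hit sends the request to the fallback model without Fable 5 billing. Everything here is hypothetical. Anthropic has not published how its classifiers work; the keyword matchers are toy stand-ins for learned models, mapping the three classifiers to cyber, bio, and chem is an assumption, and the model identifier strings are placeholders, not real API model names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence

# A classifier takes the prompt text and returns True if it flags the prompt.
Classifier = Callable[[str], bool]

# Placeholder identifiers (hypothetical, not real API model names).
PRIMARY_MODEL = "claude-fable-5"
FALLBACK_MODEL = "claude-opus-4.8"


@dataclass(frozen=True)
class RoutingDecision:
    model: str                   # model that will serve the request
    billed_as_fable: bool        # whether Fable 5 pricing applies
    flagged_by: tuple[str, ...]  # names of the classifiers that fired


def keyword_classifier(keywords: Sequence[str]) -> Classifier:
    """Toy stand-in: flag a prompt containing any keyword (case-insensitive).

    Real safety classifiers are learned models; this is illustration only.
    """
    lowered = tuple(k.lower() for k in keywords)

    def classify(prompt: str) -> bool:
        text = prompt.lower()
        return any(k in text for k in lowered)

    return classify


def route(prompt: str, classifiers: dict[str, Classifier]) -> RoutingDecision:
    """Run every classifier on the prompt; any hit reroutes to the fallback."""
    hits = tuple(name for name, clf in classifiers.items() if clf(prompt))
    if hits:
        return RoutingDecision(FALLBACK_MODEL, False, hits)
    return RoutingDecision(PRIMARY_MODEL, True, ())


# Hypothetical three-classifier setup (the domain split is an assumption).
CLASSIFIERS: dict[str, Classifier] = {
    "cyber": keyword_classifier(["exploit", "audit", "vulnerability"]),
    "bio": keyword_classifier(["pathogen"]),
    "chem": keyword_classifier(["nerve agent"]),
}

if __name__ == "__main__":
    print(route("Please audit this smart contract", CLASSIFIERS))
    print(route("Summarize this quarterly report", CLASSIFIERS))
```

Because the toy cyber classifier keys on words like "audit", a defensive smart contract audit is rerouted exactly as an attack request would be. That mirrors the over-breadth researchers describe: judged by surface text alone, requests to harden a system and requests to attack it often look the same.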

Industry Implications and the Path Forward

The situation with Fable 5 highlights a critical tension in the rapidly evolving field of AI: the balance between powerful capabilities and robust safety measures. While Anthropic's dedication to developing "helpful, honest, and harmless" AI is commendable and crucial for the long-term responsible advancement of the technology, the current implementation of Fable 5's guardrails poses a practical challenge for legitimate security professionals.

Cybersecurity research often involves tasks that, on the surface, might appear to be "harmful" to an AI model trained with strict safety principles. Activities like vulnerability analysis, penetration testing, and secure code development require the ability to understand, generate, and manipulate potentially risky code or exploit scenarios. If an AI model cannot engage with these concepts, even in a controlled and ethical context, its utility to the cybersecurity community is severely limited. As crypto security expert Taylor Monahan observed, Fable 5 "changes nothing for your average security person" if it can't handle security-related prompts.

The complaints from researchers underscore the need for more nuanced and context-aware AI safety mechanisms. While preventing malicious use is paramount, hindering defensive security work could inadvertently weaken overall digital defenses in the long run. The industry will be closely watching how Anthropic addresses these concerns, potentially through more sophisticated classifiers that can distinguish between malicious intent and legitimate security research, or by expanding access to less-guarded models like Mythos 5 to a wider, vetted community of cybersecurity experts. Anthropic's commitment to continuing work on general cyber use cases suggests they are aware of this challenge and are actively seeking solutions.

For now, Claude Fable 5 stands as a testament to Anthropic's commitment to safety, but also as a focal point for a critical discussion about how AI guardrails can be implemented effectively without stifling essential defensive research and innovation in cybersecurity.
