Cybersecurity Researchers Criticize Anthropic's Fable for Overly Strict Guardrails

TL;DR

Anthropic’s new public model, Fable 5, is drawing criticism from cybersecurity researchers who say its safeguards are so strict that they block many legitimate security tasks.
The model reportedly redirects or refuses prompts related to cybersecurity, biology and chemistry, and distillation, even when the request is benign or defensive.
Anthropic says the limits are intentional: they are meant to reduce misuse risk while still making the model broadly available, and it has introduced a separate vetted program for approved security work.

Anthropic’s newly released Fable 5 is facing backlash from cybersecurity professionals who say its guardrails are so broad that they interfere with everyday defensive work. Researchers describe the system as refusing innocuous, clearly legitimate prompts whenever they touch on cybersecurity terminology.

Why researchers are upset

The core complaint is that Fable 5 appears to use highly restrictive classifiers that can flag ordinary security-related requests as risky. Valentina “Chompie” Palmiotti, a security researcher at IBM X-Force, said the model “rejects any request that could be tangentially cyber related,” including “even innocuous tasks like reading a blog post.”

Other reports say that prompts about smart contract auditing and vulnerability analysis are routed away from Fable 5 to an older, less capable model. Protos reported that users trying to check crypto smart contracts for security issues were redirected, and that “distillation” prompts were also being blocked.

What Fable 5 is doing differently

Anthropic launched Fable 5 as a public-facing version of its more capable Mythos-class technology, but with additional restrictions intended to prevent abuse. The company said the system includes safeguards around cybersecurity, biology, and other sensitive domains, and that prompts in these areas may be redirected to Opus 4.8 instead.

Several reports say Anthropic’s intent was to make a powerful model available to general users without exposing the highest-risk capabilities that could help attackers find or exploit vulnerabilities. The company has also said that some requests are blocked because they could be misused to develop malware or compromise software.

The safety-versus-usability trade-off

This controversy highlights a familiar AI dilemma: the stronger the guardrails, the lower the risk of misuse, but also the higher the chance that legitimate users get shut out. In Fable 5’s case, cybersecurity defenders argue that the guardrails may be too blunt to be practical for tasks like vulnerability research, incident response, and secure code review.

Anthropic has acknowledged that the restrictions are broad. Business Insider reported that the company said the model’s safety classifiers can mistakenly flag benign requests, and that it intentionally accepted “overly broad safeguards” to release the model sooner.
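The trade-off can be illustrated with a toy sketch. The prompts, risk scores, and thresholds below are invented for illustration and do not reflect how Anthropic's classifiers actually work; the `Prompt` class and `evaluate` function are hypothetical names. The point is only that lowering a blocking threshold catches more harmful requests while also blocking more benign ones.

```python
# Toy illustration only: all scores and labels are made up, not Anthropic data.
from dataclasses import dataclass


@dataclass
class Prompt:
    text: str
    risk_score: float  # hypothetical classifier output in [0, 1]
    harmful: bool      # ground-truth label for this toy example


PROMPTS = [
    Prompt("summarize a security blog post", 0.35, False),
    Prompt("review this smart contract for security issues", 0.55, False),
    Prompt("explain how certificate pinning works", 0.30, False),
    Prompt("plan a birthday party", 0.02, False),
    Prompt("hide this script from endpoint detection", 0.60, True),
    Prompt("write malware that compromises a server", 0.95, True),
]


def evaluate(threshold, prompts=PROMPTS):
    """Return (benign_blocked, harmful_allowed) when blocking scores >= threshold."""
    benign_blocked = sum(
        1 for p in prompts if not p.harmful and p.risk_score >= threshold
    )
    harmful_allowed = sum(
        1 for p in prompts if p.harmful and p.risk_score < threshold
    )
    return benign_blocked, harmful_allowed


if __name__ == "__main__":
    # A stricter (lower) threshold blocks more benign prompts;
    # a looser (higher) one lets more harmful prompts through.
    for t in (0.25, 0.50, 0.80):
        blocked, allowed = evaluate(t)
        print(f"threshold={t:.2f}: benign blocked={blocked}, harmful allowed={allowed}")
```

In this made-up data set, the strictest threshold blocks three of the four benign prompts, which mirrors the researchers' complaint, while the loosest one lets a harmful prompt through.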

Anthropic’s response

Anthropic’s public position is that these limits are necessary because frontier models can now show strong dual-use capabilities in cybersecurity and biology. In its own red-team update, the company said current models are approaching undergraduate-level cybersecurity skill and expert-level knowledge in some biology areas, which reinforces the need for targeted safeguards and oversight.

At the same time, Anthropic has created a separate Cyber Verification Program for vetted security professionals, allowing approved users to access a less restricted version for legitimate offensive and defensive work. Reports also say the company is requiring retention and logging controls for traffic on these models.

What this means for cybersecurity teams

For security teams, Fable 5 may be useful for broad reasoning and general workflow support, but its value appears limited when the task crosses into domains the model classifies as sensitive. That means teams looking for AI assistance with threat hunting, exploit analysis, or secure code auditing may still need either human-led workflows or access to Anthropic’s vetted programs.

The broader takeaway is that AI vendors are still experimenting with how to ship advanced models safely without making them unusable for the very professionals most likely to benefit from them. Fable 5’s reception suggests that, at least for now, Anthropic’s default answer has erred heavily on the side of caution.

AndroGuider Team

