On Friday evening, artificial intelligence research company Anthropic abruptly disabled its newly launched flagship models, Fable 5 and Mythos 5. This drastic measure followed the U.S. government’s notification of a specific jailbreak vulnerability discovered in Fable 5, which subsequently triggered an export control order. The order’s broad scope, encompassing all foreign nationals, including those present within the United States, left Anthropic with no alternative but to suspend access to these advanced models for all users.
The precise nature of the Fable 5 jailbreak remains undisclosed, adding a layer of opacity to the situation. Anthropic, in its initial statements, characterized the vulnerabilities presented by the government as "minor" and "relatively simple," asserting that they did not exceed the capabilities of other publicly available AI models. This stance appears to be in tension with the government’s decision to invoke export controls, a significant regulatory action.
Prior to the launch, Anthropic had emphasized the rigorous security testing undertaken for Fable 5. The company reported that the model had undergone extensive red-teaming exercises, involving collaboration with the UK’s AI Security Institute and other external security experts. Internal testing by Anthropic indicated that Fable 5 would successfully complete approximately 5% of adversarial cyber tasks, suggesting a baseline level of resilience against such attempts. The Fable 5 model card itself explicitly stated a commitment to "move quickly to update our defenses to ensure that they remain robust to all known attacks" in the event of a public universal jailbreak. However, current information suggests the issue may not stem from a universal exploit but rather a highly specific flaw. As of Saturday morning, Anthropic had not issued further updates beyond its initial statement, which framed the situation as a "misunderstanding."
Divergent Narratives Emerge: The Government’s Perspective
The narrative surrounding the abrupt shutdown gained significant complexity following statements made on Saturday by David Sacks, co-chair of the President’s Council of Advisors on Science and Technology and former White House AI and crypto czar. Sacks took to social media to present what he described as the U.S. government’s account of the events.
According to Sacks, the jailbreak was reported by a "highly credible trusted partner of both Anthropic and [the U.S. government]." He further alleged that the administration had approached Anthropic CEO Dario Amodei, requesting either an improvement of the model’s guardrails to rectify the vulnerability or its complete removal. Sacks’s account claims that "Dario refused," a point of contention that reportedly escalated the situation. This assertion directly challenges Anthropic’s framing of the incident as a mere misunderstanding.
Amazon’s Pivotal Role in Triggering the Shutdown
Independent reporting from The Wall Street Journal and The Information shed further light on the events, identifying Amazon CEO Andy Jassy as the individual who initially reported the jailbreak. According to The Wall Street Journal, Jassy brought the vulnerability to the attention of U.S. officials, including Treasury Secretary Scott Bessent.
The reports indicate that Amazon researchers discovered methods to leverage Fable 5, specifically the version of Mythos 5 equipped with security guardrails, to facilitate cyberattacks. This finding directly contradicts Anthropic’s claims about the model’s safety features. When Anthropic launched Fable 5, it explicitly stated that guardrails were in place to prevent the model from assisting users in initiating cyberattacks or creating bioweapons, among other malicious activities. The revelation that these guardrails could be circumvented raises significant questions about the efficacy of Anthropic’s security protocols.
Users had previously expressed frustration with Fable 5’s perceived over-cautiousness, with many complaining that the model refused to answer innocuous questions. In such instances, the system often reverted to the previous flagship model, Opus 4.8, suggesting a built-in mechanism to manage potentially problematic outputs.
Given that Amazon researchers were reportedly the ones to discover and report the jailbreak, it is highly probable that their testing was conducted on Amazon Bedrock, Amazon’s cloud-based AI service. Amazon has asserted that Bedrock implements the same safety mechanisms as direct access to Claude through Anthropic. This suggests a potential systemic issue in the implementation or robustness of the safety features across different deployment platforms.
Sacks’s critique of Anthropic’s response was particularly pointed. He argued that the company’s defense, which centered on downplaying the severity of the jailbreak, was inconsistent with its self-proclaimed brand as an AI safety leader. "That is not what the trusted partner and the USG believe; nor is that kind of minimizing language consistent with Anthropic’s brand as the AI safety company," Sacks stated. He questioned how Anthropic could dismiss a jailbreak enabling the "operability of a cyber weapon" as not "serious."
This situation presents a significant reputational challenge for Anthropic. The company had positioned itself as a frontrunner in AI safety, notably advocating for stricter regulations and ethical development. The decision to potentially prioritize the continued availability of a consumer model over addressing a reported security vulnerability, especially one that could facilitate cyberattacks, appears to contradict this established brand identity. Sacks amplified this point, stating, "In the past, Anthropic has always said that safety must be top priority and taken super seriously. In this case, Anthropic prioritized the continued offering of the consumer model over safety."
The Path Forward: Technical Fixes and Regulatory Precedents
The most immediate and apparent solution involves Anthropic developing and deploying updated guardrails to neutralize the specific jailbreak vulnerability. However, the inherently non-deterministic nature of advanced AI models means that the emergence of new, unforeseen jailbreaks remains a persistent concern. The company is likely working to implement a robust patch.
Industry observers anticipate that a resolution to this specific issue will likely be found relatively quickly, leading to the lifting of the export control order and the subsequent reintroduction of Fable 5 and Mythos 5 to the public.
More significantly, this incident establishes a new precedent for how the U.S. government might intervene in the development and deployment of advanced AI models. The actions taken by the government are being closely monitored by other U.S.-based frontier AI labs, such as OpenAI and Google, which are engaged in a continuous cycle of innovation and competition. The Fable 5/Mythos 5 situation is unlikely to be the final word on AI model development, and its implications for future releases are substantial.
The U.S. government has previously proposed voluntary safety testing for new AI models before their public release. This recent affair is likely to bring such proposals to the forefront of regulatory discussions. Anthropic itself has been a vocal advocate for AI regulation, making this situation particularly noteworthy given their established stance on the necessity of governmental oversight and safety protocols.
Political Undercurrents and Future of AI Governance
The contentious relationship between Anthropic and the current White House administration adds another layer of complexity to the situation. However, Sacks sought to downplay any partisan influence, stating that "the Admin values Anthropic’s technical capabilities and feels that this issue, while serious, should be easily resolved. The ball is in Anthropic’s court." This suggests that the government views the situation as a technical challenge that Anthropic is capable of resolving, rather than a political standoff.
The implications of this incident extend beyond Anthropic and the U.S. government. It underscores the growing tension between the rapid advancement of AI capabilities and the imperative for robust safety measures. As AI models become more powerful and integrated into critical infrastructure, the mechanisms for ensuring their responsible development and deployment will become increasingly crucial. The events surrounding Fable 5 and Mythos 5 may well serve as a critical juncture in shaping the future landscape of AI governance, setting a precedent for how governments and AI developers will navigate the complex challenges of ensuring safety, security, and ethical deployment in the years to come. The industry is watching closely to see how Anthropic responds and what regulatory frameworks will emerge from this high-profile incident.
