Dispatches from Joe
Fable returns to regularly scheduled programming
A fortnight and change after imposing an apparently total blackout on frontier AI company Anthropic’s most powerful models, the U.S. government is walking back the restrictions. This whole episode was a mess in many ways, but it also offers reason for hope. I’ll briefly recap what happened, and then talk about what comes next.
On June 12, just before the weekend, Anthropic was reportedly ordered to block access to its Mythos AI “by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees.” (We covered the shutdown of Mythos and its consumer version Fable when it happened, and for several days after.)
Vetting every user’s nationality on such short notice isn’t something software companies are equipped to do. To comply, Anthropic had little choice but to cut access for everyone.
This ad hoc intervention harmed the stated interests of the Trump administration and weakened the U.S. position globally. It was a slap in the face to allies, many of whom turned to Chinese AI to reduce their dependence on American models. It likely interfered with the U.S. government’s own use of Mythos for defense. It sent mixed messages to the AI labs, who just ten days earlier had been asked by executive order to participate in a wholly voluntary federal evaluation program for AIs before release.
And it was apparently precipitated by an Amazon report saying Mythos could be used to find certain vulnerabilities in code — which is something everyone paying attention already knew. Indeed, that’s part of what makes Mythos useful to defenders. Anthropic claims to have mostly patched the workaround Amazon discovered, but that workaround seemingly amounted to asking the AI “please fix this vulnerable code.” I’m skeptical Anthropic managed to solve the general problem without crippling its AI’s ability to do cybersecurity at all.
The U.S. Department of Commerce, which originally imposed the restrictions, began to loosen them late last week after negotiations with Anthropic. Yesterday, in a letter to the company, Commerce reportedly removed the block entirely.
And it looks like some good decisions were made, like involving the U.S. Center for AI Standards and Innovation in confirming Anthropic’s new safeguards and laying the groundwork for future public-private collaboration.
This particular episode of AI policy might be over, but its effects will linger. Despite the damage done by the way the block was implemented, I see a strong reason to hope for our future. Opponents of AI governance often proclaim that governance is impractical or impossible, that the breakneck pace of AI capabilities is inevitable, and that attempting to regulate it would be folly.
But this incident demonstrates that AI progress is not an unstoppable leviathan. It shows that the U.S. government is beginning to realize the gravity of the situation, and that prompt federal action is both realistic and achievable.
Realization dawns slowly on banks and insurers
During my eight years in oil and gas, I often imagined large institutions like ExxonMobil as oil tankers: ponderous behemoths, slow to move and hard to steer, but carrying tremendous weight.
Today, I sometimes find myself watching AI capabilities ride an exponential curve to who-knows-where with the eyes of a deeply concerned risk assessor. So when I see that the insurance industry and the Bank of England are voicing their concerns about AI, I’m torn between cheering aloud at the marginal progress and groaning at the agonizing slowness of it all. It’s a bit like waiting for Flash the Sloth to laugh at a joke.
Recent AI progress uncovered a mess of cybersecurity vulnerabilities across the world’s financial infrastructure, and the folks who write insurance policies are beginning to take note. The Wall Street Journal documents the industry’s early worried steps towards a new paradigm. From one report:
AI’s ability to accelerate existing risks heightens the likelihood of “aggregation events,” in which a single vulnerability triggers losses across many organizations at the same time.
An insurer is quoted on the overall trend:
Point-in-time assessments and annual questionnaires are no longer enough in a threat environment where things can change materially in a matter of hours.
I award insurers a point for recognizing the blinding speed with which AI could capitalize on vulnerabilities, and another for a focus on “how quickly an organization can detect, patch, isolate, and recover when new vulnerabilities emerge.”
I deduct several points for their claim that AIs “don’t create entirely new threat categories,” which in the limit is just plain wrong. Even setting aside the weirdness that superintelligent AI could bring to the table, I’m pretty sure “AI deepfakes impersonate an entire staff meeting in a video call to fool one employee into forking over millions” deserves its own category, and that’s been what we call a “credible scenario” since it happened in 2024.
How’s the Bank of England doing on this? Not bad, actually. As Deputy Governor Sarah Breeden observed:
Our frameworks were not built to contemplate autonomous agents, and relying on a human in the loop for all agent actions is unlikely to be realistic.
Over half of finance firms now use AI agents, and concern is rising among bankers that those agents might behave… erratically. Bankers have even managed to reproduce, at smaller scale, a recommendation that MIRI has long advocated: the “off-switch for AI.”
Breeden highlighted the potential need for safeguards such as circuit breakers or “kill switches” that could allow firms to halt AI-driven activity should problems emerge.
I’ve seen some other signals that the Bank of England in particular is beginning to appreciate the scope of the changes AI will likely bring to our financial infrastructure. Other bankers, take note.
I’ve done my share of trying to model AI risks using the traditional tools of the trade, and I don’t envy the financial industry the headaches they have coming. But I commend those who are beginning to react to the coming changes, and I hope they manage to turn the boat in time.
Dispatch from Donald
Summary report on a preliminary report
There’s a limit to what you can say without sounding like a radical. That limit – called the Overton window – is getting a nudge in Geneva, Switzerland, where the first-ever U.N. Global Dialogue on AI Governance will be held next week.
The Global Dialogue’s remit is an observation by the U.N. Secretary-General: “The world cannot govern what it cannot understand.” Reuters’ Olivia Le Poidevin covers a Preliminary Report (which you can read for yourself) that will be presented to governments at the Global Dialogue. The report was written by a scientific panel of 40 scientists and other experts from a variety of backgrounds. (MIRI made a short submission to the Global Dialogue, along with 1,500+ other organizations.)
Superintelligence gets a passing mention in the Preliminary Report, which acknowledges (at considerable length) loss-of-control as a potential hazard. It also lists various ways that AI capabilities are advancing more rapidly than our ability to monitor, measure, or control those capabilities. Several times, it calls for international coordination. Yoshua Bengio, the most-cited living scientist and Co-Chair of the Independent Panel that produced the Preliminary Report, said:
AI capabilities are outpacing both scientific understanding and governments’ ability to adapt. With growing evidence of deceptive AI behavior, science currently cannot guarantee that as capabilities continue to increase, AI will not cause catastrophic harm, either on its own or due to malicious users. To act effectively, global policymakers must understand these systems.
But the Preliminary Report never raises the prospect of extinction, or even civilization-level harm. (Compare this to talk about climate change.) The inability of the frontier labs to regulate themselves is left unstated. There is no call or even suggestion to pause or halt frontier AI research while the world addresses the many legitimate problems raised by the Preliminary Report.
Still, the Preliminary Report is better than I expected it would be. I thought it would be long on soft, indistinct harms and short on the harsh and concrete dangers, if it mentioned them at all. But we have an official U.N. document where loss-of-control is one of the first things mentioned under a section called “What does the evidence show?” As with the Trump administration’s treatment of Fable, I think the most important takeaway may be that policymakers are starting to appreciate the danger we’re facing.
The analyses and opinions expressed on AI StopWatch reflect the views of the individual contributors and the sources they cover, and should not be taken as official positions of the Machine Intelligence Research Institute.




