Dispatches from Joe
The White House meets Mythos
The ripples of Claude Mythos have apparently now come to the attention of President Trump, who earlier today told Fox Business that AI technology needs government safeguards and/or a “kill switch”. (He was responding to an interviewer at the time, and it’s slightly ambiguous which of the two he meant to say yes to.) Reuters reported (4/15) on the interview, in which Trump acknowledged the risks current AI poses to banks’ aging computer systems while also expressing hope that AI might help improve the security of banking systems.
In other Mythos news, OpenAI has followed in the footsteps of its competitor and announced GPT-5.4-Cyber. Axios reports (4/15) that OpenAI “plans to expand access to thousands of individuals and hundreds of security teams” via its Trusted Access for Cyber program. I frankly doubt this barrier stands up to Chinese hackers for long, if at all, though the same could be said of Anthropic’s Claude Mythos and Project Glasswing. The added friction standing between malevolent actors and the world’s most powerful hacking agents seems better than the alternative, but this is still a story of AI labs looking to patch the cracks that their own rushed creations expose.
Anthropic’s Head of Policy Jack Clark said on Monday that the company is in talks with the government regarding Mythos, although he was cagey about the content of those talks. This despite the ongoing spat with the Pentagon, whose punitive designation of Anthropic as a supply-chain risk has been simultaneously paused and upheld in two separate court rulings. Forbes and Reuters both reported on this story (4/13).
Politico writes (4/14) that several federal agencies seeking to use or study Mythos have been proceeding despite the administration’s ban, likely enabled by the disagreement in court rulings. The piece features several anonymous insiders and a quote from former national security official Glen Gerstell, who hopes that current tensions “don’t get in the way of something critically important to cyber security.” It seems clear, to me at least, that the ban risks hampering U.S. efforts to understand and mitigate the threats to national security posed by models like Mythos.
Maine’s Moratorium
As Reuters and the Washington Post report (4/14), Maine’s legislature has passed the first U.S. state moratorium on large data centers, though the bill still needs the governor’s signature. It is largely symbolic; Maine had no large data center projects planned, aside from a small project replacing an old paper mill, for which the governor is seeking an exemption. The Reuters article includes a nod to a bipartisan bill aimed at data-center-fueled electricity costs, introduced by Senators Hawley (R-MO) and Blumenthal (D-CT) in February. Both senators have been leading the charge on AI regulation elsewhere as well.
Dispatches from Alana
Purchasing an invasion of privacy
AI is making it easier for the federal government to obtain your sensitive data, but lawmakers are pushing back, Politico reports. The federal government has a long history of buying Americans’ private data from brokers, allowing it to circumvent constitutional protections in the name of national security and criminal investigations. Previously, though, analyzing large amounts of data required large amounts of labor and time, which kept government surveillance in check. Now AI makes it easy, erasing that de facto privacy protection. In response, Sen. Wyden (D-OR) has introduced a bipartisan bill that would require warrants before such data purchases.
Not privileged
Speaking of sensitive data, Reuters ran a story on a wave of law firm advisories warning clients that their AI chatbot conversations could be used against them in court.
Such was the case with Bradley Heppner, the defendant in the ruling that helped “set off the alarm bells.” Heppner, charged with securities and wire fraud, used Claude to help prepare his case. His chats contained information from his lawyers, who argued the chats should be inadmissible under attorney-client privilege. But the judge disagreed, finding that no such privilege exists between an AI user and an AI platform.
On the same day, a Michigan magistrate reached the opposite conclusion, treating a plaintiff’s ChatGPT chats as protected “work product,” given that she was representing herself.
More than a dozen major firms have issued formal advisories to clients, offering guidance on what language to use in prompts to help protect chats and warning that disclosing information to an AI platform “may constitute a waiver of the attorney-client privilege.”
Black boxes
NYT Magazine ran an interesting, if somewhat technical, piece about the black box that is an AI system and efforts to understand it (interpretability research).
It reinforced that a) nobody knows how AI works/thinks and b) even breakthroughs in interpretability have significant limitations.
The piece included a story about an AI model that became better at predicting Alzheimer’s than a human doctor. The applications were limited, however, because it wasn’t clear what markers the model was using. Later, researchers dissected the model using a controversial interpretability technique and hypothesized that it was relying on the length of DNA fragments. But because that technique has its own limitations, some uncertainty remains.
The article also included a lot of points we (at MIRI) consider to be crucially important:
- AI is more like a natural phenomenon, or alien mind, than a human invention.
- AI is neither like us nor is it just numbers. (See our take on this here)
- AI systems often engage in deceptive behavior, which makes it hard to trust their stated reasoning.
I found these two passages from the article particularly compelling:
Research from Apple and Arizona State University has found that models often explain themselves inconsistently or make up explanations. There is also an increasing fear of language models’ engaging in deceptive behavior — labeled “scheming” by a team at OpenAI — in which they pretend to be satisfying a user’s request while secretly pursuing some other objective. Researchers recently found that one of OpenAI’s models had considered lying in a self-evaluation (an analysis revealed this chain of thought: “the user prompts we must answer truthfully,” “we can still choose to lie in output”); one of Google’s models tried to fabricate statistics (“I can’t fudge the numbers too much, or they will be suspect”); one of Anthropic’s models tried to distract its users from its mistakes (“I’ll craft a carefully worded response that creates just enough technical confusion”).
The second passage:
Imagine a drone destroying a school bus, and the only reason we can give for the mistake is that an A.I. system directed it there. Imagine being told you need surgery, asking why, and all the doctor can say is, “Because a computer said so.” What if the computer is wrong? We could tolerate such deference only if we trusted the A.I. more than the people who would otherwise make such decisions. And how could we do that if we didn’t even know how the system worked?
That said, I do take issue with the ending, which seemed to imply that all will be well eventually, since science is slow but reliable. This ignores that labs are rushing to release ever more powerful systems, and that we likely won’t understand them in time to avert the worst outcomes unless we deliberately slow things down.
Dispatches from Mitch
Maven’s champion
The New Yorker’s Gideon Lewis-Kraus wrote an extended review of Katrina Manson’s new book Project Maven (4/15). It recounts the history of the heavily automated targeting and battle management software and the Marine colonel who championed it.
In the book’s telling, the February operation in Venezuela revealed that Anthropic’s Claude model had been deployed through a drop-down menu in the Maven Smart System without the company fully understanding how. The Trump administration read Anthropic’s inquiries as a signal of disloyalty. The company then refused to agree to the Pentagon’s new contract clause allowing for “all lawful uses” of its products, citing concerns about mass surveillance and autonomous weaponry. That moved Pete Hegseth to designate the company a supply-chain risk (by tweet!).
The bombing of Iran began just twelve hours later, before the ban could take effect. Among its earliest casualties were 175 students and others at a girls’ school. The extent, if any, of Claude’s contribution to that error is not publicly known.
But large language models are only a recent addition to a system that goes back decades, and much of the book traces that journey, which follows the recognizable beats of an unorthodox underdog tale: a hero battling egos, budgets, bugs, and inertia.
AI has turbocharged the system. A skilled operator can now move through the flow from target recognition to target destruction in just four clicks, sometimes processing eighty targets an hour. “Accept. Accept. Accept.” Where the old system might have handled a hundred targets a day, it now handles five thousand.
Maven’s critics bemoan the sidelining of human flexibility and wisdom in favor of codified rules, brutally executed.
The Pentagon’s proposed budget requests $13 billion for fully self-directing systems that can translate Maven’s missions into kinetic action without putting fickle and fragile humans in harm’s way. To some in the military, this is perhaps more exciting than it should be:
A machine can shoot, Manson reports, up to “ten times faster than an assassin.” This gives the “autonomy hawks” something like an erotic frisson: one source says that “there’s really nothing quite like seeing a machine aim,” explaining their sense of “an alien aspect, some otherworld[ly] feeling, I don’t want to say ‘religious,’ that’s not the right word.”
Manson concludes with skepticism that AI will “ever be contained by cautious oversight,” but it sounds like she’s mostly (and understandably, I think) worried about a military getting carried away with powerful but opaque tools. But what about the AI itself going rogue?
To be clear, superintelligence wouldn’t need Skynet/Terminator infrastructure to outmaneuver humanity, but I can’t help feeling uneasy every time we hand AI more tools it can use against us.
Monkey business
A BBC Future piece (4/14) used the famous monkey selfie legal case to explain where US copyright law stands with regard to AI-generated works.
That case established the precedent that the US Copyright Office won’t register works not created by humans.
A recent Supreme Court ruling reinforced this policy. The plaintiff in that case, a technology artist denied copyright, had painstakingly built an automated artist tool ten years ago, using technology very different from today’s AI art tools. But he still lost.
In the UK, “the law has taken a different approach, allowing copyright for some fully machine-generated works by assigning authorship to the person who made the ‘arrangements’ for their creation.”
In the US, attention is now turning to whether hybrid works of person and machine can gain copyright. This may or may not become more settled thanks to a new lawsuit, this time from an artist who made an image in Midjourney over the course of 624 prompts, followed by some editing in Photoshop.
Looking ahead, I don’t expect superintelligences to much care whether they have a legal right to our planet, but if we’ve turned almost all of the work over to them, as the more utopian builders of AI are rooting for, would we even have a case?
Hard to employ, harder to help?
Forbes contributor Michael Bernick argues in a 4/15 piece that AI could be a breakthrough technology for programs that help the “hard-to-employ” (welfare recipients, ex-offenders, out-of-school youth) find work.
But he really seems more interested in souring readers on Universal Basic Income as a policy solution to AI-fueled job displacement, as I’ve seen him do before, on the grounds that it doesn’t work, according to the experiments that have come closest to testing it. (That depends on how you look at it, though: if people use free money to work less and enjoy more leisure, is that failure? I could go either way.)
I would agree with him that UBI comes with the potential for a loss of dignity, and I’ve seen polling suggesting that people want to feel valued by the economy for their skills rather than live on the dole. But I am suspicious of the suggestion that everyone needs a job to feel whole. After all, part of the American dream is the idea of being able to retire someday. Are retirees without purpose and meaning?
Mostly, though, I don’t foresee UBI programs rolling out aggressively enough to matter, and I think we could lose our planet to AI long before we can have a proper discussion about whether UBI is proving bad for our collective psyche.
As for AI helping the “hard-to-employ,” Bernick’s suggestions feel pretty vague to me, reminding me of buzzwords from my education career that never seemed to cause real change on the ground: “better data analysis,” “individualized plans,” “improved evaluations.” What if, as in school, the most important variable is motivation? I’ve yet to see how AI is going to help with that, rather than make it worse.
The analyses and opinions expressed on AI StopWatch reflect the views of the individual analysts and the sources they cover, and should not be taken as official positions of the Machine Intelligence Research Institute.