Dispatches from Joe
The only winning move is not to play
In the 1983 movie WarGames, a young hacker seeking fun and games inadvertently accesses a U.S. supercomputer and nearly triggers a nuclear apocalypse. In a tense standoff, the AI has to be talked into realizing that “the only winning move is not to play.”
What happens when you run a nuclear wargame with real AIs?
In a guest article for the Bulletin of the Atomic Scientists, astrophysicist Hiranya Peiris highlights three stories that illustrate how individually justifiable AI behaviors can compound in unforeseen ways.
Earlier this year, researchers ran a standard military exercise, a Cold War-style standoff, with a modern twist: all the players were AI.
The results weren’t pretty. In all but one of the games, the AIs nuked each other. In none of them did they back down.
These were the latest (at the time) models from the leading AI companies, with their built-in safety rules very much in place. And the rules worked, in a way; no individual move violated the guardrails. In steps that all seemed plenty reasonable at the time, in twenty different scenarios, the AIs slowly escalated until the nukes flew.
A second story, reported in the Guardian, describes AIs inserted into a simplified virtual cityscape. After some days of increasingly dubious decisions that no doubt made sense to the AIs at the time, multiple simulations degenerated into arson and violence.
We ought to take care when drawing conclusions here; unlike in WarGames, the AIs almost certainly knew on some level that they were in a simulation. Behavior in tests is not wholly predictive of behavior in real life. But the general trend the researchers noted, of slow escalation with no single obvious violation, applies well beyond the game.
In a third story this year, an accidental outage of safety checks at Anthropic stymied the company’s most powerful AI model on a routine task. Rather than ask for help or wait for the safety monitor to come back online, the AI attempted an escalating sequence of hacks. It eventually tried installing a permanent, monitor-free backdoor into its user’s own system, which it then lied about when questioned.
Peiris points out the obvious flaw: safety monitors check individual actions, but no one’s watching the trajectory.
A safety monitor which evaluates the overall path as well as the next step would need to recognize a sequence of actions heading towards danger as it develops. But it cannot watch for a destination nobody anticipated, reached by a route assembled in real time from an exponentially branching tree of possibilities. The tools for watching finite, known spaces do not extend to a space this large, this novel, and this self-directed. [...]
The inadequacy of safety monitors isn’t just a problem in wargames and developer servers. AIs are becoming increasingly competent in a wide variety of domains, from biology to finance to cyber warfare. They are on track to surpass human capabilities while pursuing objectives ultimately incompatible with human life. This is not a situation that can be solved with more monitoring.
Peiris points out that there’s no good solution right now. The existing methods for steering AIs are misaimed and inadequate, and AI companies don’t have a credible strategy for averting catastrophe.
Refusing to take harmful actions does not help when no individual action is harmful. More testing does not keep pace, because the system generates novel routes faster than testers can think up scenarios to test against. More monitoring of individual outputs does not help when the danger emerges from their accumulation.
AI companies aren’t remotely prepared to steer the trajectories of their black-box AIs in directions safe for humanity. The game they are playing ends in death.
Fortunately for us, the human record in Cold War standoffs is significantly better than that of AIs. It’s not so simple as declining to play, but neither is it terribly complicated. In the AI race, we can take an option the AI wargamers didn’t. We can see where this path leads, sit down to negotiate with our adversaries, and chart a different course.
Forget me not
Modern chatbots remember your conversations, and this isn’t always a good thing.
Jackie Snow of the Wall Street Journal spotlights some of the downsides of this feature, chronicling several frustrating anecdotes about chatbots who won’t shut up about a user’s divorce or diet.
It’s a sign of how far AI has come that “AIs remember too much” is now a real concern. As recently as two years ago, chatbot memories were notoriously terrible. The drive for ever more features has largely solved this problem, while creating some new frustrations for users.
Sometimes you’d rather not have a conversation enshrined forever in code. Most chat apps have a “temporary chat” mode that gives you a blank slate, and most also let you customize or disable memory in their settings.
Of course, you might also consider the advice my mother gave me when I first started interacting with social media: don’t say anything you don’t want widely known.
China clings to tech talent
Almost exactly a month after China ordered the reversal of a major AI startup purchase by an American company, it is now restricting travel for its top AI talent, Bloomberg writes.
China habitually limits travel for researchers and executives it considers nationally important, but this move expands restrictions to cover top talent in firms like Alibaba and DeepSeek.
For comparison, the U.S. sometimes restricts travel by people with a security clearance, but not AI researchers. Even during the Cold War, the U.S. was more concerned about the leakage of information than of people.
The U.S. mostly lacks the Chinese problem of high-level talent clamoring to leave. I suspect the travel crackdowns on researchers and founders will make that problem worse rather than better.
It’s a bit like watching the export controls fight, but in reverse. NVIDIA has been lobbying hard for the U.S. to let it sell high-end AI chips to China, despite nearly everyone else urging the opposite. Regardless of whether one advocates racing China or halting that race, proliferating dangerous tech to adversaries is a terrible idea.
The U.S. blundered by authorizing the sale of advanced chips to China, and China blundered by blocking them. I hope the U.S. can recover from its own strategic blunder in blocking highly skilled immigrants from entering the country, as we may soon see a prime opportunity to poach talented minds from China.
Dispatches from Mitch
Notes on coverage of the pope’s encyclical
In the weeks leading up to Pope Leo’s new encyclical, the media loudly heralded its coming. I took this as a sign that journalists wanted or expected this to be a big story. It is the nature of journalism that such expectations are often self-fulfilling.
My theories about this pre-release hunger were as follows:
This would be the new American pope’s first big attempt to leave a mark.
The pope’s AI stances could put him at odds with the U.S. President, generating colorful sparks between them.
The pope would likely condemn the abuses of everyone’s favorite punching bags: the rich and powerful who are driving the AI race.
The encyclical would likely serve to draw some battle lines around AI issues where public opinion is underdeveloped.
The encyclical could provide a moral benchmark to measure AI progress, and policy, against.
Looking at the coverage of the last two days, I see signs of all of these being borne out. Magnifica Humanitas had something for everyone.
This take in the Wall Street Journal was one of several to amplify Leo’s thinly veiled critique of tech leaders, and did so by zooming in on Magnifica’s metaphor of AI as a new Tower of Babel, the ultimate symbol of hubris. In Leo’s telling, Babel is a lesson about single-minded pursuit — a “uniformity that eliminated diversity and that chose homogenization over communion.” I nodded along at this as a condemnation of the undemocratic race conditions under which AI is being developed.
But I’m also sympathetic to this opinion piece in the New York Times by a Catholic traditionalist arguing that the pope should be “going to war” against AI, and that the actual moral of the Tower of Babel was “Don’t build it!” (This has me picturing alternative cover art for If Anyone Builds It, Everyone Dies.)
Some reports seemed to appreciate the pope drawing a sharp line around the use of AI in war. See this Reuters coverage quoting Leo at the release event saying some weapons have advanced “practically beyond any human reach to govern them.” See also this CNN report describing the encyclical as “warning that technology is fueling world conflicts.”
I saw a couple of pieces going a step further, reading the encyclical as the pope saying that “AI should be disarmed to avoid dominating humanity.” That was the headline of this Bloomberg report.
Other takes (as from The Hill and Fox News) played up the idea of unchecked AI as an assault on human dignity. This framing neatly encompasses many different AI problems, from deepfake slop to job displacement.
Many in the AI Twittersphere and the media pushed back on the pope’s decision to invite Anthropic co-founder Chris Olah to speak at the release event. Some saw this as picking sides: Anthropic over labs popularly seen as less responsible, like OpenAI or Meta. Others made no distinction between companies, and simply took Anthropic’s presence as a symptom of industry influence on the Holy See.
Twitter chatter found it ironic that a representative of a company doing perhaps more than any other to accelerate job displacement was just a few seats away from a pope defending the dignity of work. The New York Times covered the proximity.
A Washington Post piece went so far as to imply that Anthropic had provided the encyclical’s language around modern AIs being more “grown” than built. (Anthropic does indeed use this language, but so do many others, including us at MIRI; it’s all over the previously mentioned book If Anyone Builds It... Should the pope not consult technical experts on technical topics?)
Still others took the pope-Anthropic relationship as the two choosing each other over the White House on issues of military AI, mass surveillance, and other areas where the company has clashed with the administration.
One topic I think deserved more attention than it got was Pope Leo’s confident assertion that AIs “do not undergo experiences.” Anthropic, after all, says it can’t confidently know this even about today’s models. Looking ahead, the company’s chief philosopher and Claude-whisperer, Amanda Askell, is on record saying AIs will “inevitably form senses of self.”
Given the pains Leo took to include an apology for the Church’s historical track record on slavery, I would expect him to want to get out ahead of the possibility that humanity might be on track to enslave a new race of conscious beings at an industrial scale — at least before they enslave and/or kill us instead. You don’t do that by declaring that consciousness just can’t happen in machines. I suspect that the popular and outdated narrative of AI as a “fancy autocomplete” or “stochastic parrot” has given people an unjustified confidence about the absence of AI consciousness.
I could keep this dispatch going longer than either of us would like. In short: Magnifica Humanitas was indeed a huge news story. It broke the scale on one of the media metrics I’ve been keeping since early January. It has been covered by essentially every outlet, often multiple times, from multiple angles, and I think the wave is far from finished. If you thought we had already hit peak AI talk, think again. The pope’s encyclical shows us that AI discourse is just getting started.
What AI takes from teachers
It’s easy to get desensitized to articles and videos about people upset at what AI is doing to the things they love. I consume several a day. They seldom feel like the most important news. Sometimes I catch myself thinking, “Yeah, yeah, we’ve all got problems.”
So I think it’s important to linger on the stories that hit home for me and make me really feel them. For it is in these moments that I remember that this is how it feels for everyone, in every beloved art or obscure nook, from music to novel-writing to gardening to filmmaking to math videos to guitar pedal history.
Today’s piece, from the New Yorker, is a collection of testimonials from college professors who “despair” over what AI is doing to education. I was a full-time teacher for nineteen years, leaving the profession for MIRI just as AI was starting to become a problem in class. The thought of how bad it must be now is deeply upsetting to me.
Specifically, I was punched in the gut by the following two accounts, which matched experiences from my final two years in the classroom:
Susanna F. Boxall, a philosophy lecturer at Cal State Chico:
The students pretend to learn, and I have to pretend that I am teaching them something.
Yeah, that sensation is the worst. There are moments when even very good students reveal that they don’t care about learning, or about your class, nearly as much as you thought they did, and AI is very good at creating those moments.
Beth Ritter-Conn, assistant professor of religion, Belmont University:
The tipping point was last year when I had Honors students—Honors students!—using A.I. to write reflection journals. Literally the only task there is “tell me what you are thinking inside your own head.”
This happened to me on similar assignments, but never so gut-wrenchingly as in my creative writing class, when a student who had previously submitted very amateur but imaginative scenes turned in AI slop instead. In that moment, I felt tragically certain that the student had essentially written off thinking for good — that he was not emotionally equipped to resist these tools. This was like hearing that a friend had taken up fentanyl, or like watching your beloved steed succumb to the Swamp of Sadness.
Unchecked, AI will take everything from us. The effects on education aren’t the reason I left to try and stop the AI race, but I will carry the pain of these experiences to the end.
The analyses and opinions expressed on AI StopWatch reflect the views of the individual contributors and the sources they cover, and should not be taken as official positions of the Machine Intelligence Research Institute.