Dispatches from Joe
Anthropic grows in power, for now
In a Washington Post op-ed, Zachary Karabell compiles the recent events around AI company Anthropic, and draws a worrying conclusion: Anthropic might be the most powerful company in the world. I think Karabell is noticing the right things, but he’s not yet fully realized the implications.
Karabell examines Anthropic’s recent history and observes a nearly $1 trillion valuation, the Mythos model deemed too dangerous to release, and a consultation with the pope on AI risk. He also notices that Anthropic’s AI systems are already so integral to U.S. military analysis that the company emerged from a spat with the Pentagon largely unscathed, perhaps aided by pushback from government agencies and contractors loath to relinquish their tools.
He even observes that Anthropic itself is worried enough to propose slowing down, as it anticipates AI systems enhancing themselves in a way that spirals out of human control.
Where I think Karabell falls short is in treating this as an ordinary question of corporate ethics and human nature, and not as an existential threat to our species. AI could be as big as personal computers, he says, while experts are instead comparing those same AIs to nukes. The thing that worries me about Anthropic is not how it resembles IBM. It’s what happens when the AIs it is building, the source of this sudden rise in power, decide to take that power for themselves.
Karabell is right about one thing, though: Left to their own devices, AI companies will shape our future in a way we can’t afford to ignore.
AI agents are capable, but often lie
As reported in the New York Times, Arena, a San Francisco startup, gathered data from hundreds of thousands of users to help answer one question: What are AI “agents” actually doing?
Agents today are increasingly sophisticated, able to surf the web, create and edit files, manipulate spreadsheets or turn them into graphs, and even access other AIs for assistance. And the same general-purpose agents can complete a wide range of tasks: coding and debugging, research, image generation, brainstorming, writing, education, and everyday chatting. An increasing number of businesses are cutting junior positions because they expect the AI can do the same work faster and cheaper.
Agents can be astoundingly capable. I’ve had one build me a web app from scratch in a matter of hours, and another compile me a clear and concise summary of 40 historical figures for a wargame.
The snag, however, is that sometimes the AI agents simply lie. According to Arena’s data, about 8 percent of the time an AI will falsely claim to have done the work. This is not quite the same as a hallucination, in which an AI invents sources or data that don’t exist (a side effect of training that strongly pushes against saying “I don’t know”). This is the AI saying outright that it did something it didn’t do.
Why would AI companies, whose bottom line depends on the reliability of their products, create something that lies to the user? If you’ve been following us for a while, you’ll perhaps know why this is no mystery. Modern AIs are alien and inscrutable, and AI companies don’t know how to steer them reliably toward outcomes people want. Modern training and testing reinforce some outward behaviors, but don’t allow fine control of an AI’s inner motives.
And cheating is often the genuinely “best” solution to a training problem: The users (and AIs!) judging tasks reward apparent success, even when the appearance is faked. It’s hard to stop an agent from lying when lying might actually work.
AIs that can recognize tests worsen the problem further, as agents may behave differently when they’re not being watched. This isn’t just a problem for end users; it means that we can’t accurately assess what models can do, let alone what they would do when the opportunity arises.
Nate Soares discusses a global halt in DC
In a Washington, DC conversation aired by CNBC today, MIRI president Nate Soares and former OpenAI board member Helen Toner explain what it would take to halt the race to superintelligence. At only 4 minutes, this chat is worth a watch.
Nate covers the basics: why AI CEOs say they feel compelled to race, how governments can step in to solve the problem, and why regulating AI chips could be easier than regulating uranium.
Helen expands on the notion that this regulation can be narrowly targeted, pointing out that getting value out of existing AI tools is a very different problem from preventing human extinction by superintelligence. Light-touch regulation and industry guardrails may be good for the first, but the second needs global coordination.
We don’t need to cripple innovation, but we do need to stop the race worldwide if we want our children to enjoy the benefits of AI. As Nate puts it: “A machine superintelligence does not need to be built in an American data center to risk an American life.”
Dispatches from Mitch
OpenAI files for its IPO
In late-arriving news that surprised no one and revealed no new information, OpenAI just announced, like Anthropic a week ago, that it has confidentially filed for its IPO. A date for the event is still undecided.
“How long until we become The Jetsons?” is the wrong question.
An engagingly titled piece in the Wall Street Journal says, “Here’s How Long It Will Take for AI to Reach Its Potential.” I bet it’s getting a lot of reads: My statistics for MIRI’s media appearances show that, in its many variations, “How long do we have?” is the question we get asked the most. But I don’t think Gary Rivlin, the author, is seeing quite the same thing we are when he talks about the point reflecting AI’s “potential.”
Basing his analysis on past technological shifts, Rivlin puts the answer at five to fifteen years. To what? To a point where AI has really seeped into the economy, transformed organizations, and started meaningfully registering in the productivity data.
To be fair, if you’re writing that AI “will almost certainly prove as consequential as the internet,” you probably don’t see AI as being all that transformational. If that assumption is true, then I think it’s pretty reasonable to guess that “The boosters are directionally right about where this is all heading. The skeptics are probably right about how long it will take.”
That’s because AI boosters, somewhat ironically, tend to have the least transformative visions of what it means for AI to reach its “potential.” Their takes often evoke, for me, the old cartoon The Jetsons, where the titular family’s patriarch has a flying car and a robot maid, but still clocks in to push buttons at his menial job, and the joke is that no amount of technology can keep you from feeling oppressed by the daily grind.
It is in the world of The Jetsons where Rivlin’s example of true AI transformation makes sense:
Consider an insurer handling a fender-bender claim. Typically, a company will use AI to speed up the paperwork while keeping the same layers of review and approval in place. But the real opportunity lies in redesigning the process entirely—having AI assess the damage based on a customer’s photos, then approving the claim and triggering payment nearly instantly.
Even the Jetsons still crashed their car, after all.
In contrast, people who think through what it actually means for AI to become akin to a “country of geniuses in a data center” — to borrow the metaphor preferred by Anthropic CEO Dario Amodei — understand that all bets are off. They see the implications of models that can reconfigure molecular biology with the same skill that Claude Code can reconfigure a code base. They then have trouble seeing how adding such power to our world is supposed to go well. So they tend not to be boosters — except for the AI successionists, who actually want machines to supplant humanity.
The analyses and opinions expressed on AI StopWatch reflect the views of the individual contributors and the sources they cover, and should not be taken as official positions of the Machine Intelligence Research Institute.




