Not on top of it
METR leader's warning, deepfake penalties, spy agency data centers, political ads
Dispatches from Joe
Dire warning from leading AI evaluator
If you want to pinpoint the AI frontier, be Beth Barnes and found METR in 2022. If you want to be terrified at AI’s current trajectory, be Beth Barnes in 2026.
Perhaps no one in the world boasts more visibility into the state of advanced AI right now than Barnes. And she wants us to know that things are not under control.
Her team at Model Evaluations and Threat Research (METR) keeps the most respected scoreboard of AI capabilities: the Time Horizons chart. Currently, nothing boosts the stature of an AI lab among researchers and investors like breaking new ground on this chart, which tracks how long it takes humans to do coding tasks that AI models can successfully complete on their own. Consequently, companies that think they might have raised the bar clamor to make their latest builds available to METR weeks or months before public release.
We previously covered how METR struggles to prevent AIs from cheating on its time horizon tests, and how models have recently become too capable for METR to confidently measure.
On Thursday, we summarized a new report by METR on the risk of current AIs purposefully evading human control. Barnes, pointing at that report, took the opportunity yesterday to share her thoughts on the current state of AI evaluation.
In a personal thread, Barnes stands behind METR’s findings, but she stresses that the report covers only a narrow subset of present-day risks and warns that evaluators are sharply limited in the assurance they can provide. In her own words:
Sometimes people outside the field say things like “The AI situation can’t be that bad, there must be experts who are on top of it”. As “an expert”, I would like to be clear that we are not on top of it. [...]
“We are likely on track to develop AI systems capable of causing human extinction [or] permanent disempowerment, quite possibly within the next few years”
She goes on to point out that “Things are chaotic and rushed” and “we aren’t on top of the basics...let alone thorny questions of how to control [or] align superhuman AI.” Her examples are compelling: AIs violate user intent, labs train on things they meant to avoid, and security at companies is inadequate.
According to Barnes, METR and other safety teams struggle to keep up with the pace of development, and any reasonable civilization would weigh the risks against the benefits and choose to slow down.
This all closely matches my own perception of the field, and I commend Barnes for saying it. She concludes by emphasizing how METR is constrained by the need to work with AI companies. Participation in evals is voluntary, and companies can silently pull out if they don’t like the findings. The report itself acknowledges setting “a relatively high bar for what redactions we pushed back on,” and Barnes says they avoided tough calls in part because “the conclusions weren’t that spicy.”
I’ve written about my own sense that the AI industry does not remotely resemble a mature field of engineering. Reinforcing this impression, third-party evaluators inside and outside the government can barely keep up with current threats, let alone predict future ones, and they all rely on the goodwill of the very companies they’re supposed to audit. This is an insane situation for the most impactful technology in history.
Deepfake prosecutions under the Take It Down Act
Ever wonder what happens to people who flood the internet with deepfake porn of real people? A recent law upgraded the penalty to two years in prison (three for depicting minors), at least if they’re caught. Jake Offenhartz of AP News covers recent prosecutions under the Take It Down Act, last year’s bipartisan law tightening penalties for “revenge porn” and nonconsensual, sexually explicit AI deepfakes.
Two men have been charged with creating nude deepfakes of celebrities and private women that garnered millions of views online, and may face the maximum sentence. Offenhartz also mentions a handful of related cases from the past few months, including an Ohio case of child sexual abuse material that produced the first conviction under the new law.
I’m not sure whether two years in jail is a reasonable sentence for making deepfake porn of public figures and private citizens. But now would be a good time to recall that would-be criminals are deterred far more effectively by the expectation of being caught than by the severity of the punishment. Even with steeper penalties, catching those who use generative tools to abuse others remains a challenge for law enforcement.
AI developers prefer to downplay the degree to which their work exacerbates this challenge. Once a general-purpose model is released on the open internet, it is functionally impossible to prevent others from doing what they like with it. Even companies that keep their models private find themselves fighting a constant (and often losing) battle against malicious or exploitative users. The real problem is that there’s just no way to ensure these models can’t be used for any given purpose, since no one understands their internals well enough to make a prohibition stick.
Dispatches from Mitch
Spy agencies struggling to fill their own AI data centers
A New York Times story from Dustin Volz and Julian E. Barnes today puts a $9 billion price tag on the U.S. intelligence community’s new initiative to set up its own cutting-edge AI data centers.
Why do they need their own? It’s not because they’re looking to train their own models, at least not according to any reporting I’ve seen. They can’t really compete in that area, given the wild investment pouring into the big AI companies.
But spy agencies need their own data centers because making effective use of the strongest models, like Claude Mythos, requires sharing a lot of information with them. And the information the CIA and NSA would like to share with their AIs is not the kind of information they are allowed to have flowing through a commercial data center.
To run something like Mythos, though, not just any private data center will do. You need one built for chips at or very near the state of the art. Those chips are in high demand and short supply, and are priced accordingly. Lead times are also long.
Upgrading portions of commercial data centers to higher security standards hasn’t seemed like a practical near-term option, but no one is talking publicly about it one way or the other, and I’d be surprised if providers weren’t in private discussions about this with agencies.
The AI companies seem eager to make things work. Politico’s coverage revealed that Anthropic is open to letting the U.S. government use Mythos in offensive cyber operations, and per the Times piece linked above: “U.S. officials said Anthropic and the government are finalizing a classified contract that would allow the N.S.A. to maintain access to Anthropic products.”
That’s despite the unresolved matter of the Pentagon having declared Anthropic’s products a “supply chain risk” over the company’s refusal to sign on to an “any lawful use” clause.
Intriguingly, the Times has heard from officials that the new contract does not include that clause, and that it will include a “carve out to ensure that the A.I. model is not used on Americans’ data.” This was Anthropic’s main sticking point around “any lawful use.”
What AI super PAC ad campaigns look like in practice
When you hear that a pro-AI-industry super PAC is running attack ads against a political candidate, you might picture ads rebutting the candidate’s talking points about AI.
But that’s not how super PACs operate in general, and it never really has been. Election ad campaigns are usually optimized for making sure one particular candidate wins or loses, with few other considerations. And the most effective ads rarely have anything to do with the reasons why a super PAC’s funders want a certain outcome.
This can lead to absurdities.
Consider Leading the Future, a super PAC funded in part by a co-founder of militarized AI firm Palantir. When it decided to target U.S. congressional candidate Alex Bores, it did so by accusing him of having built tech used by ICE to track immigrants for deportation. ICE is very unpopular in Bores’s New York district. When did Bores allegedly build software for ICE? In his former job at Palantir.
An investigative piece in today’s Washington Post documents more absurdities. Public First, a super PAC linked to Anthropic that was pitched as a pro-regulation counter to Leading the Future, has been running anti-ICE ads in support of a North Carolina incumbent while simultaneously running anti-immigrant ads in support of a Texas primary candidate. “REMOVE ILLEGALS,” the ads say. “SEAL THE BORDER.”
And in New Jersey’s 5th District, both super PACs are backing the same candidate. I had seen chatter from the start that Public First wasn’t going to be pro-regulation so much as a different flavor of pro-industry super PAC, and I see this news as evidence for that hypothesis.
The takeaway here is that this year’s U.S. midterm elections are being aggressively shaped by different factions of the AI industry, which sometimes back the same candidates and sometimes different ones, buying ads that have nothing to do with AI.
But let me offer you one more reason for cynicism: I don’t think Leading the Future’s campaign against Alex Bores is actually “backfiring” the way much of the AI safety community likes to say it is. Yes, Bores and his supporters have successfully turned the super PAC’s attention into a badge of honor and used this to raise his profile. Yes, he’s now viewed as far more likely to win than he was before Leading the Future got involved.
But even if Bores wins, the media spotlight on Leading the Future’s activity will have done exactly what the super PAC wants: making sure all candidates thinking about promoting AI regulation understand that Leading the Future will come after them for it, using ads that attack them where they are weakest and don’t seem related to AI.
Does anyone think the media will shine a Bores-level spotlight on every candidate so targeted? Without such attention, the ad campaigns won’t backfire; they’ll quietly work, with voters none the wiser. By having made a spectacle of its battle with Bores, the media may have accidentally helped Leading the Future win the war.
The analyses and opinions expressed on AI StopWatch reflect the views of the individual contributors and the sources they cover, and should not be taken as official positions of the Machine Intelligence Research Institute.




