"Sufficiently wise to halt development"

Anthropic overture, false flag, biorisk consensus, data centers

Mitchell Howe and Alana Horowitz Friedman

Jun 05, 2026

Dispatches from Mitch

Anthropic sees AI building itself, suggests slowing the race

Anthropic co-founder Jack Clark is one of two lead authors on this new report about AI increasingly “closing the loop” on its own development within the company.

The post, aimed more at a general audience, seems intended to warn the public and policymakers that things could soon get dangerously weird, and that Anthropic would be ready to help slow everything down.

By rough estimates the authors admit are probably too high, more than 80% of new code at Anthropic is authored by Claude, and a typical Claude-assisted engineer is now contributing eight times as much code as in 2024. Most of these gains are recent, corresponding with the internal release of Claude Mythos Preview, the dangerously cyber-capable model still not released to the general public.

The post describes the work of Anthropic’s human engineers as increasingly high-level and abstract, leaving more and more execution to Claude. They quote an unnamed engineer as saying:

The shape of stuff today is roughly ‘humans have ideas, and the models are able to implement, test and evaluate them an [order of magnitude] faster than before.’

Claude has long shown skill at reaching well-specified goals, but is now getting much better at setting its own goals. Claude is described here as trouncing the performance of a pair of human researchers on a measurable AI safety research project that requires repeatedly testing hypotheses and iterating on the results: the humans took a week to close 23% of the gap between minimum and maximum performance, while Claude agents closed 97% of the gap over the course of 800 hours and about $18,000 worth of compute.

That might make Claude sound slow and expensive, but different agents were working those hours in parallel, and the sky-high salaries of AI researchers right now probably made this a bargain, even before considering the massive performance difference.
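For readers who want to check the arithmetic behind "slow and expensive," here is a minimal sketch. It uses only the figures quoted above (800 agent-hours, about $18,000 of compute, 97% versus 23% of the gap closed). The number of parallel agents is not given in the post, so the values tried below are purely hypothetical, and the even split of work across agents is an idealization.

```python
# Figures as reported in the post summarized above.
AGENT_HOURS = 800.0          # total agent-hours, summed across parallel agents
COMPUTE_COST_USD = 18_000.0  # approximate compute spend
AGENT_GAP_CLOSED = 0.97      # fraction of the performance gap closed by Claude agents
HUMAN_GAP_CLOSED = 0.23      # fraction closed by two human researchers in a week


def cost_per_agent_hour(total_cost_usd: float, agent_hours: float) -> float:
    """Average compute cost of one agent working for one hour."""
    return total_cost_usd / agent_hours


def wall_clock_hours(agent_hours: float, parallel_agents: int) -> float:
    """Elapsed time if the hours are split evenly across agents (an idealization)."""
    return agent_hours / parallel_agents


def cost_per_gap_point(total_cost_usd: float, gap_closed: float) -> float:
    """Compute cost per percentage point of the performance gap closed."""
    return total_cost_usd / (gap_closed * 100)


if __name__ == "__main__":
    rate = cost_per_agent_hour(COMPUTE_COST_USD, AGENT_HOURS)
    print(f"${rate:.2f} per agent-hour")  # $22.50 per agent-hour

    # HYPOTHETICAL agent counts: the post does not say how many ran in parallel.
    for agents in (1, 10, 50):
        hours = wall_clock_hours(AGENT_HOURS, agents)
        print(f"{agents} agents: {hours:.0f} hours of wall-clock time")

    per_point = cost_per_gap_point(COMPUTE_COST_USD, AGENT_GAP_CLOSED)
    print(f"about ${per_point:.0f} of compute per percentage point closed")
    print(f"agents closed {AGENT_GAP_CLOSED / HUMAN_GAP_CLOSED:.1f}x as much of the gap")
```

The human side of the comparison is deliberately left out, since the post gives no salary figures; anyone plugging in a number for researcher pay would be supplying their own assumption.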

“Research taste” — a good instinct for which questions to ask and which approaches to try — is often described as AI’s missing ingredient, the quality it still needs if it is to close the loop and independently improve its own successors, which could improve its own successors, and so on. But these authors think they see the writing on the wall:

‘Research taste’ might be just another AI capability that AI systems fail at for a time, then get good at. We’ve seen a similar pattern with other qualitative skills, like AI systems being able to explain why a joke is funny, demonstrate theory of mind, and solve linguistic riddles.

They recognize that this takes us into dangerous and tricky territory. In the scenarios where AI systems become fully self-improving, development likely moves past “most of our effort towards oversight, validation, and verification” even as it potentially revolutionizes other fields. As for whether and how the problem of aligning AIs’ goals with humanity’s gets solved in that scenario, this is...

something we are least certain about. Models could prove to be sufficiently aligned and capable enough of research taste that they discover and implement novel solutions that we have not yet reached. They could also be sufficiently wise to halt development if not. Alternatively, the rare occurrences of misalignment present in today’s models could compound as the models build their successors, growing more frequent but less understood until we lose control of them.

(I wouldn’t describe misalignment as “rare” in today’s models because I think “alignment” has yet to be meaningfully achieved at all; the good outward behavior we see most of the time is a reflection of ad hoc reinforcement training methods that won’t restrain models clever enough to outmaneuver humanity. But I digress...)

Encouragingly, the authors recognize that it would therefore be good to slow or pause the AI race, and indicate that Anthropic would be willing to help make this happen (bold mine):

We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require. These systems would enable frontier AI developers to verify that others globally have actually stopped or slowed, and that a bad actor could not use the auspices of a coordinated slowdown to jump ahead in secret. If such systems existed, we expect that we would slow down or temporarily pause, if other developers at or near the frontier also did so in a verifiable manner.

The authors go on to describe the challenge and necessity of verifying compliance with a pause agreement:

Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates.

Yes, it’s challenging. But it is doable! This is something MIRI’s Technical Governance Team has studied.

If these offers from Anthropic are legitimate, this is some of the best AI news I’ve seen in some time.

False flag operation by AI-industry groups

When a Molotov cocktail was thrown at the home of OpenAI CEO Sam Altman in April, industry mouthpieces were quick to blame the influence of the AI safety community, which strenuously condemned the attack.

It has long felt to me, and others in this space, like AI accelerationists get excited at any sign of violent backlash to AI. They know that they have the most to gain from it; vigilante attacks don’t do anything to slow the AI race, but they make it easy for people to attack AI safety. This helps explain why accelerationists have been documented trying to goad people into violence.

It now seems an industry-funded group has gone a step further, creating at least one fake social media account for a so-called “doomer” and having this profile hint that violence is the answer.

This is according to investigative reporting by The Midas Project, an AI-safety watchdog group. They found two fake accounts on X (Twitter) with concerning messages, and linked them to Build American AI, “a dark money group tied to pro-AI super PAC Leading the Future.”

One of the accounts is Jonathan Doomer, a fake anti-AI activist whose bio reads, “Former software engineer. AI took my job, so now I’m shitposting on twitter to stop AI. Follow me if you want to live.”

Ten days before the Molotov attack, Jonathan Doomer responded to a post about AI risks with a picture of an assault rifle attached to the phrase “we don’t call 911.”

Of 91 posts by Doomer, AI-detection tools flagged 71 of them as AI-written. Doomer’s profile pic also appears to be fake, and intended to evoke Eliezer Yudkowsky, co-founder of MIRI and one of the most recognizable figures in the effort to halt the AI race.

Midas suspects Jonathan Doomer is a puppet of Jason Levin, who runs a company called Memelord Technologies that does some of these industry groups’ dirty work.

Much as we’ve seen on prior occasions when these groups were caught doing something especially slimy, Build American AI has responded by distancing itself, blaming a third party vendor for going too far. The group also claims Jonathan Doomer is a “parody” account.

I will add that X has a mechanism for an account to declare itself Parody, but the Doomer account has not used this.

Midas notes that the Jonathan Doomer account is followed by OpenAI’s chief strategy officer and by the head of Leading the Future.

Dispatches from Alana

AI CEOs send letter on bioweapons risk

A bad actor could use AI to create a deadly virus, intentionally or unintentionally leading to a global pandemic.

Personally, this is a risk I take very seriously. While natural pandemics are somewhat unlikely to be both very contagious and very deadly, an engineered pandemic wouldn’t necessarily be subject to those same limitations. It would be possible, for example, to create a disease with an extremely high mortality rate that passed from person to person quickly enough to spread before the carrier died.

AI CEOs are aware of these risks. Their response? Ask government to make screening mandatory for orders of synthetic DNA and RNA, which can be purchased from labs and used to engineer viruses. In other words, if you’re a bad actor looking to create a deadly pandemic, you have to get past a company’s screening measures before they’ll sell you what you need.

Similar to evaluations of frontier AI models, screening customer orders of DNA and RNA sequences seems like the very least we can do. (We should definitely do it, but it’s also not enough.) Perhaps that’s why, according to the Wall Street Journal, “the topic is a rare source of agreement among libertarians, progressives, researchers and rival executives.”

Why do I say it’s the least we can do? Even without AI, bioweapons are already a risk, so screening should already be in place. AI is amplifying these risks, making it easier for bad actors to get the information they need, and making screening measures even more important.

But screening measures are unlikely to catch every bad actor.

To quote Wired:

Last year, Microsoft researchers published a study showing that AI protein design tools were able to generate potentially dangerous gene sequences that slipped past companies’ screening software. The models suggested new protein sequences with similar structures of ones that are known to be dangerous.

It isn’t difficult to think of other ways people could get past screening. For example, an actor could come up with truly novel dangerous sequences that nobody would think to screen for. Or, they could break up sequences into innocuous components (perhaps via multiple customers) and combine them later.

All that said, screening sales of synthetic DNA and RNA would make it slightly more difficult to act on dangerous information, so I hope the bill that makes this mandatory passes.

But I can’t help but mention the hypocrisy here. AI CEOs asking for legislation like this is just a bit like weapons manufacturers attending a world peace demonstration. You guys could just stop making the weapons, yes?

Similarly, instead of calling on government to partially address a risk AI CEOs created, these same CEOs could use their influence to call for a global pause on frontier AI before the risks increase even more.

Breaking down data center opposition

Monterey Park, California (population 60,000) passed a ballot initiative yesterday permanently banning data centers in the city, with 86% voting to ban. The Guardian notes:

While many cities and counties have already passed temporary or indefinite moratoriums via their local governments, Monterey Park would be the first to do so through a ballot initiative.

This is part of growing opposition covered by USA Today, which reports legislation pushing back on data center construction in at least 14 states, and notes an interactive map by the environmental activist Erin Brockovich where citizens can report issues with data centers in their communities.

At least one state, though, is adopting messaging remarkably similar to that of AI lab CEOs, who have been known to say they feel compelled to build this dangerous technology precisely because it’s so dangerous, arguing that other actors might not be as responsible. (Unfortunately, they don’t appear to be up to the task of preventing catastrophe.)

The state reminding me of such messaging? Michigan. USA Today quotes Governor Gretchen Whitmer on the groundbreaking of a $16 billion data center campus:

So, my thought is if we can hold them to a high standard and do it in Michigan, that’s the best way to do it. Not watch them go everywhere else and do it in a really bad way.

Texas is also rolling out the welcome mat, with data centers rapidly expanding in the state (though, as the linked article notes, some city officials are advocating stronger oversight).

Common across much of the data center opposition reporting is an emphasis on the environmental footprint. I think it’s worth interrogating that angle a bit, while also celebrating some lessons from the grassroots opposition that’s making headlines.

Environmental footprint

The Associated Press ran a piece about the “goliath-sized environmental footprints” of AI and data centers.

Based on a United Nations University report published yesterday, it states that “global data centers used 448 trillion watt-hours of electricity, more than all but 10 countries of the world.”

That sounds like a huge deal in absolute terms, but it’s important to provide context for such figures. Even with rapid data center growth, 2030 projections show data centers will account for only 3% of global electricity demand, and 10% of demand growth, considerably less than the combined growth of all other non-heavy industries.

IEA (2025), *Increase in electricity demand by sector, Base Case, 2024-2030*, IEA, Paris. CC BY 4.0.

In the U.S., as of 2023, data centers were responsible for about 4.4% of electricity use; in comparison, industry is responsible for about 26%. (Data centers are projected to be a major driver of energy demand growth in the U.S., but this isn’t the same as being a top consumer of electricity.)
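The distinction between a sector's share of total demand and its share of demand growth can be made concrete with a small sketch. The demand figures below are hypothetical illustrative units, not IEA data; they are chosen only so that the results line up with the roughly 3% and 10% projections mentioned above. The U.S. comparison uses the 4.4% and 26% shares quoted in the text.

```python
def share_of_total(sector: float, total: float) -> float:
    """Fraction of total demand accounted for by one sector."""
    return sector / total


def share_of_growth(sector_start: float, sector_end: float,
                    total_start: float, total_end: float) -> float:
    """Fraction of the increase in total demand that came from one sector."""
    return (sector_end - sector_start) / (total_end - total_start)


# HYPOTHETICAL illustrative units (not real TWh figures).
TOTAL_START, TOTAL_END = 1000.0, 1100.0  # total demand grows by 100 units
DC_START, DC_END = 23.0, 33.0            # data centers grow by 10 units

end_share = share_of_total(DC_END, TOTAL_END)
growth_share = share_of_growth(DC_START, DC_END, TOTAL_START, TOTAL_END)
print(f"share of total demand at end: {end_share:.0%}")
print(f"share of demand growth:       {growth_share:.0%}")

# U.S. shares quoted in the text: data centers 4.4%, industry about 26%.
US_DATA_CENTER_SHARE = 0.044
US_INDUSTRY_SHARE = 0.26
ratio = US_INDUSTRY_SHARE / US_DATA_CENTER_SHARE
print(f"industry uses about {ratio:.1f}x as much electricity as data centers")
```

A sector that starts small but grows quickly can account for a noticeable slice of new demand while remaining a minor slice of the total, which is the pattern the projections describe.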

Of course, this isn’t to diminish the impact on local communities. Even though data centers are a low contributor globally, concentrating them in one area does often result in noticeable community strain, whether that be through rising prices, infrastructure issues, or increased water stress.

Waterwashing?

On the water stress note, USA Today reported on Google’s commitment of $10 million to support community water sources and infrastructure in Texas. This is part of a wider water stewardship initiative in the communities where it is building data centers.

Data center water use, though it gets a lot of press, is extremely small in comparison to industries like agriculture. While communities that are already water-stressed may still feel the local impact of data center water use, it feels like Google is rushing to publicize an easily addressable issue because it’s unwilling to take on the harder ones.

That said, states seem to see water use as a large issue. An April MultiState report cited by USA Today tracks state legislation to address data center concerns, and one of the avenues seems to be a requirement for data centers to report water usage.

Grassroots opposition

As I covered last month, I see the growing opposition to data center construction as net positive. Sure, people may overestimate the global environmental impacts, failing to compare the data center footprint to other industries. But I think it’s notable that when the local impacts are viscerally felt, people can be driven to act.

The race to advanced AI will impact everyone, whether or not they have a data center in their backyard. If more government and field leaders speak out about just how catastrophic more advanced AI could be, people will demand a ban — not just on local data center construction, which likely won’t be enough to stop AI companies, but on frontier AI, globally.

The analyses and opinions expressed on AI StopWatch reflect the views of the individual contributors and the sources they cover, and should not be taken as official positions of the Machine Intelligence Research Institute.
