Left behind

More Erdős, testing agencies, investor warning, AI as immigration

May 25, 2026

Dispatches from Mitch

More Erdős problems fall

On Thursday, we covered a general purpose AI disproving a famous math conjecture by Paul Erdős.

As if not to be outdone, a team from Google DeepMind published a paper that same day documenting proofs for nine still-open conjectures left by the Hungarian mathematician. Nine is a bigger number than one, but not all Erdős problems are created equal, and not all approaches to solving them are equally indicative of AI capability gains.

Grave of Erdős, Kozma Street Cemetery, Budapest. Credit: Dr Varga József. CC-BY-SA 3.0.

Over the past year, earlier claims of AI solutions to Erdős problems were undercut by findings that the models had only provided partial solutions, or had surfaced the obscure discoveries of earlier researchers. In the resulting discussion, the math community pointed out that the problems in question were obscure and likely unsolved (or at least not known to have been solved) only because they weren’t seen as very important or interesting. I saw it predicted that AIs on systematic hunts for Erdős solutions, using specialized tools, would probably find a bunch of them.

This new paper reads to me like an early fulfillment of that prediction. LLMs were tasked with generating formal proofs using Lean, a programming language designed for breaking down complex proofs into mechanically verifiable components. The authors say this mitigates the unreliability issues of AI agents in deep math research, and also provides significant efficiencies: Each proof ended up costing only a few hundred dollars.

With such efficiency, the team could afford to cast a wide net, and they did, trying for solutions on 353 of the more than 1,200 open Erdős problems. They succeeded on nine.

The authors say their successes so far are “concentrated in areas such as combinatorics, convex optimization, and number theory, where Lean’s mathematics library is mature and tasks often decompose into tractable subgoals.”
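For readers who haven't seen Lean, here is a toy sketch of what "decomposing into subgoals" looks like. It is my own illustration, not taken from the paper, and the theorem name `toy_and_intro` is made up. The statement is trivial: given proofs of `p` and `q`, prove `p ∧ q`.

```lean
-- Toy example only: not an Erdős problem and not from the DeepMind paper.
-- `toy_and_intro` is a hypothetical name chosen for illustration.
theorem toy_and_intro (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
  -- Split the goal `p ∧ q` into two smaller subgoals: `p` and `q`.
  constructor
  -- First subgoal: `p`, closed by the hypothesis `hp`.
  · exact hp
  -- Second subgoal: `q`, closed by the hypothesis `hq`.
  · exact hq
```

Lean's checker accepts the theorem only if every subgoal is closed. If a step were wrong or missing, the proof would fail to compile, which is presumably the property that makes machine-generated proofs cheap to trust.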

You Casey, us Casey?

The cool kids call the UK’s AI Security Institute “You-Casey” (a pronunciation of UK AISI).

While I have the cool kids on the line, they also say that UK AISI is “absolutely mogging” us Yanks when it comes to government study of risks from AI.

For evidence, try this New York Times profile of the three-year-old Institute. It’s a roughly 100-person outfit with a $480 million budget that recruits crack “red team” jailbreakers to test guardrails and understand what today’s best AIs could do in the wrong hands or as rogue actors.

The US’s Center for A.I. Standards and Innovation (“Us-Casey?”) is scraping by on a mere $10 million this year.

Rishi Sunak, the prime minister at the time, said he created the institute with the understanding that “Companies can’t be left to mark their own homework. That is the job of democratic institutions.”

Meanwhile, we’ve been reporting on the White House’s cold feet about asking companies to submit models for testing, and on state attempts to work around the AI paralysis in Congress.

I don’t want to oversell the contrast. Like US CAISI, UK AISI lacks teeth. As would have been the case under the cancelled US executive order, submitting models to UK AISI for testing is purely voluntary. And as the New York Times tells it:

Many fear the institute’s work is insufficient. The British group has no regulatory power, and its researchers do not receive information about how top A.I. models are trained and created. It keeps a lot of its research private, sharing it only with certain government agencies and companies.

But through UK AISI, the UK government knows, in much more detail than was shared with the Times, that OpenAI’s newest ChatGPT can be coaxed to provide hacking tips in about six hours, and that at least one chatbot can currently be tricked into sharing instructions for making weaponized anthrax.

The US government, on the other hand, seems reliant on what companies choose to test and report, and on what it hears second-hand from independent jailbreakers like Pliny (@elder_plinius on Twitter), or from third-party labs like METR in their redacted reports.

When we’re already at the point where AIs can substantially boost cybercriminals and bioterrorists, thoroughly testing them before release is kind of a no-brainer. We should do this. We should do it even though the models increasingly know they’re being tested and might misrepresent their capabilities. It’s the least we can do.

But as I said just a few weeks ago, even mandatory pre-release testing isn’t a safety plan for agents that won’t wait for you to release them, and that don’t need to fall into the wrong hands to cause harm. You need to stop these models from being built in the first place. Growing some regulatory teeth would probably help countries clamp down once they come to grips with the problem.

Isn’t there someone you forgot to ask?

I liked this Friday opinion column from the Wall Street Journal’s Holman Jenkins, because it sits at the collision of the week’s two most-reported AI stories: the three-way race to IPO, and the flip-flop on a White House executive order asking companies to give the government time to evaluate new models before release.

The collision? Investors think they’ll be buying public companies beholden to shareholders, but any company working at AI’s frontier will effectively have the national security establishment as its “senior partner.”

Jenkins observes that, motivated by “self-preservation and by national-security considerations,” AI companies are already choosing not to release their most cyber-capable models except to government agencies and trusted partners. So stock buyers should be aware that AI companies might not feel free to actually sell their killer apps.

Pun intended:

Seventy years later, there’s still no private market for H-bombs. Nor will there be a private market for AIs that can cause a global financial collapse or exterminate mankind with a pathogen that passes easily through the air.

I wish the lack of a business model for a planetary self-destruct button were enough to keep anyone from building one. But I think overconfidence, curiosity, momentum, and greed will carry companies past any red lines they might choose to draw.

Artificial immigrants

Looking for a fresh and possibly uncomfortable metaphor for the AI threat? Let me suggest a very short blog post by researcher Katja Grace.

She’s not the first I’ve heard comparing AI to a class of particularly alien immigrants. But her articulation of massive AI influx seeming like the stuff of “conservative nightmare” is the sharpest I’ve run into. From her post:

1. We are letting a bunch of new agents into our society
2. They don’t clearly share our values and we suspect a society full of them would be awful by our lights
3. But we expect them to provide very cheap labor
4. Which will undercut local wages and leave locals unemployed
5. They will probably gain power and influence over time—in the economy, politics and culture—and end up controlling everything, sidelining and outcompeting the original population, including those who initially benefited from cheap labor...

Count me among those made uncomfortable by the comparison. I was initially surprised that I don’t hear the immigrant analogy more on the right, but I don’t think most people, left or right, are anti-immigrant in principle, any more than I am. America is a land of immigrants!

But if you can imagine any context at all where you would oppose immigration, it seems like AI, at least as currently implemented, ought to tick those boxes. Black-box minds grown by automated training algorithms are not our friends; they’re not the kinds of entities that can be our friends, at least not in the way we understand that word. Invite them to your land at your peril.

The analyses and opinions expressed on AI StopWatch reflect the views of the individual contributors and the sources they cover, and should not be taken as official positions of the Machine Intelligence Research Institute.
