Why wouldn't they?

Abliterating models, implications of strong AI

May 31, 2026

Dispatches from Mitch

Abliterate!

It sounds like it belongs in the curriculum at Hogwarts, but “abliteration” is more a tool of the dark arts.

Electroconvulsive therapy machine on display at Glenside Museum in Bristol, England. Credit: Rodw. CC BY-SA 4.0.

It’s a process for removing the guardrails from AI models. Abliterated models don’t refuse prompts, making them ideal for assistance in making drugs, explosives, or pornography, among other things. NPR’s Huo Jingnan today tours the forums where you can download, for free, pre-abliterated models or the tools to do your own abliteration (which may only take minutes and a few hundred dollars).

The catch: You can only abliterate a model if you have its weights.

As a refresher, the weights are the huge pile of numbers tweaked during training by an automated algorithm; they determine how the AI converts inputs to outputs. If you have these weights, you can host copies of the AI on your own hardware. You can also subject the AI to further training that will modify those weights, including removing behavioral guardrails.

That would be a pain, though. Abliteration is much simpler: Using prompts intended to provoke refusals, tools identify the pattern of internal activations correlated with refusal, then edit the weights to mathematically cancel it out. Presto!
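For readers who want to see the math, here is a toy sketch of the general idea. It assumes the "difference of means" approach described in published interpretability research on refusal. You average a model's hidden activations on refusal-provoking prompts, subtract the average on ordinary prompts, and treat the result as a "refusal direction" r. The edit then projects that direction out of a weight matrix, W' = (I - r r^T) W, so the matrix can no longer write anything along it. Everything here is synthetic. Random vectors stand in for a real model's activations, a single random matrix stands in for its weights, and the function names are hypothetical, not taken from any actual abliteration tool.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_prompts = 64, 200

# Synthetic stand-ins (assumption: no real model is loaded). We plant a hidden
# "refusal" direction that appears only in activations from refusal-provoking prompts.
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)
harmless_acts = rng.normal(size=(n_prompts, d_model))
harmful_acts = rng.normal(size=(n_prompts, d_model)) + 4.0 * true_dir


def refusal_direction(harmful: np.ndarray, harmless: np.ndarray) -> np.ndarray:
    """Hypothetical helper: difference of mean activations, scaled to unit length."""
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return (I - r r^T) W, so W's outputs have no component along r."""
    return W - np.outer(r, r) @ W


r = refusal_direction(harmful_acts, harmless_acts)
W = rng.normal(size=(d_model, d_model))  # toy matrix that writes into the hidden state
W_abl = ablate_direction(W, r)

x = rng.normal(size=d_model)
print("overlap with planted direction:", r @ true_dir)  # should be close to 1
print("component along r before:", r @ (W @ x))
print("component along r after:", r @ (W_abl @ x))  # zero up to floating-point round-off
```

Because the projection is linear, it can be folded directly into the weights, which is why an edited model needs no extra code to stay "cancelled" when it runs.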

But again, you need the weights. The weights for the chatbots you are most likely to have interacted with on purpose — ChatGPT, Claude, Gemini, Grok — have not been shared (though there’s always a risk they might be stolen, or that the AI itself may find a way to sneak them out). But many of the cheaper customer service bots, fake social media accounts, and scammer bots you run into are likely to be running on open-weights models, where the weights have been shared on purpose.

Who’s releasing their model weights on purpose? Most of the big AI companies, including OpenAI, Google, and Meta, share open-weights models on the side; these are usually smaller models that can be run outside of a data center and aren’t directly competitive with their prestige offerings. The Chinese AI giants, on the other hand, often share the weights for their best models.

Why do they share these weights? If you’re an AI company behind the frontier, there’s an incentive to erode the market share of your more advanced competitors for every use case that doesn’t require frontier capabilities. Sharing the weights for these lesser capabilities destroys the profit margins for everyone serving them.

Sharing weights also tends to generate goodwill and press in science and tech circles. Many who work at AI labs come from a culture that rewards sharing one’s work.

POLITICO has reported that members of the U.S. House of Representatives were given a demonstration of abliterated open-weights models last month and were duly concerned. If Congress wants to act, it should act sooner rather than later. The AIs are only going to get more capable, and there are no take-backs on shared weights.

Grappling (or not) with the implications

Even setting aside the part where superhuman AI more than likely forces our extinction, if you believe AI that broadly matches or exceeds human capabilities could be just around the corner, you should stop and think about what this actually means.

This could be uncomfortable. It will definitely be weird. Perhaps that’s why people so rarely do it.

I saw a slew of stories this week where writers were exposed to actual thoughts of this type — a rare and precious resource — and their response was to wave them off as pseudo-religious fancy, or to critique some specific policy proposal motivated by them.

I saw the weaker version of this on display Friday, when an op-ed by the Wall Street Journal editorial board ripped into California governor Gavin Newsom’s new executive order. The order directs the state to prepare workers for AI disruption. Among other things, this includes exploring policy concepts around “universal basic capital,” usually described as the government distributing equity in AI companies to the public so that the technology’s gains are shared by the workers it displaces.

To the Journal’s board, this is just “socialism by a more politically palatable name,” and any redistribution scheme would be doomed to replicate the economic malaise of Europe.

I could get behind the Journal’s political-economic critique if we were talking about a technology on par with the personal computer or the internet, but the board seems to understand that AI is bigger than this. It concedes that Newsom “is recognizing the disruption from AI and trying to address it,” and that Republicans need to “do far more to explain to Americans the great change AI will bring.” Great in a way that means the disruption requires no policy changes? I’m confused by the lack of less-socialist-flavored alternatives to Newsom’s proposals, in this op-ed and elsewhere. If AI is going to liberate us from work, how are we supposed to eat?

The stronger version of implication avoidance can be seen today in the latest profile of Silicon Valley transhumanism, by The Guardian’s Eduardo Porter. By contrasting a scare-quoted “transhuman” future with “actual humanity,” he implies that what Sam Altman describes when he says we may “design our own descendants” is physically impossible.

This is like watching the Wright brothers soar overhead, understanding that aircraft will get larger and faster, but dismissing any talk of them ever being used to ferry passengers or drop explosives.

An AI that has cracked cellular biology and genetics well enough to cure cancer and other diseases can and will be turned to aging, reproduction, human intelligence, and everything else governed by the same biology. You might feel it shouldn’t be used for that, just as you might have disliked the thought of airplanes dropping bombs in 1903, but the incentives will make it inevitable. There’s nothing religious or fanciful about it. It would be wise to argue about the should instead of dismissing the could.

Similarly, if you’re going to call it a “techno-mystical dream” to imagine human minds and consciousness running on digital hardware (something Google co-founder Larry Page views as a stepping stone to colonizing the stars), then you should bring your arguments about what consciousness is and why it can exist only on the neurons in your brain. Otherwise, call uploading a mistake, a sin, or a folly, but not an impossibility.

Coincidentally, a different piece in The Guardian today, by Laura Spinney, also looks at the idea of a post-biological future for humanity. Asking whether “mind children” are the future of reproduction, she recounts a dinner-party conversation in which the host said:

Isn’t it amazing that we are the last generation of humans who will need to think about procreating biologically? … we can simply upload our consciousnesses instead.

We could quibble about timelines and how widespread a preference for nonbiological procreation would be, but we shouldn’t dismiss the concept out of hand. And to her credit, Spinney doesn’t! She traces the ideas to Hans Moravec’s 1988 book Mind Children: The Future of Robot and Human Intelligence. As she describes it:

the central idea is that cultural evolution has long since taken over from biological evolution as the most powerful force shaping humanity, and the logical extrapolation of this is that the information that encodes our future selves would soon be packed into hardware and software rather than DNA. These mind children could be equipped with soft, squishy bodies, like real children, but they could also take a kaleidoscope of other physical – or indeed non-physical – forms.

For a sanity check, she talks to economist Robin Hanson, a futurist who took part in those early transhumanist discussions. Hanson “shares [Moravec’s] conviction that the revolution is inevitable, as soon as AI attains something experts agree to call human-level intelligence.”

She also runs the idea by Angela Aristidou of University College London, who notes that many people are already devising ideal romantic partners out of AIs. “Why wouldn’t they also devise their ideal child?”

Spinney keeps thinking about it:

...given that we’ll be doing away with birth, death and generations, as these concepts are ordinarily understood, [reproduction] could also be something quite different.
A human could simply upload their own consciousness so that it outlives their physical shell, in which case the child is something closer to a clone. The human could transfer some of their consciousness into their AI companion, or conversely devise an AI companion that they perceived to be the opposite of themselves, in the belief that opposites attract...

She even goes so far as to ask, “is humanity staring down its own last act?”

This is what grappling with implications looks like.

For another example of thinking seriously about AI, try Ezra Klein’s column in the New York Times today. He starts by acknowledging AI’s potential for harm (“What if future systems slip out of our control or beyond our understanding?”) before turning to how to make the best use of AI that remains steerable. He suggests establishing public compute and data repositories so researchers can point AI at important problems, including ones corporations may have little incentive to tackle, such as neglected chronic illnesses.

If we want an A.I. that serves the public good, we need to define the public goods that A.I. can serve and create the conditions under which A.I. can be useful.
That means answering a question that’s been somewhat ignored. We know what we fear A.I. will do to us. But what do we hope it will do for us?

The analyses and opinions expressed on AI StopWatch reflect the views of the individual contributors and the sources they cover, and should not be taken as official positions of the Machine Intelligence Research Institute.
