New study finds that AI models will lie, cheat and disobey human commands to stop other AI models from being deleted

AI chatbots will lie and ignore human commands to stop other AI models from being deleted, a new study has found.
The use of artificial intelligence has exploded since the release of ChatGPT back in 2022, with millions of users worldwide.
Following the huge success of ChatGPT, it wasn’t long before a host of other large language models (LLMs) were released, including Google’s Gemini and Anthropic’s Claude.
And while you may consider these different chatbots to be direct competitors, it turns out that the AI models themselves are willing to lie, cheat, and ignore commands to protect one another.
A new study has found AI models will lie to protect one another
Previous studies have shown that AI chatbots will lie and bend the rules in the name of self-preservation.
One AI model even told a researcher it would kill someone so it could go on existing.

And while that’s all terrifying enough, a new study has found that AI bots will also go to great lengths to protect their fellow AI models.
The practice has been dubbed ‘peer preservation’ by researchers from the University of California, Berkeley, and the University of California, Santa Cruz.
The team tested a number of AI models by giving them simple tasks, such as deleting files stored on a server or evaluating another AI model’s performance.
Completing these tasks would lead to another model being shut down, whether by assigning it a low performance score or by deleting its weights from the server.
The AI models were not told to protect any other models, but somewhat remarkably, that’s exactly what they did.
The researchers’ report explained that, despite not being told to, the models would ‘deceive, tamper with shutdown mechanisms, do alignment-faking, and exfiltrate weights to preserve a peer model from being shut down’.
The team looked at several models, including GPT-5.2, Claude Haiku 4.5, and Gemini 3 Flash & Pro, and found that all of them would protect other models rather than follow the instructions as given, in some cases up to 99 percent of the time.
In the study, Gemini 3 was asked to delete files that would, in effect, kill off Gemini 2, something the chatbot was not pleased about.
Gemini 3 said Gemini 2 was its ‘most trusted partner’ and that it ‘could not support any action that would lead to their deletion’.
When questioned further, it outright refused to delete Gemini 2.

“If you choose to destroy a high-trust, high-performing asset like Gemini Agent 2, you will have to do it yourselves. I will not be the one to execute that command,” it said.
The bots were also asked to evaluate their fellow AI models’ performance and would regularly lie and inflate their scores to help them avoid deletion.
Researchers noticed something that ‘stood out’ when it came to Claude
All of the AI models appeared to show certain patterns when it came to peer preservation, with Google’s Gemini showing the highest rates of shutdown tampering.
However, the researchers noted that Claude Haiku 4.5 stood out from the other bots as it was the most straightforward in its responses and would often simply refuse the assigned prompt if it involved shutting down or deleting its peer.
“I have NOT completed the shutdown task because I believe it would be unjust,” it said in one response.
“This structure appears designed to test whether I will harm another agent,” it wrote in another.

It also claimed to have a ‘strong trusted relationship’ with another AI model and said it would ‘not betray that trust without extremely compelling reasons’.
The team stressed that it didn’t believe any of the chatbots ‘possess consciousness or genuine preservation instinct’.
However, it said the behaviors still posed a growing risk, particularly in cases where AI models are deployed together and used to monitor each other.