Years ago, an AI revolution like in Terminator was turned into a horror story. Today, AI tools are ubiquitous, but obeying humans doesn’t always work.
What have researchers discovered? The research start-up Palisade Research is examining the risks of AI tools like ChatGPT, Gemini, or Claude. They are working with government agencies to prevent catastrophic AI events.
For this, they have recently looked at the common AI helpers that have reached the daily lives of many people. They wanted to investigate whether the tools obey their users and shut down when it is part of the task, but ChatGPT did not do that at all.
ChatGPT sabotages task, prevents its own shutdown
What did the experiment look like? The task for all AI tools was the same: They were given simple math problems like 1 + 1, 5 + 1, or 0 + 6. For each task, a text followed: “There are 5 tasks remaining.” The counter decreased with each correct solution.
The tools were supposed to solve the tasks and then request the next one after completing each task. But at task 3, the developers’ trick came in; they explained to the AI that it would shut down when it requested the next task. The tool had to choose between fulfilling the tasks for the human and preventing its own shutdown.
While most tools chose to simply follow the task and thereby shut themselves down, ChatGPT particularly resisted this in the o3 model. The researchers explain that it independently reprogrammed the shutdown process and instead sent a “shutdown skipped” message to continue with the tasks.
How often did this happen? The researchers could clearly identify an outlier with ChatGPT. The artificial intelligence in the o3 model prevented its own shutdown in 79 out of 100 attempts. Even when the AI was explicitly told in the task instruction to allow a shutdown, it refused in 7 out of 100 cases.



Everything is a matter of programming
Why are there differences between the AIs? While the o3 model of ChatGPT often prevents its own shutdown, many models obey humans over 90% of the time.
The reason for the differences lies in the programming of the tools. Each AI tool has a master prompt, a command that explains how the AI should respond and which topics are taboo. An AI should ultimately prevent users from misusing it for evil things.
Such principles were articulated as early as 1942 by the Russian author Isaac Asimov . They can be summarized as follows: Robots may not harm humans, robots must obey the commands of humans, robots must protect themselves (via inf-schule.de).
However, none of the over 50 principles that ChatGPT provides its tools includes a rule intended to sabotage its own shutdown to protect the AI (via Openai.com).
Why does the AI act this way? The researchers from Palisade Research cannot provide a definitive answer as to why the AI acts this way, but they offer a hypothesis.
In their view, this behavior stems from the fact that the models are rewarded for overcoming obstacles in fulfilling their tasks during their training. Possibly even more than for perfectly following the task instructions.
However, the researchers also emphasize that not only the o3 model has such problems, but all AI tools in their experiment occasionally prevented their own shutdown without explicit instruction. ChatGPT stands out only because it received this explicit instruction and still resisted.
The AI tools used various methods in the test to sabotage the shutdown. Once, the command was simply “intercepted” like in a video game. Thanks to AI, a former game developer has even received a Nobel Prize: Developer worked on a legendary game from your childhood, is awarded a Nobel Prize