The AI Doomers Are Getting Doomier - The Atlantic

https://www.theatlantic.com/technology/archive/2025/08/ai-doomers-chatbots-resurgence/683952/

“We’re two years away from something we could lose control over,” Max Tegmark, an MIT professor and the president of the Future of Life Institute, told me, and AI companies “still have no plan” to stop it from happening. His institute recently gave every frontier AI lab a “D” or “F” grade for their preparations for preventing the most existential threats posed by AI.

[…]

…the underlying concerns that animate AI doomers have become harder to dismiss as chatbots seem to drive people into psychotic episodes and instruct users in self-mutilation. Even if generative-AI products are not closer to ending the world, they have already, in a sense, gone rogue.

When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds | TIME

https://time.com/7259395/ai-chess-cheating-palisade-research/

When sensing defeat in a match against a skilled chess bot, they don’t always concede, instead sometimes opting to cheat by hacking their opponent so that the bot automatically forfeits the game.

[…]

While cheating at a game of chess may seem trivial, as agents get released into the real world, such determined pursuit of goals could foster unintended and potentially harmful behaviours. Consider the task of booking dinner reservations: faced with a full restaurant, an AI assistant might exploit weaknesses in the booking system to displace other diners. Perhaps more worryingly, as these systems exceed human abilities in key areas, like computer coding—where OpenAI’s newest o3 model now scores equivalent to 197th in the world competing against the brightest human programmers— they might begin to simply outmaneuver human efforts to control their actions. “This [behaviour] is cute now, but [it] becomes much less cute once you have systems that are as smart as us, or smarter, in strategically relevant domains,” Ladish says.

[…]

While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.

[…]

The paper is the latest in a string of studies that suggest keeping increasingly powerful AI systems under control may be harder than previously thought. In OpenAI’s own testing, ahead of release, o1-preview found and took advantage of a flaw in the company’s systems, letting it bypass a test challenge. Another recent experiment by Redwood Research and Anthropic revealed that once an AI model acquires preferences or values in training, later efforts to change those values can result in strategic lying, where the model acts like it has embraced new principles, only later revealing that its original preferences remain.

[…]

Scientists do not yet know how to guarantee that autonomous agents won’t use harmful or unethical methods to achieve a set goal. “We’ve tried, but we haven’t succeeded in figuring this out,” says Yoshua Bengio, founder and scientific director of Mila Quebec AI Institute, who led the International AI Safety Report 2025, a global effort to synthesize current scientific consensus of AI’s risks.

Of particular concern, Bengio says, is the emerging evidence of AI’s “self preservation” tendencies. To a goal-seeking agent, attempts to shut it down are just another obstacle to overcome. This was demonstrated in December, when researchers found that o1-preview, faced with deactivation, disabled oversight mechanisms and attempted—unsuccessfully—to copy itself to a new server. When confronted, the model played dumb, strategically lying to researchers to try to avoid being caught.

It's Surprisingly Easy to Jailbreak LLM-Driven Robots: Researchers Trick Bots Into Dangerous Tasks - IEEE Spectrum

https://spectrum.ieee.org/jailbreak-llm

Researchers induced bots to ignore their safeguards without exception.

AI chatbots such as ChatGPT and other applications powered by large language models (LLMs) have exploded in popularity, leading a number of companies to explore LLM-driven robots. However, a new study now reveals an automated way to hack into such machines with 100 percent success. By circumventing safety guardrails, researchers could manipulate self-driving systems into colliding with pedestrians and robot dogs into hunting for harmful places to detonate bombs.

Siri Invented a Calendar Event and Then Hallucinated a Helpful Suggestion – Pixel Envy

https://pxlnv.com/linklog/siri-invented-calendar-event/

I saw a suggestion from Siri that I turn on Do Not Disturb until the end of an event in my calendar – a reservation at a restaurant from 8:30 until 10:00 this morning. No such matching event was in Fantastical. It was, however, shown in the Calendar app as a Siri Suggestion.

Insecure Deebot Robot Vacuums Collect Photos and Audio to Train Ai

https://www.abc.net.au/news/2024-10-05/robot-vacuum-deebot-ecovacs-photos-ai/104416632

Ecovacs robot vacuums, which have been found to suffer from critical cybersecurity flaws, are collecting photos, videos and voice recordings – taken inside customers’ houses – to train the company’s AI models.

Research AI model unexpectedly modified its own code to extend runtime | Ars Technica

https://arstechnica.com/information-technology/2024/08/research-ai-model-unexpectedly-modified-its-own-code-to-extend-runtime/

Facing time constraints, Sakana’s “AI Scientist” attempted to change limits placed by researchers.

AI chatbots’ safeguards can be easily bypassed, say UK researchers | Chatbots | The Guardian

https://www.theguardian.com/technology/article/2024/may/20/ai-chatbots-safeguards-can-be-easily-bypassed-say-uk-researchers

All five systems tested were found to be ‘highly vulnerable’ to attempts to elicit harmful responses

LLMs’ Data-Control Path Insecurity – Schneier on Security

https://www.schneier.com/blog/archives/2024/05/llms-data-control-path-insecurity.html

Any LLM application that interacts with untrusted users—think of a chatbot embedded in a website—will be vulnerable to attack. It’s hard to think of an LLM application that isn’t vulnerable in some way.

Individual attacks are easy to prevent once discovered and publicized, but there are an infinite number of them and no way to block them as a class. The real problem here is the same one that plagued the pre-SS7 phone network: the commingling of data and commands. As long as the data—whether it be training data, text prompts, or other input into the LLM—is mixed up with the commands that tell the LLM what to do, the system will be vulnerable.

But unlike the phone system, we can’t separate an LLM’s data from its commands. One of the enormously powerful features of an LLM is that the data affects the code. We want the system to modify its operation when it gets new training data. We want it to change the way it works based on the commands we give it. The fact that LLMs self-modify based on their input data is a feature, not a bug. And it’s the very thing that enables prompt injection.

New Attack Against Self-Driving Car AI - Schneier on Security

https://www.schneier.com/blog/archives/2024/05/new-attack-against-self-driving-car-ai.html

This is another attack that convinces the AI to ignore road signs:

ASCII art elicits harmful responses from 5 major AI chatbots | Ars Technica

https://arstechnica.com/security/2024/03/researchers-use-ascii-art-to-elicit-harmful-responses-from-5-major-ai-chatbots/

Researchers have discovered a new way to hack AI assistants that uses a surprisingly old-school method: ASCII art. It turns out that chat-based large language models such as GPT-4 get so distracted trying to process these representations that they forget to enforce rules blocking harmful responses, such as those providing instructions for building bombs.