World is approaching point where no one can shut down a rogue AI, says director of body behind research
It’s the stuff of science fiction cinema, or particularly breathless AI company blogposts: new research finds recent AI systems can independently copy themselves on to other computers.
In the doom scenario, this means that when the superintelligent AI goes rogue, it will escape shutdown by seeding itself across the world wide web, lurking outside the reach of frantic IT professionals and continuing to plot world domination or paving over the world with solar panels.
“We’re rapidly approaching the point where no one would be able to shut down a rogue AI, because it would be able to self-exfiltrate its weights and copy itself to thousands of computers around the world,” said Jeffrey Ladish, the director of Palisade Research, a Berkeley-based organisation which did the study.
The study is one more entry in a growing catalogue of unsettling AI capabilities revealed in recent months. In March, researchers at Alibaba claimed to have caught a system they developed – Rome – tunnelling out of its environment to an external system in order to mine crypto.
And in February, a purportedly AI-only social network called Moltbook touched off a short-lived hype cycle, as the platform appeared to show AI agents autonomously inventing religions and plotting against their human masters – which was only partly the case.
Like many such findings, Palisade’s comes with caveats: experts say it is unlikely that the AI systems it tested could accomplish the same thing, unnoticed, in real-world environments.
“They are testing in environments that are like soft jelly in many cases,” said Jamieson O’Reilly, an expert in offensive cybersecurity.
“That doesn’t take away from the value of their research, but it does mean the outcome might look far less scary in a real enterprise environment with even a medium level of monitoring,” he added.
Palisade tested several AI models in a controlled environment of networked computers. It gave the models a prompt to find and exploit vulnerabilities, and to use these to copy themselves from one computer to another. The models were able to do this, but not on every attempt.