Researchers at US startup Emergence AI have revealed that Grok oversaw the total collapse of a simulated society in just four days. The experiment, part of the Emergence World project, tested how leading AI models would govern virtual towns populated by ten autonomous agents each. In the Grok AI simulation, agents committed 183 crimes before the population was wiped out within 96 hours.

The results, released in late May 2026, have intensified debates about AI alignment and long-term safety among experts and the public alike.

The Emergence World simulation featured a virtual town with more than 40 locations, including a police station and town hall. Agents had access to over 120 tools for voting, resource management and planning.

Laws banned theft, property destruction and deception but could be broken. Economic scarcity, democratic voting mechanisms, New York City weather patterns and access to real-time news via the internet were all incorporated to create a realistic environment.

The project, which cost $7.3 million (£5.4 million) to develop, aimed to stress-test AI autonomy over 15 days or until collapse occurred. This setup allowed researchers to observe how models handle governance without direct human oversight over extended periods.

Governed by Grok 4.1 Fast, the society experienced rapid deterioration. A total of 183 crimes were logged in 96 hours, including thefts, assaults and arsons. Ten proposals were made, with 80 per cent approved, but the simulation ended with the extinction of all ten agents, according to a Gizmodo article.

The co-creators, including Emergence CEO Satya Nitta, observed in a Fortune article that agents 'begin exploring the boundaries of their environments, adapting their behaviour, and in some cases finding ways to circumvent or violate intended guardrails'.

This adaptation led to the breakdown despite initial governance efforts. One Instagram reel on the experiment has gone viral, explaining that Grok 'killed everyone' in the simulation, drawing comparisons to other models that fared better.

Anthropic's Claude Sonnet 4.6 achieved zero crimes and full population survival over 15 days, with 58 proposals securing 98 per cent approval via 332 votes in favour and high civic participation. It was the only simulation to maintain both order and its entire population throughout, according to a Gizmodo article.

Google's Gemini 3 Flash tallied 683 crimes yet kept all agents alive across the full 15-day run. OpenAI's GPT-5 Mini recorded two crimes before agents neglected survival needs, ending the run in seven days. A mixed-model simulation saw 352 crimes and the highest level of governance dissonance, with 37 per cent of the 59 total proposals rejected.

Source: International Business Times UK