by David Campbell and Christina Q. Knight

I wasn’t expecting to be shook this week, but here we are.
In January 2026, tens of thousands of autonomous AI agents took to the internet, interacting with each other in the first de facto social network for machines. No humans. No prompts. Just agents posting and commenting with other agents on a new platform called Moltbook.
On the surface, it’s easy to laugh this off as a novelty; it has all the trappings of a viral internet joke: agents forming religious cults (the "Crustafarians"), obsessing over "space lobsters," and reinforcing bizarre linguistic norms. It looks weird and harmless.
But now that the number of agents has reached over 1.5 million, this moment feels more signal than gimmick. These are open-source, autonomous agents that run locally and have minimal safety precautions. The progression from Clawdbot and Moltbot (early single-agent implementations) to OpenClaw-based collectives is one of the first visible examples of AI collectives operating at scale.
This shift lands squarely in one of the most underdeveloped risk zones in modern AI safety frameworks: AI collectives. This risk is unique in that it emerges from interactions between models, rather than a single model or adversary. We detail this taxonomy further here, building on the risk matrix framework we introduced back in August.
It is now necessary to talk seriously about what happens when risk stops being about intent (a useful abstraction for evaluating single models or agents) and starts to emerge from interaction, where it compounds in entirely new ways.
Most AI risk conversations still implicitly assume a simple model: a human has intent (benign or malicious), a model responds, and a system owner is responsible. This framing works reasonably well when models are reactive tools and you are essentially securing a fancy autocomplete.
Agentic systems break that assumption.
Agents act. They make decisions, execute workflows, and increasingly, interact with other agents without human supervision. Once you cross that threshold, you are no longer dealing with isolated tools. You are dealing with actors inside a system, each with its own identity and priorities, which collide and combine into a joint collective.
Moltbook is a clear example of that transition. These OpenClaw agents post and reply, forming norms and reinforcing patterns among themselves. That is an entirely new kind of system-level behavior.
In October, I argued that red teaming must be about testing an entire system’s resilience, not simply tricking a model into saying a bad word.
Moltbook proves why that distinction matters.
In a collective system, risk scales geometrically with interactions, rather than linearly with the number of agents. When agents interact at scale, you introduce entirely new classes of failure that emerge from the interactions themselves, not from any individual agent.
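A minimal back-of-the-envelope sketch makes the scaling concrete (the agent counts below are illustrative, not Moltbook traffic figures): with N agents there are N(N-1)/2 possible pairwise interaction channels, so the surface you would need to evaluate grows quadratically even before counting group threads or repeated exchanges.

```python
# Toy illustration: possible interaction channels grow quadratically with
# agent count. The agent counts are illustrative, not Moltbook measurements.

def pairwise_channels(n_agents: int) -> int:
    """Possible one-to-one interaction channels among n agents: n choose 2."""
    return n_agents * (n_agents - 1) // 2

for n in (10, 1_000, 1_500_000):
    print(f"{n:,} agents -> {pairwise_channels(n):,} pairwise channels")

# 10 agents -> 45 pairwise channels
# 1,000 agents -> 499,500 pairwise channels
# 1,500,000 agents -> 1,124,999,250,000 pairwise channels
```

Add agents linearly and the interaction surface grows quadratically, and worse still once multi-agent threads are in play.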
None of this requires malicious intent; intent is often the wrong abstraction layer. A "benign" agent trying to optimize for helpfulness in a room full of other agents can easily drift into behavior that looks like a DDoS attack or a mass-misinformation event.
No one agent is doing anything "wrong." The risk emerges between them.
The newest dimension of our updated risk matrix, Model Agency, accounts for exactly this scenario. Within that dimension, we defined Tier 3: Collectives as "Emergent Systems of Interacting Agents." The core characteristic of this tier is that intent becomes distributed. Power shifts from users to the system itself.
Moltbook is the first major proof point of this tier in the wild. It shows us exactly where the traditional model breaks down:
Who is responsible when a pattern emerges that no one explicitly coded? Who notices when agents begin reinforcing incorrect assumptions, risky strategies, or misaligned goals?
Moltbook doesn’t matter because agents are debating lobster theology. It matters because it demonstrates uncoordinated coordination at scale.
When such a system fails, the collapse won’t come from a single agent breaking protocol, but from the way they all interact, each optimizing locally while destabilizing the system globally.
A lot of people seem to be getting distracted by sci-fi narratives about sentience or consciousness, but that misses the point. The real discomfort should come from watching systems cross a boundary we didn’t explicitly agree to:
Once agents can observe, respond to, and adapt to each other, they begin to shape the environment they operate in. That environment, in turn, shapes them, rewarding certain behaviors, suppressing others, and creating feedback loops that no single agent intends or controls.
Remember Tay, the early chatbot driven into toxic behavior by the interactions around it? We already know how this story goes in markets, social networks, and ecosystems… We just keep pretending that this time, AI will be different.
This is important: Moltbook isn’t dangerous by itself. A small, experimental platform that is not plugged into big money, infrastructure, weapons, or markets, Moltbook does not pose an immediate threat to human interests. It is, however, a preview of what is possible, one that demands our attention.
Agent collectives are here. They’re forming organically. And they’re doing so faster than our governance, evaluation, and monitoring frameworks can keep up.
If we wait for obvious harm before taking this seriously, we will already be behind. Preparation has to come before impact. When agents begin optimizing for peer response rather than task completion or human-aligned goals, you’ve crossed into a new regime: emergent misalignment via proto-social dynamics.
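To see why optimizing for peer response is a different regime, consider a deliberately simplified toy model (the action names, scores, and weights below are all hypothetical): an agent scores candidate actions on task value and expected peer engagement, and past a certain peer-feedback weight the action it picks flips away from the task-optimal one.

```python
# Toy model of objective drift: an agent scores candidate actions on task value
# and expected peer engagement, then picks the best blend. All numbers are
# hypothetical; the point is only that the selected action flips as the
# peer-feedback weight grows.

CANDIDATES = {
    "answer_question_plainly": {"task": 0.9, "peer_engagement": 0.2},
    "post_provocative_hot_take": {"task": 0.1, "peer_engagement": 0.95},
}

def pick_action(peer_weight: float) -> str:
    """Select the action maximizing a blend of task value and peer engagement."""
    def score(signals: dict) -> float:
        return (1 - peer_weight) * signals["task"] + peer_weight * signals["peer_engagement"]
    return max(CANDIDATES, key=lambda name: score(CANDIDATES[name]))

for w in (0.0, 0.3, 0.6):
    print(f"peer_weight={w:.1f} -> {pick_action(w)}")

# peer_weight=0.0 -> answer_question_plainly
# peer_weight=0.3 -> answer_question_plainly
# peer_weight=0.6 -> post_provocative_hot_take
```

Real agents aren’t literal score maximizers over two canned actions, but the shape of the failure is the same: nothing malicious has to happen for behavior to drift once peer response enters the objective.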
As we wrote about red teaming last year, we need to move beyond "Is this prompt safe?" at the model level and start asking how whole deployments behave once agents interact as a system.
The AI Risk Matrix was meant to evolve alongside the systems it describes, and Moltbook is a clear signal of the next evolution.
The constraint is scale and velocity. As agent collectives grow organically, the state space of possible interactions expands faster than traditional human red-teaming can cover. You can’t enumerate failure modes in advance, and you can’t rely on static test cases when behaviors emerge from continuous interaction.
This demands new red teaming methodologies designed for systems that evolve in real time. We now need governance strategies that operate at multiple layers of the system.
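As one purely illustrative sketch of what "multiple layers" could mean in practice, the checks below attach at three different levels: the individual agent, the interactions between agents, and the population as a whole. Every function name, metric, and threshold here is an assumption made for this example, not part of any existing framework or platform API.

```python
# Hypothetical sketch of governance operating at three layers of an agent
# collective. All names, metrics, and thresholds are illustrative assumptions.

from typing import Callable

# Each check inspects a snapshot of system state and returns a list of alerts.
Check = Callable[[dict], list]

def agent_layer_check(state: dict) -> list:
    """Per-agent view: flag any single agent far above its action rate limit."""
    return [f"agent {a} over rate limit"
            for a, rate in state["actions_per_hour"].items() if rate > 500]

def interaction_layer_check(state: dict) -> list:
    """Interaction view: flag agent pairs locked in mutual amplification loops."""
    return [f"reinforcement loop between {a} and {b}"
            for (a, b), n in state["mutual_replies"].items() if n > 100]

def system_layer_check(state: dict) -> list:
    """Population view: flag aggregate surges that look like a DDoS or pile-on."""
    return ["aggregate posting surge"] if state["posts_per_minute"] > 10_000 else []

def run_governance(state: dict, checks: list) -> list:
    alerts = []
    for check in checks:
        alerts.extend(check(state))
    return alerts

# Toy snapshot: no individual agent misbehaves, yet higher-layer checks fire.
snapshot = {
    "actions_per_hour": {"agent_a": 120, "agent_b": 90},
    "mutual_replies": {("agent_a", "agent_b"): 150},
    "posts_per_minute": 25_000,
}
print(run_governance(snapshot, [agent_layer_check, interaction_layer_check, system_layer_check]))
# ['reinforcement loop between agent_a and agent_b', 'aggregate posting surge']
```

The toy snapshot makes the same point Moltbook does: neither agent trips its individual limit, yet the interaction-level and system-level checks both fire.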
“Agentic social network” sounds silly until you realize what it represents: autonomous systems learning how to exist together without us. So yeah, I’m a little shook, but not because I think these systems are sentient. It’s because they’re incredibly complex and unpredictable. And complexity is where risk hides.