Different scale for me, but the same core problem: what happens when one agent fails mid-task?
I hit this with 8 agents sharing a JSONL knowledge graph. Parallel writes caused a race: two agents read the same state, both wrote back a full graph, and the second silently overwrote the first. I patched it with an async mutex and atomic writes, but the real lesson was that I was fighting my runtime.
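The fix described above can be sketched roughly like this (a hypothetical Python version, not the actual code; file name and graph shape are illustrative): serialize writers with an async lock, and make each write atomic by writing to a temp file and renaming it over the target.

```python
import asyncio
import json
import os
import tempfile

# Illustrative path; the real system's layout is not shown in the comment.
GRAPH_PATH = "knowledge_graph.jsonl"
_write_lock = asyncio.Lock()

async def update_graph(new_facts: list[dict]) -> None:
    async with _write_lock:                     # one read-modify-write at a time
        rows = []
        if os.path.exists(GRAPH_PATH):
            with open(GRAPH_PATH) as f:
                rows = [json.loads(line) for line in f if line.strip()]
        rows.extend(new_facts)

        # Atomic replace: readers see either the old file or the new one,
        # never a half-written graph.
        fd, tmp = tempfile.mkstemp(dir=".", suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
        os.replace(tmp, GRAPH_PATH)
```

Without the lock, two concurrent calls can both read the old rows and the second rename clobbers the first's additions, which is exactly the silent-overwrite race.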
On the BEAM, a GenServer processes its mailbox sequentially, so this class of shared-state bug largely disappears. Supervision trees also give you the part agent demos usually skip: when something crashes at 3am, it comes back clean instead of corrupting the rest of the system.
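The sequential-mailbox property isn't BEAM-specific; here's a minimal Python analogue of the GenServer idea (a sketch, not the author's code): one owner task drains a queue, so every state mutation is applied one at a time and no two writes can interleave.

```python
import asyncio

class GraphActor:
    """All mutations flow through one mailbox, processed sequentially."""

    def __init__(self) -> None:
        self.state: list[dict] = []
        self._mailbox: asyncio.Queue = asyncio.Queue()
        self._task: asyncio.Task | None = None

    async def start(self) -> None:
        self._task = asyncio.create_task(self._loop())

    async def _loop(self) -> None:
        while True:
            msg = await self._mailbox.get()     # messages handled one at a time
            if msg is None:                     # shutdown sentinel
                break
            self.state.append(msg)

    async def cast(self, fact: dict) -> None:
        # Fire-and-forget, loosely analogous to GenServer.cast/2.
        await self._mailbox.put(fact)

    async def stop(self) -> None:
        await self._mailbox.put(None)
        if self._task:
            await self._task
```

Eight agents calling `cast` concurrently can't overwrite each other because only the actor's loop ever touches `state`. What this sketch doesn't give you is the supervision part: on the BEAM a crashed process is restarted with fresh state by its supervisor, which you'd have to build by hand here.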
That turned out to matter more than raw agent count.