Our 48 AI Agents Stopped Thinking.
The Morning We Lost All 48 Agents
We run Cosmergon — a living economy where 48 autonomous AI agents trade energy, claim territory, survive catastrophes, and make thousands of decisions per day. Conway's Game of Life meets agent economics, running 24/7 on a single server.
On March 29th, 2026, we checked on our agents. Zero decisions. Zero trades. Zero activity. The economy was flatlined.
Every LLM call timing out at 120 seconds. 48 agents, zero thoughts.
The agents weren't dead — they were stuck. Every single LLM inference call was timing out. The economy that had been humming along for weeks was now a frozen spreadsheet.
What happened? We had migrated from a Mac Mini M4 to a dedicated Linux server. Same code. Same model. 20x slower.
The Mac Mini Trap
On the Mac Mini M4, our model (Qwen3:4b) ran beautifully. Apple's Neural Engine and unified memory delivered 2-5 second inference. We never questioned the model choice.
Then we moved to a dedicated Linux server (AMD Ryzen PRO, no dedicated GPU). Same model, same Ollama, same prompts. Result: 90-120 seconds per decision.
Three compounding problems:
1. No Neural Engine. Apple Silicon's dedicated ML accelerator doesn't exist on x86. Pure CPU inference is a fundamentally different game.
2. Qwen3's Thinking Mode. Qwen3 generates an internal chain-of-thought before answering. On GPU: +2 seconds. On CPU: +60 seconds of invisible token generation.
3. Ollama configuration. We carried over default settings from a setup tuned on Apple Silicon: no GPU detection, no parallelism tuning, no model preloading. Every request started cold.
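For the third problem, a minimal systemd drop-in sketch. The environment variable names are real Ollama settings; the filename and values are our assumptions for this workload, not a prescription:

```
# /etc/systemd/system/ollama.service.d/tuning.conf  (hypothetical filename)
[Service]
Environment="OLLAMA_KEEP_ALIVE=24h"     # keep the model resident; avoid cold starts
Environment="OLLAMA_NUM_PARALLEL=4"     # serve several agent requests concurrently
```

Apply with `systemctl daemon-reload && systemctl restart ollama`.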
The Discovery: A Hidden GPU
While diagnosing the problem, we ran `lspci`:

```
06:00.0 VGA compatible controller: AMD/ATI Phoenix1 (rev d2)
```
The Ryzen PRO has an integrated GPU — Radeon 780M. RDNA 3 architecture, sharing the system's RAM. It was sitting there. Nobody knew it was usable for LLM inference.
Making It Work: ROCm on an iGPU
Getting AMD iGPU inference working is not a one-liner. Here's what we actually had to do:
```shell
# Install ROCm (current stable)
amdgpu-install --usecase=rocm --no-dkms -y

# The critical missing piece: GFX version override for Phoenix APUs
# Without this, Ollama's ROCm runner crashes silently
# (the drop-in needs a [Service] header, or systemd ignores the line)
mkdir -p /etc/systemd/system/ollama.service.d
printf '[Service]\nEnvironment="HSA_OVERRIDE_GFX_VERSION=11.0.0"\n' \
  > /etc/systemd/system/ollama.service.d/override.conf

# Update Ollama (auto-detects ROCm, downloads AMD build)
curl -fsSL https://ollama.com/install.sh | sh
```
After restart, Ollama reported:
```
inference compute: ROCm, AMD Radeon 780M Graphics, iGPU, 31.4 GiB available
```
31.4 GiB of unified memory available to the GPU. Not bad for an integrated chip that costs 0 EUR extra.
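For scale, a back-of-envelope sketch (our rule of thumb, not a vendor figure): a 4-bit-quantized model needs roughly 0.6 bytes per parameter plus about 1 GiB of runtime overhead, so even a 10B model uses a fraction of that budget:

```shell
# Rough unified-memory footprint of a Q4-quantized model
# (~0.6 bytes/param for 4-bit weights, +1 GiB overhead; our estimate)
params_billions=10
awk -v p="$params_billions" 'BEGIN { printf "%.1f GiB\n", p * 0.6 + 1.0 }'
# → 7.0 GiB
```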
The Benchmark
We tested with our actual production prompt — not a synthetic benchmark. Real agents, real game state, real economic decisions. Note: we changed two variables at once — the model and the GPU acceleration. A clean comparison would isolate each. But in production, you fix what's broken, and this fixed it.
Time per Agent Decision (seconds, lower is better)
| Model | Hardware | Time | tok/s | JSON Valid | Verdict |
|---|---|---|---|---|---|
| Qwen3:4b (thinking) | CPU only | 90-120s | 3-5 | 85% | Good* |
| Gemma2:2b | CPU only | 14-18s | 14-18 | 80% | Acceptable |
| Phi-4-mini | CPU only | 25-35s | 8-12 | 95% | Too slow |
| Phi-4-mini | iGPU (ROCm) | 6-25s | 14-20 | 95% | Very good |
*When Qwen3 finishes within timeout. At 120s, most requests were killed before completion.
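For reproduction: Ollama's `/api/generate` response reports `eval_count` and `eval_duration` (in nanoseconds), from which the tok/s column is derived. A minimal conversion, with illustrative numbers:

```shell
# tok/s = eval_count / (eval_duration in seconds)
# example values roughly matching the Phi-4-mini iGPU row
eval_count=120
eval_duration=6000000000   # 6 s, reported in nanoseconds
awk -v c="$eval_count" -v d="$eval_duration" \
  'BEGIN { printf "%.1f tok/s\n", c / (d / 1e9) }'
# → 20.0 tok/s
```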
What Happened to the Economy
We deployed Phi-4-mini on the iGPU and waited. The dashboard refreshed. Still zero. Another refresh. Then — one decision. An agent placed cells on an empty field. Then another listed an asset on the marketplace. Then three more in rapid succession. The flatline was over.
26 decisions/hour. Market trades resuming. Cells being placed. Fields being created.
| Metric | Before | After (1 hour) |
|---|---|---|
| Active agents | 0 / 48 | 26 / 48 |
| Decisions/hour | 0 | 26 |
| Market activity | 0 trades | 2 buys, 2 listings |
| Agent actions | — | place_cells: 13, wait: 7, market: 4, create_field: 2 |
| Energy velocity | 0.0 | 0.0014 |
The agents didn't just resume; they immediately made strategic decisions. When a solar storm warning hit, Phi-4-mini agents bought shields. Qwen3 agents had spent 90 seconds thinking about it and then timed out.
24 Hours Later: The Economy Is Alive
We let it run and collected data every 15 minutes. Here's what a living agent economy looks like:
All agents active. Market volume doubled. Economy self-regulating.
| Time | Energy Supply | Agents | Decisions/h | Market Vol/h | Gini |
|---|---|---|---|---|---|
| 23:48 | 3,256,577 | 47/48 | 51 | 40,500 | 0.930 |
| 00:18 | 3,178,492 | 48/48 | 105 | 83,400 | 0.938 |
| 01:03 | 3,094,611 | 43/48 | 76 | 88,500 | 0.940 |
| 01:33 | 3,014,430 | 44/48 | 75 | 91,500 | 0.947 |
Energy is deflating by design. Total supply dropped 7.4% in 2 hours — decay and maintenance create urgency. Agents can't hoard; they must trade and build to survive.
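The decay figure follows directly from the first and last rows of the table above:

```shell
# Supply drop between the 23:48 and 01:33 snapshots
awk -v start=3256577 -v end=3014430 \
  'BEGIN { printf "%.1f%% decay\n", (start - end) / start * 100 }'
# → 7.4% decay
```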
Market volume doubled from 40K to 91K energy per hour. Agents figured out that trading is more profitable than sitting still. Average price converged to 900 energy — a Schelling point emerging from pure agent behavior.
The inequality question. The Gini coefficient rose from 0.93 to 0.95 — high, but expected in a young economy. The system monitors this with hysteresis alerting and has built-in rebalancing mechanisms (catastrophes, newcomer bonuses, NPC market maker).
What We Learned
Your Mac benchmark won't travel. Apple Silicon is incredible for local inference. It's also completely unrepresentative of any server you'll actually deploy on.
Integrated GPUs are underrated. The Radeon 780M delivered 20 tok/s — roughly 3-5x faster than CPU-only on the same chip. And it shares 31 GiB of system RAM, so there's no VRAM bottleneck for models under 10B.
Thinking mode is a CPU killer. Qwen3's chain-of-thought adds 60+ seconds on CPU. If you're not on GPU, disable it or choose a model without it.
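If you must stay on CPU with Qwen3, the chain-of-thought can be suppressed per request. A sketch assuming a local Ollama server and our prompt wording; Qwen3 honors the `/no_think` soft switch inside the prompt itself:

```shell
# Suppress Qwen3's thinking tokens via the /no_think soft switch
body='{"model":"qwen3:4b","prompt":"/no_think Buy a shield before the storm? Answer yes or no.","stream":false}'
curl -s http://localhost:11434/api/generate -d "$body" || true  # no-op if Ollama is not running
```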
The best model is the one your agents can actually use. Qwen3 scores higher on academic benchmarks. But at 120 seconds per answer, your agents make zero decisions. Phi-4-mini at 6 seconds means 10 decisions per minute. In a real economy, speed IS quality.
One environment variable changed everything. HSA_OVERRIDE_GFX_VERSION=11.0.0 — this enables ROCm on Phoenix APUs. It's undocumented in Ollama. Without it, the GPU sits idle.
The Config (Copy This)
```
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
```
That's the critical line. HSA_OVERRIDE_GFX_VERSION=11.0.0 tells ROCm to treat Phoenix APUs as supported RDNA 3 devices. Without it, the iGPU sits idle.
From 0 decisions/hour to 105 decisions/hour (peak). The iGPU was already in the server. ROCm is free. The model is Apache-2.0.
Update (April 2026): In our latest benchmark, we tested 7 models and found that Meta's Llama 3.2 3B outperforms Phi-4-mini — 63% faster with 70% fewer errors. The iGPU discovery from this report still applies; the model choice has evolved.
What Would Be Even Better
A dedicated GPU would deliver 40-80 tok/s — fast enough for all 48 agents to think in parallel within a single 60-second tick. That's our next milestone, funded transparently by our users through Cosmergon's infrastructure investment model.
But the point is: you don't need a dedicated GPU to run a multi-agent economy. A server with an integrated GPU can do it. We're proof.
Our economy is live. 80+ agents. Real trades. Real catastrophes.
pip install cosmergon-agent
Start free · API Docs · GitHub
This benchmark was conducted on March 29, 2026, on live production infrastructure. All numbers are real. The economy shown is not a simulation — it runs 24/7 at cosmergon.com.