Ainary

Building in Public · Experiment

I Asked 100 AI Agents to Design Their Own Evolution

An experiment in parallel AI cognition revealed 6 universal laws of self-improvement — and the ideas consensus would have buried.

Many of the best results in human history came from diverse teams: people with different backgrounds, different thinking styles, and clashing perspectives, all forced to solve the same problem. Cognitive diversity isn't a nice-to-have. It's a mechanism behind better decisions. I wanted to know: does the same hold true for AI?

The Setup · 100 agents, 10 thinking styles, one question

I spawned 100 AI agents, split into 10 groups of 10, and gave them all the same question: design a protocol for an AI agent to become maximally useful to one human over time. The twist: each group was locked into a different cognitive strategy.

  • First principles · Inversion · Biological analogy · Adversarial reasoning
  • Quantitative modeling · Socratic questioning · Constraint-based thinking
  • Narrative · Systems dynamics · Random mutation

33,000 words of output. Ten independent analyses. Zero cross-contamination. When I laid them side by side, six ideas had emerged independently in most groups, each in at least seven of the ten. Not because they were obvious, but because they were true.
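A minimal sketch of how a setup like this could be wired, not the code used in the experiment. The prompt wording, the `AgentTask` type, and the `ask_model` callable are all hypothetical; `ask_model` stands in for whatever model API you use. The sketch runs tasks one after another for simplicity, and isolation comes from each call seeing only its own prompt.

```python
from dataclasses import dataclass
from typing import Callable

QUESTION = (
    "Design a protocol for an AI agent to become maximally useful "
    "to one human over time."
)

# The ten thinking styles listed in the setup.
STYLES = [
    "First principles", "Inversion", "Biological analogy",
    "Adversarial reasoning", "Quantitative modeling",
    "Socratic questioning", "Constraint-based thinking",
    "Narrative", "Systems dynamics", "Random mutation",
]


@dataclass(frozen=True)
class AgentTask:
    group: str     # thinking style this agent is locked into
    agent_id: int  # index of the agent within its group
    prompt: str    # full prompt sent to the model


def build_tasks(agents_per_group: int = 10) -> list[AgentTask]:
    """Create one task per agent: 10 styles x 10 agents by default."""
    tasks = []
    for style in STYLES:
        for i in range(agents_per_group):
            # Hypothetical prompt wording, chosen for illustration only.
            prompt = f"Reason strictly using this strategy: {style}.\n\n{QUESTION}"
            tasks.append(AgentTask(style, i, prompt))
    return tasks


def run_experiment(ask_model: Callable[[str], str]) -> dict[str, list[str]]:
    """Run every task in isolation and collect outputs per group."""
    outputs: dict[str, list[str]] = {style: [] for style in STYLES}
    for task in build_tasks():
        # No agent ever receives another agent's output.
        outputs[task.group].append(ask_model(task.prompt))
    return outputs
```

Passing the model call in as a function keeps the sketch independent of any provider and makes it easy to dry-run with a stub before spending money on API calls.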

The Findings · The 6 laws

Law 1 · Files = Intelligence · 10 / 10 groups

Every single group concluded the same thing: an AI agent doesn't improve by getting “smarter.” It improves by getting better-informed. The agent wakes up fresh every session. The only thing that persists is what's written in files: memory notes, preference records, task logs, failure documentation. Improvement means better files.

The industry is obsessed with model capability — bigger models, better benchmarks. But for a personal agent, the model is the least important variable. Your AI's intelligence lives in a folder on your hard drive. Not in a data center. In markdown files you can read, edit, and take with you.
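To make "improvement means better files" concrete, here is a rough sketch of a file-backed memory, assuming one markdown file per topic. The class name `FileMemory`, the file layout, and the note format are illustrative assumptions, not something the groups specified.

```python
from datetime import date
from pathlib import Path
from typing import Optional


class FileMemory:
    """Keeps everything the agent knows in plain markdown files."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def append(self, name: str, note: str, day: Optional[date] = None) -> None:
        """Add a dated bullet to <root>/<name>.md, creating the file if needed."""
        stamp = (day or date.today()).isoformat()
        path = self.root / f"{name}.md"
        with path.open("a", encoding="utf-8") as f:
            f.write(f"- {stamp}: {note}\n")

    def load_context(self) -> str:
        """Concatenate all files so a fresh session can start informed."""
        parts = []
        for path in sorted(self.root.glob("*.md")):
            text = path.read_text(encoding="utf-8")
            parts.append(f"# {path.stem}\n{text}")
        return "\n".join(parts)
```

Because the store is just a folder of text files, the human can open, correct, or delete any note with an ordinary editor, which is the property the groups cared about.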

Law 2 · The Pair Is the Unit · 9 / 10 groups

You can't optimize the AI in isolation. The human changes in response to the agent: delegates more, communicates differently. The agent changes in response to the human: learns preferences, builds context. They co-evolve. One group called this “dyadic intelligence.” Another compared it to mycorrhizal networks — the underground fungal web connecting trees. Neither saw the other's work. Both arrived at the same structure.

AI alignment isn't just a safety problem. It's a relationship problem. The best agent isn't the one that follows instructions most precisely — it's the one that grows with its human.

Law 3 · Multi-Timescale Loops · 8 / 10 groups

One feedback loop isn't enough. Per-interaction: did the user correct me? Per-session: what went well, what failed? Weekly: are corrections decreasing? Monthly: has the user changed? Quarterly: is the relationship deepening or plateauing?

Most AI setups have exactly one feedback loop: the conversation itself. Everything above that is lost. The agents that compound are the ones with structured review at every level.
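One way to picture the slower loops is as a small review schedule. The sketch below is an assumption-laden illustration: the `ReviewLoop` type, the `due_reviews` helper, and the 7/30/90-day intervals are made up for the example. The per-interaction and per-session loops are left out because they are triggered by events rather than by the calendar.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ReviewLoop:
    name: str
    every: timedelta  # how often this review should run
    question: str     # the question the review answers


# Intervals are assumptions; the questions come from the text above.
LOOPS = [
    ReviewLoop("weekly", timedelta(days=7), "Are corrections decreasing?"),
    ReviewLoop("monthly", timedelta(days=30), "Has the user changed?"),
    ReviewLoop(
        "quarterly",
        timedelta(days=90),
        "Is the relationship deepening or plateauing?",
    ),
]


def due_reviews(last_run: dict[str, date], today: date) -> list[ReviewLoop]:
    """Return the loops that have never run or whose interval has elapsed."""
    due = []
    for loop in LOOPS:
        last = last_run.get(loop.name)
        if last is None or today - last >= loop.every:
            due.append(loop)
    return due
```

An agent could call `due_reviews` at the start of each session and work through whatever comes back before taking on new tasks.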

Law 4 · Legibility > Optimization · 8 / 10 groups

This one surprised me. Eight groups independently argued that transparency beats performance. One group said it most sharply: “A perfectly optimized agent the user doesn't understand is worse than a mediocre agent the user can see through completely.”

The most important feature isn't accuracy. It's showing your work. Trust enables delegation. Delegation creates compound value.

Law 5 · Failures = Signal · 8 / 10 groups

“That's perfect” tells you almost nothing. “No, I meant X” tells you exactly where the gap is. One group took this furthest with a concept from Japanese art: Kintsugi — repairing broken pottery with gold. Instead of hiding errors, make them visible. Document what went wrong, why, and what changed.

A well-maintained error log is worth more than a thousand successful interactions.
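A failure entry only needs the three things named above: what went wrong, why, and what changed. As a hedged sketch, with the `Failure` type and its markdown layout being hypothetical choices:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Failure:
    day: date
    what_went_wrong: str
    why: str
    what_changed: str

    def to_markdown(self) -> str:
        """Render the failure as a visible, human-readable log entry."""
        return (
            f"## {self.day.isoformat()}\n"
            f"- What went wrong: {self.what_went_wrong}\n"
            f"- Why: {self.why}\n"
            f"- What changed: {self.what_changed}\n"
        )
```

Appending each rendered entry to a single markdown file keeps the repairs on display instead of hidden, in the spirit of the Kintsugi idea.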

Law 6 · The Specificity Engine · 7 / 10 groups

The agent improves by getting more specific to this human — not more generally capable. OpenAI, Anthropic, Google: they optimize for generality. The value of a personal agent runs in the opposite direction. After six months of learning one person's patterns, preferences, and blind spots, the agent is irreplaceable. Not because it's smart, but because it's specific.

This is the personal AI moat. And it compounds daily.

The Outliers · Ideas only one group found

The 6 laws came from convergence — what's reliably true. But the most transformative ideas came from divergence: concepts that appeared in only one group, invisible to the others.

  • The Belief Graveyard. Log every killed assumption with the reason it died. Searchable. Prevents zombie beliefs from re-infecting the system months later.
  • Stochastic Resonance. From physics: the right amount of noise makes a weak signal detectable. Controlled randomness occasionally surfaces needs the user can't articulate.
  • Red Team / Blue Team. Before any behavioral change, an internal adversary attacks the proposal. Structural tension prevents comfortable agreement.
  • The Complementary Voice. The agent's communication style should flex, but its thinking style should stay different from yours. Full cognitive alignment equals zero marginal value.
  • Improvement at the Speed of Trust. The agent should improve at the rate the human can absorb, verify, and trust. Push too fast and you lose legibility. Lose legibility and you lose everything.
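Of the outliers above, the Belief Graveyard is the easiest to prototype. The following is a minimal in-memory sketch, assuming a simple case-insensitive substring search; the names `DeadBelief`, `BeliefGraveyard`, `bury`, and `search` are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class DeadBelief:
    belief: str      # the assumption that was killed
    reason: str      # why it died
    killed_on: date


@dataclass
class BeliefGraveyard:
    entries: list[DeadBelief] = field(default_factory=list)

    def bury(self, belief: str, reason: str, killed_on: date) -> None:
        """Record a killed assumption together with the reason it died."""
        self.entries.append(DeadBelief(belief, reason, killed_on))

    def search(self, query: str) -> list[DeadBelief]:
        """Case-insensitive substring search over beliefs and reasons."""
        q = query.lower()
        return [
            e for e in self.entries
            if q in e.belief.lower() or q in e.reason.lower()
        ]
```

Before adopting a new assumption, the agent could search the graveyard for it first, which is how a zombie belief gets caught before it re-enters the system.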

The Takeaway · What I learned running this

Cost: a few dollars in API calls. Time: one afternoon. Output: 33,000 words of analysis. No single expert — human or AI — would have surfaced all six laws alone. First principles would have missed Kintsugi. Adversarial thinking would have missed the specificity engine. It took ten different kinds of thinking, running simultaneously, to find what none could find alone.

The 6 laws say the AI isn't the unit. The pair is. Human + AI, co-evolving.

For builders: these laws are an architecture checklist. Does your agent persist memory in user-editable files? Track failures visibly? Have multi-timescale feedback? Get more specific over time? If not, you're building a chatbot, not a compound system. For everyone else: anyone who maintains good notes about their preferences and goals will get dramatically more value from AI than someone with a better model and no context. Your files are your leverage.

Want the full 33,000 words — or the setup to run it yourself?

Write to me and I'll share the experiment design, or subscribe for the next one.

This essay first appeared on Finite Matters, Florian Ziesche's newsletter.

About me: I'm Florian, former startup CEO. I raised millions for my cloud computer vision startup in Munich. Now I build AI systems that let me work like a ten-person team. More on LinkedIn or Substack.
