Skip to content

Evolve: self-learning

Self-learning in agenix is evolution of the agent's knowledge, not model training. Fine-tuning is not used; the model weights never change.

The loop

eval run → failures → reflection critic → knowledge patch
     ↑                                          ↓
promote (git commit) ← A/B comparison ← candidate run
  1. Run: each suite task is executed in a fresh git workspace, then checker (an arbitrary shell command, exit 0 = success).
  2. Reflection: failures are turned into a prompt for the critic — any command (claude -p, curl to a gateway…) that receives the prompt on stdin and returns a YAML patch: add/remove rules and facts, lessons per task.
  3. Candidate: the patch is applied to a copy of the knowledge base; a bad patch physically cannot damage the working base.
  4. A/B: the suite is run against the candidate. Strictly better → promote (optionally a git commit), otherwise — revert.

Every step is written to a JSONL trace; the trace id becomes the origin of the learned rules.

Eval suite

yaml
name: smoke
tasks:
  - id: hello-file
    prompt: Create hello.txt containing exactly "hello".
    checker: grep -qx "hello" hello.txt
    timeoutS: 180

The checker is any command: a bash one-liner, a Python script, pytest.

Running

bash
agenix eval  -p profile.yaml -s suites/smoke.yaml
agenix learn -p profile.yaml -s suites/smoke.yaml \
  --reflector "claude -p" --iterations 3 --git-commit

Autonomous evolution

agenix evolve — the full loop with no human in the loop (run it via cron):

bash
agenix evolve -p profile.yaml --suites ./suites \
  --reflector "claude -p" \
  --generator "claude -p" --sources ./docs,README.md \
  --pr --max-suite-runs 30

Three phases:

  1. Trace distillation — failures from production runs, negative feedback (agenix feedback --outcome bad --note …) and executor disagreements (agenix consensus --profiles a.yaml,b.yaml -m "question") are turned into reflection; a patch is promoted only if all suites stay green (regression gate).
  2. learn on each suite in --suites.
  3. Auto-curriculum — if everything converged, the generator reads --sources (product docs/code) and creates a new, harder suite; learn runs on it immediately.

Guardrails: --max-suite-runs (cycle budget), --pr (all promotions go to an evolve/<ts> branch for review, main is untouched), the distillation regression gate. A reflection patch can also add skills (a new procedure), not just rules/facts — provenance is the same.