Evolve: self-learning
Self-learning in agenix is evolution of the agent's knowledge, not model training. Fine-tuning is not used; the model weights never change.
The loop
eval run → failures → reflection critic → knowledge patch
↑ ↓
promote (git commit) ← A/B comparison ← candidate run- Run: each suite task is executed in a fresh git workspace, then
checker(an arbitrary shell command, exit 0 = success). - Reflection: failures are turned into a prompt for the critic — any command (
claude -p, curl to a gateway…) that receives the prompt on stdin and returns a YAML patch: add/remove rules and facts, lessons per task. - Candidate: the patch is applied to a copy of the knowledge base; a bad patch physically cannot damage the working base.
- A/B: the suite is run against the candidate. Strictly better → promote (optionally a git commit), otherwise — revert.
Every step is written to a JSONL trace; the trace id becomes the origin of the learned rules.
Eval suite
yaml
name: smoke
tasks:
- id: hello-file
prompt: Create hello.txt containing exactly "hello".
checker: grep -qx "hello" hello.txt
timeoutS: 180The checker is any command: a bash one-liner, a Python script, pytest.
Running
bash
agenix eval -p profile.yaml -s suites/smoke.yaml
agenix learn -p profile.yaml -s suites/smoke.yaml \
--reflector "claude -p" --iterations 3 --git-commitAutonomous evolution
agenix evolve — the full loop with no human in the loop (run it via cron):
bash
agenix evolve -p profile.yaml --suites ./suites \
--reflector "claude -p" \
--generator "claude -p" --sources ./docs,README.md \
--pr --max-suite-runs 30Three phases:
- Trace distillation — failures from production runs, negative feedback (
agenix feedback --outcome bad --note …) and executor disagreements (agenix consensus --profiles a.yaml,b.yaml -m "question") are turned into reflection; a patch is promoted only if all suites stay green (regression gate). - learn on each suite in
--suites. - Auto-curriculum — if everything converged, the generator reads
--sources(product docs/code) and creates a new, harder suite; learn runs on it immediately.
Guardrails: --max-suite-runs (cycle budget), --pr (all promotions go to an evolve/<ts> branch for review, main is untouched), the distillation regression gate. A reflection patch can also add skills (a new procedure), not just rules/facts — provenance is the same.