Benchmarks
How to reproduce internal and external AIXI comparisons.
External baselines in scope
The current benchmark documentation explicitly tracks two external reference implementations:
- PyAIXI, used as a Python baseline for MC-AIXI-style experiments [pyaixi_repo].
- The older MC-AIXI C++ codebase, used as a native AC-CTW reference point [mcaixi_cpp_repo].
Those are baselines, not bundled dependencies. The repository keeps them external and drives them through harness scripts so the comparison remains inspectable instead of silently vendored.
1. In-repo AIQI vs MC-AIXI
For quick relative comparisons inside this repository, use:
scripts/bench_aiqi_vs_aixi.sh
This harness is intentionally narrow. It is for comparing the two planner paths in the current repo, not for reproducing the AIQI paper against external implementations.
2. Compare against PyAIXI
If you already have a PyAIXI clone, run:
scripts/bench_aixi_vs_pyaixi.sh --pyaixi-root /path/to/pyaixi
This path is useful when you want a direct comparison between the current infotheory MC-AIXI stack and the Python reference implementation without bringing in the full Guix-pinned multi-implementation harness.
3. Reproducible multi-implementation runs
For the heavier, more reproducible comparison across:
- the infotheory Rust CLI,
- the infotheory Python bindings,
- PyAIXI,
- and MC-AIXI C++,
use the Guix wrapper:
scripts/bench_aixi_competitors_guix.sh --trials 3 --profile default
The script writes plot-ready TSV files under target/aixi-competitors/<timestamp>/, records the commands used, and pins the external competitor repositories by commit for the run.
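To inspect a finished run, the TSV output can be loaded with the Python standard library alone. The following is a minimal sketch, not part of the repository: the helper names `latest_run_dir` and `load_tsv_rows` are hypothetical, and it assumes that the timestamp directory names sort chronologically as strings, that the result files end in `.tsv`, and that each file starts with a header row. No column names are assumed.

```python
import csv
from pathlib import Path


def latest_run_dir(root):
    """Return the newest run directory under root, or None if there is none.

    Assumption: timestamp directory names sort chronologically as strings.
    """
    root = Path(root)
    if not root.is_dir():
        return None
    runs = sorted(p for p in root.iterdir() if p.is_dir())
    return runs[-1] if runs else None


def load_tsv_rows(path):
    """Read one tab-separated file into a list of dicts keyed by its header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    # Path taken from the text above; the .tsv suffix is an assumption.
    run_dir = latest_run_dir("target/aixi-competitors")
    if run_dir is None:
        print("no benchmark runs found")
    else:
        for tsv_path in sorted(run_dir.glob("*.tsv")):
            rows = load_tsv_rows(tsv_path)
            print(f"{tsv_path.name}: {len(rows)} rows")
```

All values come back as strings, so numeric columns need an explicit conversion before plotting.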
Comparison notes
The runner distinguishes between the directly comparable AC-CTW path and the repository's FAC-CTW variants. That distinction matters: FAC-CTW results are informative and often useful, but FAC-CTW is a different model class from AC-CTW, so its numbers are not a like-for-like comparison against older AC-CTW reference implementations [veness2011_mcaixi].
The benchmark runner also normalizes reward reporting for Kuhn Poker because some older implementations print encoded reward symbols rather than native-domain rewards. That normalization step is part of the comparison logic rather than an after-the-fact spreadsheet fix.
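The idea behind that normalization can be sketched as follows. This is an illustration only, not the runner's actual logic: it assumes the older implementation encodes each reward as a non-negative symbol by adding a constant offset, and the offset value of 2 and the function names are hypothetical. The real encoding should be checked against each implementation before relying on any number here.

```python
from statistics import mean

# Hypothetical offset: assumes encoded symbol = native reward + 2.
# Verify the actual encoding used by each external implementation.
ASSUMED_REWARD_OFFSET = 2


def decode_reward(symbol, offset=ASSUMED_REWARD_OFFSET):
    """Map an encoded, non-negative reward symbol back to a native-domain reward."""
    return symbol - offset


def mean_native_reward(symbols, offset=ASSUMED_REWARD_OFFSET):
    """Average native-domain reward over a sequence of encoded reward symbols."""
    return mean(decode_reward(s, offset) for s in symbols)


if __name__ == "__main__":
    encoded = [0, 4, 3, 1]  # made-up symbols for illustration, not benchmark data
    print([decode_reward(s) for s in encoded])
    print(mean_native_reward(encoded))
```

Decoding before aggregation keeps every implementation's averages on the same scale, which is why it belongs in the comparison step rather than in post-processing.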