Benchmarks

How to reproduce internal and external AIXI comparisons.

Benchmark claims should be reproducible before they are persuasive.

External baselines in scope

The current benchmark documentation explicitly tracks two external reference implementations:

  • PyAIXI, used as a Python baseline for MC-AIXI-style experiments [pyaixi_repo].
  • The older MC-AIXI C++ codebase, used as a native AC-CTW reference point [mcaixi_cpp_repo].

These are baselines, not bundled dependencies. The repository keeps them external and drives them through harness scripts, so each comparison stays inspectable rather than relying on silently vendored copies.

1. In-repo AIQI vs MC-AIXI

For quick relative comparisons inside this repository, use:

scripts/bench_aiqi_vs_aixi.sh

This harness is intentionally narrow. It is for comparing the two planner paths in the current repo, not for reproducing the AIQI paper against external implementations.
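Because the claim at the top of this page is that results should be reproducible before they are persuasive, a relative comparison is more convincing when it is backed by several runs rather than one. The wrapper below is a hypothetical sketch (the function name and log layout are not part of the repo); it only assumes the harness script named above:

```shell
# Hypothetical convenience wrapper (not part of the repo): run the in-repo
# harness several times and keep one log per run, so a relative comparison
# is backed by more than a single sample.
repeat_bench() {
  # $1: number of repetitions; $2: directory to collect per-run logs
  mkdir -p "$2"
  i=1
  while [ "$i" -le "$1" ]; do
    # Each run's stdout/stderr is captured; a failed run still leaves a log.
    scripts/bench_aiqi_vs_aixi.sh > "$2/run-$i.log" 2>&1 || true
    i=$((i + 1))
  done
}
```

For example, `repeat_bench 5 /tmp/aiqi-logs` leaves five logs that can be diffed or summarized by hand.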

2. Compare against PyAIXI

If you already have a PyAIXI clone, run:

scripts/bench_aixi_vs_pyaixi.sh --pyaixi-root /path/to/pyaixi

This path gives a direct comparison between the current infotheory MC-AIXI stack and the Python reference implementation without pulling in the full Guix-pinned multi-implementation harness.
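Since the PyAIXI clone is an external prerequisite, automation around this step should skip cleanly when the clone is absent. A minimal guard, as a hypothetical sketch (the wrapper function is illustrative, not part of the repo):

```shell
# Hypothetical guard (illustrative, not part of the repo): only attempt the
# PyAIXI comparison when a local clone actually exists, so automation
# skips cleanly instead of failing mid-run.
run_pyaixi_bench() {
  # $1: path to a PyAIXI clone
  if [ -d "$1" ]; then
    scripts/bench_aixi_vs_pyaixi.sh --pyaixi-root "$1"
  else
    echo "skip: no PyAIXI clone at $1"
  fi
}
```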

3. Reproducible multi-implementation runs

For the heavier, more reproducible comparison across:

  • infotheory Rust CLI,
  • infotheory Python bindings,
  • PyAIXI,
  • and MC-AIXI C++,

use the Guix wrapper:

scripts/bench_aixi_competitors_guix.sh --trials 3 --profile default

The script writes plot-ready TSV files under target/aixi-competitors/<timestamp>/, records the commands used, and pins the external competitor repositories by commit for the run.
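Because the run directories are named by timestamp, the newest run can be located with a plain lexicographic sort. A minimal sketch (the helper name is illustrative; it assumes only the timestamp-named subdirectories described above):

```shell
# Minimal sketch: timestamp-named run directories sort chronologically
# when sorted lexicographically, so the newest run is simply the last name.
latest_run() {
  # $1: root directory holding one subdirectory per run,
  #     e.g. target/aixi-competitors
  ls -1 "$1" | sort | tail -n 1
}
```

From there, the TSV files inside that directory can be fed straight to a plotting tool.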

Comparison notes

The runner distinguishes between the directly comparable AC-CTW path and the repository's FAC-CTW variants. That distinction matters: FAC-CTW is informative and often useful, but it is not a like-for-like drop-in for older AC-CTW reference implementations [veness2011_mcaixi].

The benchmark runner also normalizes reward reporting for Kuhn Poker because some older implementations print encoded reward symbols rather than native-domain rewards. That normalization step is part of the comparison logic rather than an after-the-fact spreadsheet fix.
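To illustrate what such a normalization step looks like, the sketch below undoes a fixed-offset encoding. The +2 offset is an assumption chosen purely for illustration; the actual encoding used by any given implementation has to be read off its source:

```shell
# Illustrative only: map an encoded, non-negative reward symbol back to a
# native-domain Kuhn Poker reward. The +2 offset is a hypothetical example
# of the kind of shift an older implementation might apply to keep reward
# symbols non-negative; it is NOT taken from any specific codebase here.
decode_reward() {
  # $1: encoded reward symbol (non-negative integer)
  echo $(( $1 - 2 ))
}
```

Keeping this decoding inside the runner, per implementation, is what makes the reported rewards directly comparable across columns of the output TSV.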