Benchmark Results
Early benchmarking (January 2026, on an Apple M4 MacBook Air) suggested at least a 4x single-threaded speedup over pymarc for the Python wrapper, and a read rate on the order of 1M records/second for the pure Rust library. Those figures need to be re-measured and reconsidered: the comparison harness that produced them is no longer in the repository, the parser has been substantially optimized since, and the pymarc version measured against was not recorded. Treat the multipliers as historical indications, not current measurements.
This page describes what the benchmark infrastructure measures today and the procedure for producing numbers worth citing.
What CI measures
Two CodSpeed jobs run on every pull request, both in simulation mode:
- Rust: criterion benches (benches/marc_benchmarks.rs, benches/error_handling_benchmarks.rs) via .github/workflows/benchmark-rust.yml
- Python: pytest benchmarks (tests/python/test_benchmark_reading.py, test_benchmark_writing.py, test_memory_benchmarks.py) via .github/workflows/benchmark-python.yml
Simulation mode executes each benchmark once under Valgrind and models its cost from instruction counts and cache behavior. The result is deterministic: the same code produces the same number regardless of runner speed, which makes it reliable for detecting regressions between commits. It is not wall-clock time — simulation results cannot be quoted as records/second.
Parallel-throughput benchmarks are excluded from CI because Valgrind serializes threads, so multi-threaded speedup cannot be measured under simulation. Measure parallelism locally instead (below).
Measuring locally
Local runs use real wall-clock time. For stable numbers: run on AC power, on a quiet machine, and let the frameworks' warm-up and repeated rounds do their work.
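To make "warm-up and repeated rounds" concrete, here is a minimal standard-library sketch of what a timing framework does when it turns wall-clock measurements into a records/second figure. It is not the project's harness. The function `measure_throughput` and the stand-in workload are hypothetical and exist only for illustration; criterion and pytest-benchmark do this with far more statistical care.

```python
import statistics
import time


def measure_throughput(work, n_records, warmup=2, rounds=5):
    """Time `work()` and return (median_seconds, records_per_second).

    `work` is any zero-argument callable that processes `n_records` records.
    """
    # Warm-up rounds are run and discarded so caches, lazy imports and
    # allocator state settle before timing starts.
    for _ in range(warmup):
        work()

    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        work()
        timings.append(time.perf_counter() - start)

    # The median is less sensitive to one noisy round than the mean.
    median = statistics.median(timings)
    return median, n_records / median


if __name__ == "__main__":
    # Stand-in workload (an assumption): split an in-memory buffer on the
    # MARC record terminator byte instead of calling a real parser.
    fake_data = (b"x" * 250 + b"\x1d") * 10_000
    median, rate = measure_throughput(lambda: fake_data.split(b"\x1d"), 10_000)
    print(f"median {median * 1e3:.3f} ms, {rate:,.0f} records/s")
```

A figure produced this way is only as good as the machine state during the run, which is why the quiet-machine and AC-power advice above matters.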
Single-threaded
# Rust (criterion)
cargo bench --bench marc_benchmarks
cargo bench --bench error_handling_benchmarks
# Python (pytest-benchmark)
uv run maturin develop --release
uv run pytest tests/python/ -m "benchmark and not slow" --benchmark-only -v
Parallel throughput
# Rust (criterion, rayon)
cargo bench --bench parallel_benchmarks
# Python (ThreadPoolExecutor and ProducerConsumerPipeline)
uv run pytest tests/python/test_benchmark_parallel.py \
tests/python/test_benchmark_pipeline_parallel.py --benchmark-only -v
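As a rough illustration of what a parallel-throughput measurement reports, the sketch below times the same work serially and through a `ThreadPoolExecutor`, then takes the ratio. It does not reproduce the repository's benchmarks: the `checksum` workload is a stand-in chosen because `hashlib` releases the GIL for large buffers, and the helper names are hypothetical.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def checksum(chunk: bytes) -> str:
    # Stand-in for per-chunk parsing work. hashlib releases the GIL while
    # hashing large buffers, so threads can overlap here.
    return hashlib.sha256(chunk).hexdigest()


def timed(fn):
    """Run `fn()` once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def measure_speedup(chunks, workers=4):
    """Return (serial_seconds, parallel_seconds, speedup) for `checksum`."""
    serial_result, serial_s = timed(lambda: [checksum(c) for c in chunks])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parallel_result, parallel_s = timed(
            lambda: list(pool.map(checksum, chunks))
        )
    # Both paths must produce identical output, otherwise the two timings
    # are not comparing the same work.
    assert serial_result == parallel_result
    return serial_s, parallel_s, serial_s / parallel_s


if __name__ == "__main__":
    data = [bytes([i]) * 4_000_000 for i in range(16)]
    serial_s, parallel_s, speedup = measure_speedup(data, workers=4)
    print(f"serial {serial_s:.3f}s, parallel {parallel_s:.3f}s, {speedup:.2f}x")
```

The speedup depends on core count and on how much of the work runs outside the GIL, so a ratio from one machine says little about another.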
Producing a citable comparison
Any published figure — especially a comparison against pymarc — must come from a run that records:
- the date of the run
- hardware: CPU model, core count, memory
- OS name and version
- Rust toolchain version and Python version
- the exact, pinned version of every library measured (including pymarc)
- the harness used, committed to this repository
- the fixture data and its size
A multiplier without this context is not reproducible and does not belong in the documentation.
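Much of the checklist above can be captured automatically at the start of a run. The following sketch uses only the standard library. The function names are hypothetical, and the package list is an assumption: it names pymarc only, so the project's own distribution name would need to be added. Memory size and the exact CPU model are not portably available from the standard library and would still have to be recorded by hand.

```python
import json
import os
import platform
import shutil
import subprocess
from datetime import datetime, timezone
from importlib import metadata


def tool_version(command):
    """Return the first line of `command --version`, or None if unavailable."""
    if shutil.which(command) is None:
        return None
    out = subprocess.run(
        [command, "--version"], capture_output=True, text=True, check=False
    )
    lines = out.stdout.strip().splitlines()
    return lines[0] if lines else None


def package_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def collect_run_metadata(packages, fixture_path=None):
    """Gather the reproducibility context for one benchmark run."""
    info = {
        "date_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be empty on some systems
        "logical_cpus": os.cpu_count(),
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
        "rustc": tool_version("rustc"),
        "packages": {name: package_version(name) for name in packages},
    }
    if fixture_path is not None:
        info["fixture"] = {
            "path": fixture_path,
            "bytes": os.path.getsize(fixture_path),
        }
    return info


if __name__ == "__main__":
    print(json.dumps(collect_run_metadata(["pymarc"]), indent=2))
```

Storing this JSON next to the benchmark output keeps the numbers and their context together.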
Test fixtures
Benchmark fixtures are synthetic MARC records generated by scripts/generate_benchmark_fixtures.py, stored in tests/data/fixtures/:
- 1k_records.mrc (~257 KB): quick tests
- 10k_records.mrc (~2.5 MB): standard benchmarks
Synthetic fixtures are adequate for regression detection. Figures intended to describe real-world performance should also be measured against representative library data.
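When reporting fixture data and its size, it helps to state both the byte count and the record count, since records/second depends on record length. The sketch below counts records in a binary MARC file by following the five-digit record length at the start of each leader and checking for the record terminator byte (0x1D). The function name is hypothetical, and the code validates framing only; it does not parse fields.

```python
import io


def count_marc_records(stream):
    """Return (record_count, total_bytes) for a binary MARC (ISO 2709) stream."""
    count = 0
    total = 0
    while True:
        head = stream.read(5)
        if not head:
            break  # clean end of file
        if len(head) < 5 or not head.isdigit():
            raise ValueError(f"bad record length at byte offset {total}")
        length = int(head)
        if length <= 5:
            raise ValueError(f"implausible record length at byte offset {total}")
        # The declared length includes the five length bytes already read.
        body = stream.read(length - 5)
        if len(body) != length - 5 or not body.endswith(b"\x1d"):
            raise ValueError(f"truncated or unterminated record at offset {total}")
        count += 1
        total += length
    return count, total


if __name__ == "__main__":
    # Synthetic framing-only record: length prefix, filler, record terminator.
    record = b"00026" + b"x" * 20 + b"\x1d"
    print(count_marc_records(io.BytesIO(record * 3)))
```

For a real fixture, open the file in binary mode, for example `open("tests/data/fixtures/10k_records.mrc", "rb")`, and pass the file object to the function.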
References
- Rust benchmarks: benches/marc_benchmarks.rs, benches/error_handling_benchmarks.rs, benches/parallel_benchmarks.rs
- Python benchmarks: tests/python/test_benchmark_*.py
- Memory benchmarks: tests/python/test_memory_benchmarks.py
- Fixture generator: scripts/generate_benchmark_fixtures.py
- CI workflows: .github/workflows/benchmark-rust.yml, .github/workflows/benchmark-python.yml