Benchmarking

This directory contains benchmarking documentation and infrastructure notes.

Contents

- Results - Measurement infrastructure, local benchmarking, and the procedure for producing citable numbers
- FAQ - Common questions about performance and threading
- Rust benchmarks - Criterion sources in benches/
- Python benchmarks - pytest-benchmark suites in tests/python/test_benchmark_*.py

Overview

mrrc's performance comes from two design decisions:

- Parsing in Rust - record parsing runs as compiled code with no per-record Python interpretation
- GIL release during parsing - the Python bindings release the GIL while Rust parses, so multi-threaded workloads can parse in parallel
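
To illustrate why releasing the GIL matters, here is a minimal sketch that compares a sequential loop with a thread pool. It does not use mrrc: `parse_stand_in` is a hypothetical placeholder built on `hashlib`, which also releases the GIL while processing large buffers, and the eight 1 MB buffers stand in for eight input files. Treat the timings as a rough indication on your machine, not as a benchmark result.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def parse_stand_in(chunk: bytes) -> str:
    # Hypothetical stand-in for a parse call that releases the GIL.
    # hashlib drops the GIL while hashing large buffers; this is NOT mrrc's API.
    return hashlib.sha256(chunk).hexdigest()


def run_sequential(chunks):
    return [parse_stand_in(chunk) for chunk in chunks]


def run_threaded(chunks, workers=4):
    # pool.map preserves input order, so results line up with run_sequential.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_stand_in, chunks))


def timed(fn, *args):
    # Return the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Eight 1 MB buffers play the role of eight input files.
chunks = [bytes([i]) * 1_000_000 for i in range(8)]
sequential, t_seq = timed(run_sequential, chunks)
threaded, t_thr = timed(run_threaded, chunks)
print(f"sequential: {t_seq:.3f}s  threaded: {t_thr:.3f}s")
```

Because the heavy work runs outside the GIL, the threaded run can overlap across cores; a pure-Python worker in the same harness would not see that benefit.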

On a realistic corpus, the Python wrapper reads substantially faster than pymarc, both per record and, by a wider margin, through the parallel batch path. The native Rust crate is faster still. See Results for the measured three-way figures and for how to benchmark mrrc on your own hardware and data.

Benchmark Infrastructure

| System | Framework | Location |
|---|---|---|
| Rust | Criterion | `benches/` |
| Python | pytest-benchmark | `tests/python/test_benchmark*.py` |
| CI regression detection | CodSpeed (simulation mode) | `.github/workflows/benchmark-*.yml` |

CI uses CodSpeed simulation mode: deterministic instruction-count measurement that detects regressions between commits but does not produce wall-clock throughput numbers. Parallel-throughput benchmarks run locally only — see Results.

Running Benchmarks

# Rust benchmarks
cargo bench --bench marc_benchmarks

# Python benchmarks (build the extension in release mode first)
uv run maturin develop --release
uv run pytest tests/python/ -m "benchmark and not slow" --benchmark-only -v

Test Fixtures

Fixtures live in tests/data/fixtures/ and are generated by scripts/generate_benchmark_fixtures.py:

- 1k_records.mrc (~257 KB) - quick tests
- 10k_records.mrc (~2.5 MB) - standard benchmarks
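
When turning a timing into records per second, it can help to confirm how many records a fixture actually holds. The sketch below is one way to do that without any MARC library, assuming the fixtures are standard ISO 2709 files: the first five bytes of each record give its total length in ASCII digits, and each record ends with the `0x1D` terminator. The names `count_records` and `make_fake_record` are illustrative and not part of mrrc.

```python
import io

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 record terminator byte
LENGTH_DIGITS = 5            # leader bytes 0-4: total record length in ASCII digits


def count_records(stream) -> int:
    """Count ISO 2709 records by hopping from one leader to the next."""
    count = 0
    while True:
        head = stream.read(LENGTH_DIGITS)
        if not head:
            return count  # clean end of file
        if len(head) < LENGTH_DIGITS:
            raise ValueError("truncated leader")
        length = int(head.decode("ascii"))
        body = stream.read(length - LENGTH_DIGITS)
        if len(body) != length - LENGTH_DIGITS or not body.endswith(RECORD_TERMINATOR):
            raise ValueError(f"malformed record #{count + 1}")
        count += 1


def make_fake_record(payload: bytes) -> bytes:
    # Synthetic record for demonstration only: length prefix + payload + terminator.
    body = payload + RECORD_TERMINATOR
    return f"{LENGTH_DIGITS + len(body):05d}".encode("ascii") + body


sample = io.BytesIO(make_fake_record(b"x" * 19) * 3)
print(count_records(sample))  # 3
```

To check a real fixture, open it in binary mode, for example `open("tests/data/fixtures/1k_records.mrc", "rb")`, and pass the file object to `count_records`. Dividing the file size by the count gives the average record size for that fixture.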