Benchmarking
This directory contains benchmarking documentation and infrastructure notes.
Contents
- Results - Measurement infrastructure, local benchmarking, and the procedure for producing citable numbers
- FAQ - Common questions about performance and threading
- Rust benchmarks - Criterion sources in benches/
- Python benchmarks - pytest-benchmark suites in tests/python/test_benchmark_*.py
Related documentation:
- Threading Guide - GIL release strategy and threading patterns
- Performance Tuning - Usage patterns and optimization
Overview
mrrc's performance comes from two design decisions:
- Parsing in Rust - record parsing runs as compiled code with no per-record Python interpretation
- GIL release during parsing - the Python bindings release the GIL while Rust parses, so multi-threaded workloads can parse in parallel
Early benchmarking suggested that mrrc, used through its Python wrapper, is at least 4x faster than pymarc in single-threaded use. Those benchmarks are out of date and need to be updated and reconsidered. See Results for what is measured today and how to benchmark mrrc on your own hardware and data.
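Whether GIL release pays off on a given machine can be checked with a small wall-clock harness. The sketch below is not part of mrrc: `measure` and its parameters are illustrative names, and `parse_one` is a hypothetical callable standing in for whatever per-file parsing call you use. It runs the same work sequentially and on a thread pool and reports both timings.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure(parse_one, inputs, max_workers=4):
    """Time parse_one over inputs sequentially and on a thread pool.

    parse_one is a hypothetical stand-in for a per-file parsing call;
    it is not an mrrc API.
    """
    inputs = list(inputs)

    # Sequential baseline.
    start = time.perf_counter()
    sequential_results = [parse_one(item) for item in inputs]
    sequential_s = time.perf_counter() - start

    # Same work on a thread pool; pool.map preserves input order.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        threaded_results = list(pool.map(parse_one, inputs))
    threaded_s = time.perf_counter() - start

    # Both runs must agree, otherwise the timings are not comparable.
    if sequential_results != threaded_results:
        raise ValueError("sequential and threaded results differ")

    return {
        "results": sequential_results,
        "sequential_s": sequential_s,
        "threaded_s": threaded_s,
    }
```

A threaded time well below the sequential time is only expected when the callable releases the GIL for most of its work. A pure-Python callable will typically show no improvement.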
Benchmark Infrastructure
| System | Framework | Location |
|---|---|---|
| Rust | Criterion | benches/ |
| Python | pytest-benchmark | tests/python/test_benchmark*.py |
| CI regression detection | CodSpeed (simulation mode) | .github/workflows/benchmark-*.yml |
CI uses CodSpeed simulation mode: deterministic instruction-count measurement that detects regressions between commits but does not produce wall-clock throughput numbers. Parallel-throughput benchmarks run locally only — see Results.
Running Benchmarks
# Rust benchmarks
cargo bench --bench marc_benchmarks
# Python benchmarks
uv run maturin develop --release
uv run pytest tests/python/ -m "benchmark and not slow" --benchmark-only -v
Test Fixtures
Located in tests/data/fixtures/, generated by scripts/generate_benchmark_fixtures.py:
- 1k_records.mrc (~257 KB) - quick tests
- 10k_records.mrc (~2.5 MB) - standard benchmarks
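One way to sanity-check a fixture before benchmarking is to count its records without any parser. The sketch below assumes the fixtures are binary MARC (ISO 2709) files, as the `.mrc` extension suggests. In that format each record starts with a 24-byte leader whose first five bytes hold the total record length as ASCII digits. `count_records` is a hypothetical helper, not part of mrrc or the fixture script.

```python
def count_records(data: bytes) -> int:
    """Count ISO 2709 records by hopping from leader to leader.

    Assumes each record begins with a 5-digit ASCII record length
    (leader positions 00-04) that covers the whole record.
    """
    count = 0
    offset = 0
    while offset < len(data):
        length_field = data[offset:offset + 5]
        if len(length_field) < 5 or not length_field.isdigit():
            raise ValueError(f"bad record length at byte {offset}")
        length = int(length_field)
        # A record can never be shorter than its 24-byte leader.
        if length < 24 or offset + length > len(data):
            raise ValueError(f"truncated or invalid record at byte {offset}")
        count += 1
        offset += length
    return count
```

For example, `count_records(open("tests/data/fixtures/1k_records.mrc", "rb").read())` would report how many records the smaller fixture actually contains, which can then be compared with what its file name implies.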