Benchmarking
This directory contains benchmarking documentation, infrastructure, and results.
Contents
- Results - Detailed performance measurements and comparisons
- FAQ - Common questions about performance and threading
- Benchmark Scripts - benchmark_comparison.py and criterion_extractor.py
- Rust Benchmarks - Criterion.rs source in benches/marc_benchmarks.rs
Related documentation:
- Threading Guide - GIL release strategy and threading patterns
- Performance Tuning - Usage patterns and optimization
Overview
mrrc performance is evaluated across three implementations:
- Rust (mrrc) - Pure Rust library (baseline)
- Python (pymrrc) - PyO3-based Python wrapper
- Pure Python (pymarc) - Baseline Python library (for comparison)
Summary
Single-threaded performance (default behavior, after warm-up):
- Rust: ~1,000,000 rec/s (baseline)
- Python wrapper (pymrrc): ~300,000 rec/s (~30% of Rust, ~4x faster than pymarc)
- Pure Python (pymarc): ~70,000 rec/s
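As a quick check of the ratios quoted in the single-threaded figures, the arithmetic can be reproduced directly. This is only a worked example: the `THROUGHPUT` table and the `ratio` helper are illustrative names, and the numbers are the approximate values from the summary, not fresh measurements.

```python
# Approximate single-threaded throughput from the summary (records/second).
THROUGHPUT = {"rust": 1_000_000, "pymrrc": 300_000, "pymarc": 70_000}


def ratio(a: str, b: str) -> float:
    """Return the throughput of implementation `a` divided by that of `b`."""
    return THROUGHPUT[a] / THROUGHPUT[b]


print(f"pymrrc vs Rust:   {ratio('pymrrc', 'rust'):.0%}")     # 30%
print(f"pymrrc vs pymarc: {ratio('pymrrc', 'pymarc'):.1f}x")  # 4.3x
```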
Multi-threaded performance (explicit opt-in):
- Requires concurrent.futures.ThreadPoolExecutor or ProducerConsumerPipeline
- 2-thread speedup: ~2x vs sequential
- 4-thread speedup: ~3-4x vs sequential
- Each thread needs its own MARCReader instance
- GIL released during parsing in each thread
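The opt-in threading pattern above can be sketched with the standard library's `ThreadPoolExecutor`. This is a minimal sketch rather than mrrc's documented API: `make_reader` is a hypothetical factory supplied by the caller (for example, a function that opens a file and wraps it in a `MARCReader`), and `count_records` and `count_all` are illustrative names. What it shows is that each task builds its own reader instead of sharing one across threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Sequence


def count_records(path: str, make_reader: Callable[[str], Iterable]) -> int:
    """Count the records in one file using a reader created inside this task."""
    reader = make_reader(path)  # one reader per task, never shared between threads
    return sum(1 for _ in reader)


def count_all(
    paths: Sequence[str],
    make_reader: Callable[[str], Iterable],
    max_workers: int = 4,
) -> int:
    """Count records across several files, one file per worker task."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = pool.map(lambda p: count_records(p, make_reader), paths)
        return sum(counts)
```

Splitting work by file keeps the sketch simple. Whether this reaches the speedups listed above depends on the reader releasing the GIL while parsing, as described for pymrrc.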
Methodology: Benchmarks use pytest-benchmark, which performs warm-up iterations to stabilize measurements. Cold-start performance is ~20% slower due to JIT/caching effects.
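The idea of discarding warm-up runs before measuring can be illustrated with the standard library alone. The sketch below is not how pytest-benchmark is implemented. `records_per_second` is a hypothetical helper, the `warmup` and `rounds` defaults are arbitrary, and reporting the fastest round is one choice among several.

```python
import time
from typing import Callable


def records_per_second(run: Callable[[], int], warmup: int = 3, rounds: int = 5) -> float:
    """Time `run` after discarding warm-up calls; `run` returns the records it processed."""
    for _ in range(warmup):
        run()  # warm-up calls: their timings are thrown away

    best = float("inf")
    records = 0
    for _ in range(rounds):
        start = time.perf_counter()
        records = run()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # keep the fastest measured round

    # Guard against a zero reading from a trivially fast `run`.
    return records / max(best, 1e-9)
```

Calling the helper with `warmup=0` gives a rough view of cold-start behavior for comparison with the warmed-up figure.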
See results.md for detailed measurements and threading-python.md for threading guidance.
Benchmark Infrastructure
Test Systems
| System | Framework | Location | Notes |
|---|---|---|---|
| Rust | Criterion.rs | benches/marc_benchmarks.rs | Baseline |
| Python | pytest-benchmark | tests/python/test_benchmark*.py | PyO3 wrapper (~10-15% overhead) |
| Comparison | Custom script | scripts/benchmark_comparison.py | Caching + CI-aware |
Running Benchmarks
```bash
# Rust benchmarks
cargo bench --release

# Python benchmarks
pytest tests/python/ --benchmark-only -v

# Three-way comparison (requires pymarc)
pip install pymarc
python scripts/benchmark_comparison.py

# Check benchmark cache status
python scripts/criterion_extractor.py

# CI mode
CI=1 python scripts/benchmark_comparison.py
```
Caching and Staleness Detection
The benchmark infrastructure includes:
- Caching: Criterion.rs results parsed from target/criterion/ (~100ms, no recompilation)
- Staleness detection: Auto-detects if benchmarks are >24h old or source changed; warns to refresh with cargo bench --release
- CI optimization: Detects CI environment and runs reduced test suite (1k, 10k)
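The staleness rule described above (results older than 24 hours, or older than the benchmark source) could look roughly like the following. This is a sketch under assumptions, not the code in the repository's scripts: `is_stale` is a hypothetical helper, and comparing file modification times is an assumed mechanism, since the real scripts may detect source changes differently.

```python
import time
from pathlib import Path
from typing import Optional

MAX_AGE_SECONDS = 24 * 60 * 60  # the 24h threshold from the list above


def is_stale(results: Path, source: Path, now: Optional[float] = None) -> bool:
    """Return True if cached results are missing, older than 24h, or older than the source."""
    if not results.exists():
        return True  # nothing cached yet, so a fresh run is needed

    now = time.time() if now is None else now
    results_mtime = results.stat().st_mtime

    too_old = now - results_mtime > MAX_AGE_SECONDS
    # Assumption: "source changed" means the source file was modified after the results.
    source_changed = source.exists() and source.stat().st_mtime > results_mtime
    return too_old or source_changed
```

In use, `results` would point at a specific result file under target/criterion/ and `source` at benches/marc_benchmarks.rs. A directory's modification time does not reliably reflect changes to files nested inside it, so passing the directory itself would not be a dependable check.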
Test Fixtures
Located in tests/data/fixtures/:
- 1k_records.mrc (257 KB) - Quick tests
- 10k_records.mrc (2.5 MB) - Standard benchmarks