Benchmarking

This directory contains benchmarking documentation, infrastructure, and results.

Contents

  • Results - Detailed performance measurements and comparisons
  • FAQ - Common questions about performance and threading
  • Benchmark Scripts - benchmark_comparison.py and criterion_extractor.py
  • Rust Benchmarks - Criterion.rs source in benches/marc_benchmarks.rs

Related documentation:

  • Threading Guide - GIL release strategy and threading patterns
  • Performance Tuning - Usage patterns and optimization

Overview

mrrc performance is evaluated across three implementations:

  1. Rust (mrrc) - Pure Rust library (baseline)
  2. Python (pymrrc) - PyO3-based Python wrapper
  3. Pure Python (pymarc) - Reference pure-Python library (for comparison)

Summary

Single-threaded performance (default behavior, after warm-up):

  • Rust: ~1,000,000 rec/s (baseline)
  • Python wrapper (pymrrc): ~300,000 rec/s (~30% of Rust, ~4x faster than pymarc)
  • Pure Python (pymarc): ~70,000 rec/s

Multi-threaded performance (explicit opt-in):

  • Requires concurrent.futures.ThreadPoolExecutor or ProducerConsumerPipeline
  • 2-thread speedup: ~2x vs sequential
  • 4-thread speedup: ~3-4x vs sequential
  • Each thread needs its own MARCReader instance
  • GIL released during parsing in each thread
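The one-reader-per-thread rule above can be sketched with the standard library alone. This is a minimal illustration, not the project's own code: `open_reader` is a hypothetical callable that takes a path and returns an iterable of records, standing in for however a MARCReader is actually constructed. Check the Threading Guide for the real constructor.

```python
from concurrent.futures import ThreadPoolExecutor


def count_records(path, open_reader):
    """Count records in one file using a reader created for this call only."""
    # A fresh reader per call means no reader is ever shared between threads.
    reader = open_reader(path)
    return sum(1 for _ in reader)


def count_records_parallel(paths, open_reader, max_workers=4):
    """Process each file on a worker thread; returns {path: record_count}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = list(pool.map(lambda p: count_records(p, open_reader), paths))
    return dict(zip(paths, counts))
```

With pymrrc, `open_reader` would wrap the MARCReader constructor (the exact signature is an assumption here). Any speedup depends on the GIL being released while each thread parses, as noted above.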

Methodology: Python benchmarks use pytest-benchmark, which runs warm-up iterations before timing to stabilize measurements. Cold-start performance is ~20% slower due to JIT/caching effects.
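To make the cold versus warmed-up distinction concrete, here is a small stdlib-only sketch. It is an illustration of the idea, not how pytest-benchmark is implemented; the function names and the default `warmup` and `rounds` values are assumptions chosen for the example.

```python
import statistics
import time


def measure(fn, warmup=3, rounds=10):
    """Time fn once cold, discard warm-up calls, then time `rounds` warm calls."""
    start = time.perf_counter()
    fn()  # first call: includes any one-time setup or cache filling
    cold = time.perf_counter() - start

    for _ in range(warmup):
        fn()  # untimed calls that let caches settle

    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)

    return {"cold_s": cold, "warm_median_s": statistics.median(timings)}


def records_per_second(record_count, seconds):
    """Convert a timing into the rec/s figure used in the summary."""
    return record_count / seconds
```

For example, a run that parses 10,000 records in 0.0333 s corresponds to roughly 300,000 rec/s.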

See results.md for detailed measurements and threading-python.md for threading guidance.

Benchmark Infrastructure

Test Systems

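| System     | Framework        | Location                        | Notes                           |
|------------|------------------|---------------------------------|---------------------------------|
| Rust       | Criterion.rs     | benches/marc_benchmarks.rs      | Baseline                        |
| Python     | pytest-benchmark | tests/python/test_benchmark*.py | PyO3 wrapper (~10-15% overhead) |
| Comparison | Custom script    | scripts/benchmark_comparison.py | Caching + CI-aware              |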

Running Benchmarks

# Rust benchmarks
cargo bench --release

# Python benchmarks
pytest tests/python/ --benchmark-only -v

# Three-way comparison (requires pymarc)
pip install pymarc
python scripts/benchmark_comparison.py

# Check benchmark cache status
python scripts/criterion_extractor.py

# CI-mode
CI=1 python scripts/benchmark_comparison.py

Caching and Staleness Detection

The benchmark infrastructure includes:

  • Caching: Criterion.rs results parsed from target/criterion/ (~100ms, no recompilation)
  • Staleness detection: Auto-detects if benchmarks are >24h old or source changed; warns to refresh with cargo bench --release
  • CI optimization: Detects CI environment and runs reduced test suite (1k, 10k)
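The staleness and CI checks described above could look roughly like the following. This is a sketch under stated assumptions, not the logic in criterion_extractor.py or benchmark_comparison.py: it assumes "source changed" means the benchmark source file has a newer modification time than the newest cached result, and that CI is detected through a non-empty `CI` environment variable. The helper names are hypothetical.

```python
import os
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # the 24h threshold mentioned above


def is_stale(results_dir, source_file, now=None):
    """Return True if cached results are missing, >24h old, or older than the source."""
    now = time.time() if now is None else now
    result_files = [p for p in Path(results_dir).rglob("*") if p.is_file()]
    if not result_files:
        return True  # nothing cached yet
    newest = max(p.stat().st_mtime for p in result_files)
    if now - newest > MAX_AGE_SECONDS:
        return True
    # Assumption: mtime comparison stands in for "source changed".
    return Path(source_file).stat().st_mtime > newest


def select_sizes(all_sizes, env=None):
    """Keep only the reduced suite (1k, 10k) when a CI environment is detected."""
    env = os.environ if env is None else env
    if env.get("CI"):
        return [s for s in all_sizes if s in ("1k", "10k")]
    return list(all_sizes)
```

A call such as `is_stale("target/criterion", "benches/marc_benchmarks.rs")` would then decide whether to print the reminder to rerun the Rust benchmarks.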

Test Fixtures

Located in tests/data/fixtures/:

  • 1k_records.mrc (257 KB) - Quick tests
  • 10k_records.mrc (2.5 MB) - Standard benchmarks
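A fixture's record count can be sanity-checked without any MARC library, because each MARC record begins with a leader whose first five bytes give the total record length as ASCII digits. The helper below is a hypothetical sketch for that check and is not part of the repository.

```python
def count_marc_records(stream):
    """Count records in a binary MARC stream by hopping over each record."""
    count = 0
    while True:
        head = stream.read(5)  # leader positions 00-04: record length
        if not head:
            return count  # clean end of file
        if len(head) < 5 or not head.isdigit():
            raise ValueError(f"bad length field before record {count + 1}: {head!r}")
        length = int(head)
        if length < 5:
            raise ValueError(f"implausible record length {length}")
        body = stream.read(length - 5)  # skip the rest of this record
        if len(body) != length - 5:
            raise ValueError(f"record {count + 1} is truncated")
        count += 1
```

Usage would be along the lines of `count_marc_records(open("tests/data/fixtures/1k_records.mrc", "rb"))`, which should match the count implied by the file name if the fixture is intact.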