Benchmarking

This directory contains benchmarking documentation and infrastructure notes.

Contents

  • Results - Measurement infrastructure, local benchmarking, and the procedure for producing citable numbers
  • FAQ - Common questions about performance and threading
  • Rust benchmarks - Criterion sources in benches/
  • Python benchmarks - pytest-benchmark suites in tests/python/test_benchmark_*.py

Overview

mrrc's performance comes from two design decisions:

  1. Parsing in Rust - record parsing runs as compiled code with no per-record Python interpretation
  2. GIL release during parsing - the Python bindings release the GIL while Rust parses, so multi-threaded workloads can parse in parallel
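Because the bindings release the GIL while Rust parses, an ordinary thread pool is enough to parse several files at once. The sketch below shows that pattern under stated assumptions. `parse_in_parallel` and `parse_one` are hypothetical names. `parse_one` stands for any per-file callable, for example one that opens a file and iterates it with mrrc's reader. mrrc's actual reader API is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

P = TypeVar("P")
R = TypeVar("R")


def parse_in_parallel(
    paths: Iterable[P],
    parse_one: Callable[[P], R],  # hypothetical per-file parse function
    max_workers: int = 4,
) -> List[R]:
    """Run parse_one over every path on a pool of threads.

    Threads only overlap if parse_one spends most of its time in code
    that releases the GIL (for mrrc, the Rust parser). Pure-Python
    parsing would serialize on the GIL and gain little or nothing.
    Results are returned in the same order as paths.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse_one, paths))
```

Threads were chosen over processes here because, when the heavy work already runs outside the GIL, they avoid the cost of pickling records between processes.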

Early benchmarks suggested a single-threaded speedup of at least 4x over pymarc when using the Python wrapper. Those numbers are out of date and need to be re-measured and re-examined. See Results for what is measured today and for how to benchmark mrrc on your own hardware and data.
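A speedup figure such as "4x" is a ratio of two timings of the same workload: pymarc's time divided by mrrc's time on the same records and machine. The sketch below takes the median of repeated runs on each side to damp outliers. That is an assumption made for illustration and not necessarily how the Results procedure computes its numbers. The function name `speedup` is hypothetical.

```python
from statistics import median
from typing import Sequence


def speedup(baseline_seconds: Sequence[float], candidate_seconds: Sequence[float]) -> float:
    """Ratio of the median baseline time to the median candidate time.

    Both sequences must time the same workload (same records, same
    machine). A result of 4.0 means the candidate finished in a quarter
    of the baseline's time.
    """
    if not baseline_seconds or not candidate_seconds:
        raise ValueError("need at least one timing for each side")
    return median(baseline_seconds) / median(candidate_seconds)
```

For example, `speedup([4.0, 4.2, 3.9], [1.0, 1.1, 0.9])` compares medians of 4.0 and 1.0 seconds and returns 4.0.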

Benchmark Infrastructure

  • Rust - Criterion, in benches/
  • Python - pytest-benchmark, in tests/python/test_benchmark*.py
  • CI regression detection - CodSpeed (simulation mode), in .github/workflows/benchmark-*.yml

CI runs CodSpeed in simulation mode, which measures instruction counts deterministically. That catches regressions between commits but does not produce wall-clock throughput numbers, so parallel-throughput benchmarks are run locally only; see Results.
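Because CI only reports instruction counts, records-per-second figures have to be measured locally with a wall-clock timer. The sketch below is one minimal way to do that, assuming a hypothetical `parse` callable. That callable makes one full pass over a fixture file, with mrrc or pymarc, and returns the record count. It is not the project's own harness, and reporting the median of several runs is an illustrative choice.

```python
import time
from statistics import median
from typing import Callable, List


def records_per_second(parse: Callable[[], int], repeats: int = 5) -> float:
    """Median wall-clock throughput of a parse callable, in records/sec.

    parse() is a hypothetical callable that makes one full pass over
    the input and returns the number of records it parsed; every run
    is expected to return the same count.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings: List[float] = []
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        count = parse()
        timings.append(time.perf_counter() - start)
    elapsed = median(timings)
    if elapsed <= 0:
        raise ValueError("run too short to time; use a larger input")
    return count / elapsed
```

Run this on an otherwise idle machine and in a release build of the bindings. Background load and debug builds both distort wall-clock results in ways that instruction counts do not.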

Running Benchmarks

# Rust benchmarks
cargo bench --bench marc_benchmarks

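# Python benchmarks
# Build the bindings in release mode first; maturin develop builds a debug extension by default, which would understate performance
uv run maturin develop --release
# Select tests marked "benchmark" but not "slow", and run only benchmark tests
uv run pytest tests/python/ -m "benchmark and not slow" --benchmark-only -v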

Test Fixtures

Located in tests/data/fixtures/, generated by scripts/generate_benchmark_fixtures.py:

  • 1k_records.mrc (~257 KB) - quick tests
  • 10k_records.mrc (~2.5 MB) - standard benchmarks
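The listed sizes imply roughly 250 bytes per record. A quick way to sanity-check a regenerated fixture is to walk it record by record using the length stored in each leader. The sketch below assumes standard MARC 21 / ISO 2709 binary layout: a 24-byte leader whose first 5 bytes give the record length, and each record ending in the 0x1D record terminator. `record_lengths` and `fixture_stats` are hypothetical helpers, not part of scripts/generate_benchmark_fixtures.py.

```python
from typing import List, Tuple

LEADER_LENGTH = 24        # every MARC 21 leader is 24 bytes
RECORD_TERMINATOR = 0x1D  # byte that ends each record in ISO 2709


def record_lengths(data: bytes) -> List[int]:
    """Walk a binary MARC buffer and return each record's length in bytes.

    Uses the 5-digit record length at the start of each leader to find
    the next record, and checks that every record ends with 0x1D.
    """
    lengths: List[int] = []
    offset = 0
    while offset < len(data):
        header = data[offset:offset + 5]
        if len(header) < 5 or not header.isdigit():
            raise ValueError(f"bad record length at byte offset {offset}")
        length = int(header)  # int() accepts ASCII digit bytes
        end = offset + length
        if length < LEADER_LENGTH or end > len(data) or data[end - 1] != RECORD_TERMINATOR:
            raise ValueError(f"truncated or malformed record at byte offset {offset}")
        lengths.append(length)
        offset = end
    return lengths


def fixture_stats(data: bytes) -> Tuple[int, float]:
    """Return (record count, mean record size in bytes) for a MARC buffer."""
    lengths = record_lengths(data)
    mean = sum(lengths) / len(lengths) if lengths else 0.0
    return len(lengths), mean
```

Usage, from the repository root: `fixture_stats(pathlib.Path("tests/data/fixtures/1k_records.mrc").read_bytes())` should report 1000 records if the fixture is intact.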