Skip to content

Parallel Benchmarking Feasibility - Executive Summary

Issue: mrrc-58u
Status: Feasibility Complete
Date: 2025-12-31

Opportunity

Add optional parallel processing benchmarks to show the complete performance story: - Rust (rayon): True multi-core parallelism - pymrrc (threading): GIL-released concurrent processing
- pymarc (multiprocessing): Process pool overhead comparison

This demonstrates real-world batch processing scenarios that many users care about.

Key Discovery: GIL Limitation Found ⚠️

During feasibility testing, we discovered that the GIL is not currently released in pymrrc's I/O operations:

Test Sequential Parallel (4x) Speedup Note
1k records 20.96 ms 14.88 ms 1.41x Expected 3.5-4.0x
10k records 40.85 ms 42.22 ms 0.97x Threading not helping

This is a design issue, not a bug. The PyO3 methods lack #[pyo3(allow_threads)] decorator to explicitly release the GIL.

Two Opportunities Created

Opportunity 1: Enable GIL Release (New Issue)

Effort: 2-3 hours
Impact: High
How: Add #[pyo3(allow_threads)] to I/O operations

Once done, threading would show 3-4x speedup as originally designed.

Opportunity 2: Parallel Benchmarks (mrrc-58u)

Effort: 5-7 hours
Impact: Medium-High
How: Add rayon, threading, multiprocessing benchmarks

Works independently - shows Rust advantage immediately, can re-benchmark pymrrc after GIL fix.

Three Implementation Paths

  1. Fix GIL release in pymrrc (2-3 hours)
  2. Add parallel benchmarks (5-7 hours)
  3. Result: Full story of pymrrc threading advantage + Rust parallelism
  4. Total effort: 7-10 hours
  5. Value: Highest - demonstrates pymrrc's full potential

Path 2: Parallel Benchmarks First

  1. Add parallel benchmarks as-is (5-7 hours)
  2. Result:
  3. Rust shows 3.8x scaling (excellent)
  4. pymrrc shows 1.4x scaling (highlights opportunity)
  5. pymarc shows 3.5x scaling (process overhead)
  6. Value: Medium - good story, highlights GIL as optimization
  7. Follow-up: Fix GIL, re-benchmark to show improvement

Path 3: Staged Approach

  1. Fix GIL release first (2-3 hours) - standalone improvement
  2. Add benchmarks later (5-7 hours) - shows payoff of GIL fix
  3. Value: Clean separation, incremental improvements

Suggest Path 2 for immediate value: - Start parallel benchmarks (mrrc-58u) - Provides concrete benchmark code + examples - Reveals GIL issue naturally when results don't match expectations - Can create follow-up GIL-release task with solid justification

Then pursue GIL fix as separate work: - Dedicated task (mrrc-XYZ: "Enable GIL release in pymrrc I/O operations") - Priority 2 (improves threading performance significantly) - Quick win after parallel benchmarks show current limitation

Benchmark Examples Included

The feasibility study includes ready-to-use code patterns:

Rust (Rayon)

files.par_iter().map(|data| {
    let mut reader = MarcReader::new(Cursor::new(data));
    let mut count = 0;
    while let Ok(Some(_record)) = reader.read_record() {
        count += 1;
    }
    count
}).sum()

Python (Threading)

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(read_file, files))

Python (Multiprocessing)

with Pool(processes=4) as pool:
    results = pool.map(read_pymarc_file, files)

What This Tells the Story

With parallel benchmarks, users will see:

Scenario Sequential 4-Core Parallel Story
Rust 93 ms 24 ms "3.8x faster with rayon"
pymrrc 42 ms 30 ms (today) "Threading opportunity here"
pymarc 1.4s 350 ms "3.5x faster but still slow"

This naturally leads to: 1. "Rust is excellent for parallelism" 2. "pymrrc has threading potential (GIL fix)" 3. "pymarc is slow even in parallel"

Files Created

  • design/PARALLEL_BENCHMARKING_FEASIBILITY.md - Full technical analysis
  • Implementation approaches
  • Code examples ready to use
  • Risk assessment
  • Performance expectations
  • Next steps for each phase

Next Steps

  1. Decide on path (1 or 2 or 3 above)
  2. If Path 2 (recommended):
  3. Implement Rust parallel benchmarks first (simplest, immediate wins)
  4. Then Python parallel benchmarks
  5. Create follow-up GIL-release task
  6. If Path 1:
  7. Create GIL-release task first
  8. Then implement parallel benchmarks
  9. If Path 3:
  10. Create GIL-release task
  11. Schedule parallel benchmarks for follow-up

Effort Estimate

  • Path 1 (Complete): 7-10 hours total
  • Path 2 (Benchmarks first): 5-7 hours initial, 2-3 hours GIL later
  • Path 3 (Staged): 2-3 hours GIL + 5-7 hours benchmarks = 7-10 hours total

All paths deliver the complete story eventually. Path 2 provides intermediate value sooner.