Parallel Benchmarking Feasibility - Executive Summary¶

Issue: mrrc-58u
Status: Feasibility Complete
Date: 2025-12-31

Opportunity¶

Add optional parallel processing benchmarks to show the complete performance story: - Rust (rayon): True multi-core parallelism - pymrrc (threading): GIL-released concurrent processing
- pymarc (multiprocessing): Process pool overhead comparison

This demonstrates real-world batch processing scenarios that many users care about.

Key Discovery: GIL Limitation Found ⚠️¶

During feasibility testing, we discovered that the GIL is not currently released in pymrrc's I/O operations:

Test	Sequential	Parallel (4x)	Speedup	Note
1k records	20.96 ms	14.88 ms	1.41x	Expected 3.5-4.0x
10k records	40.85 ms	42.22 ms	0.97x	Threading not helping

This is a design issue, not a bug. The PyO3 methods lack #[pyo3(allow_threads)] decorator to explicitly release the GIL.

Two Opportunities Created¶

Opportunity 1: Enable GIL Release (New Issue)¶

Effort: 2-3 hours
Impact: High
How: Add #[pyo3(allow_threads)] to I/O operations

Once done, threading would show 3-4x speedup as originally designed.

Opportunity 2: Parallel Benchmarks (mrrc-58u)¶

Effort: 5-7 hours
Impact: Medium-High
How: Add rayon, threading, multiprocessing benchmarks

Works independently - shows Rust advantage immediately, can re-benchmark pymrrc after GIL fix.

Three Implementation Paths¶

Path 1: Complete Solution (Recommended)¶

Fix GIL release in pymrrc (2-3 hours)
Add parallel benchmarks (5-7 hours)
Result: Full story of pymrrc threading advantage + Rust parallelism
Total effort: 7-10 hours
Value: Highest - demonstrates pymrrc's full potential

Path 2: Parallel Benchmarks First¶

Add parallel benchmarks as-is (5-7 hours)
Result:
Rust shows 3.8x scaling (excellent)
pymrrc shows 1.4x scaling (highlights opportunity)
pymarc shows 3.5x scaling (process overhead)
Value: Medium - good story, highlights GIL as optimization
Follow-up: Fix GIL, re-benchmark to show improvement

Path 3: Staged Approach¶

Fix GIL release first (2-3 hours) - standalone improvement
Add benchmarks later (5-7 hours) - shows payoff of GIL fix
Value: Clean separation, incremental improvements

Recommended Action¶

Suggest Path 2 for immediate value: - Start parallel benchmarks (mrrc-58u) - Provides concrete benchmark code + examples - Reveals GIL issue naturally when results don't match expectations - Can create follow-up GIL-release task with solid justification

Then pursue GIL fix as separate work: - Dedicated task (mrrc-XYZ: "Enable GIL release in pymrrc I/O operations") - Priority 2 (improves threading performance significantly) - Quick win after parallel benchmarks show current limitation

Benchmark Examples Included¶

The feasibility study includes ready-to-use code patterns:

Rust (Rayon)¶

files.par_iter().map(|data| {
    let mut reader = MarcReader::new(Cursor::new(data));
    let mut count = 0;
    while let Ok(Some(_record)) = reader.read_record() {
        count += 1;
    }
    count
}).sum()

Python (Threading)¶

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(read_file, files))

Python (Multiprocessing)¶

with Pool(processes=4) as pool:
    results = pool.map(read_pymarc_file, files)

What This Tells the Story¶

With parallel benchmarks, users will see:

Scenario	Sequential	4-Core Parallel	Story
Rust	93 ms	24 ms	"3.8x faster with rayon"
pymrrc	42 ms	30 ms (today)	"Threading opportunity here"
pymarc	1.4s	350 ms	"3.5x faster but still slow"

This naturally leads to: 1. "Rust is excellent for parallelism" 2. "pymrrc has threading potential (GIL fix)" 3. "pymarc is slow even in parallel"

Files Created¶

design/PARALLEL_BENCHMARKING_FEASIBILITY.md - Full technical analysis
Implementation approaches
Code examples ready to use
Risk assessment
Performance expectations
Next steps for each phase

Next Steps¶

Decide on path (1 or 2 or 3 above)
If Path 2 (recommended):
Implement Rust parallel benchmarks first (simplest, immediate wins)
Then Python parallel benchmarks
Create follow-up GIL-release task
If Path 1:
Create GIL-release task first
Then implement parallel benchmarks
If Path 3:
Create GIL-release task
Schedule parallel benchmarks for follow-up

Effort Estimate¶

Path 1 (Complete): 7-10 hours total
Path 2 (Benchmarks first): 5-7 hours initial, 2-3 hours GIL later
Path 3 (Staged): 2-3 hours GIL + 5-7 hours benchmarks = 7-10 hours total

All paths deliver the complete story eventually. Path 2 provides intermediate value sooner.