Parallel Benchmarking Feasibility - Executive Summary¶
Issue: mrrc-58u
Status: Feasibility Complete
Date: 2025-12-31
Opportunity¶
Add optional parallel processing benchmarks to show the complete performance story:
- Rust (rayon): True multi-core parallelism
- pymrrc (threading): GIL-released concurrent processing
- pymarc (multiprocessing): Process pool overhead comparison
This demonstrates real-world batch processing scenarios that many users care about.
Key Discovery: GIL Limitation Found ⚠️¶
During feasibility testing, we discovered that the GIL is not currently released in pymrrc's I/O operations:
| Test | Sequential | Parallel (4x) | Speedup | Note |
|---|---|---|---|---|
| 1k records | 20.96 ms | 14.88 ms | 1.41x | Expected 3.5-4.0x |
| 10k records | 40.85 ms | 42.22 ms | 0.97x | Threading not helping |
This is a design issue, not a bug. The PyO3 methods lack #[pyo3(allow_threads)] decorator to explicitly release the GIL.
Two Opportunities Created¶
Opportunity 1: Enable GIL Release (New Issue)¶
Effort: 2-3 hours
Impact: High
How: Add #[pyo3(allow_threads)] to I/O operations
Once done, threading would show 3-4x speedup as originally designed.
Opportunity 2: Parallel Benchmarks (mrrc-58u)¶
Effort: 5-7 hours
Impact: Medium-High
How: Add rayon, threading, multiprocessing benchmarks
Works independently - shows Rust advantage immediately, can re-benchmark pymrrc after GIL fix.
Three Implementation Paths¶
Path 1: Complete Solution (Recommended)¶
- Fix GIL release in pymrrc (2-3 hours)
- Add parallel benchmarks (5-7 hours)
- Result: Full story of pymrrc threading advantage + Rust parallelism
- Total effort: 7-10 hours
- Value: Highest - demonstrates pymrrc's full potential
Path 2: Parallel Benchmarks First¶
- Add parallel benchmarks as-is (5-7 hours)
- Result:
- Rust shows 3.8x scaling (excellent)
- pymrrc shows 1.4x scaling (highlights opportunity)
- pymarc shows 3.5x scaling (process overhead)
- Value: Medium - good story, highlights GIL as optimization
- Follow-up: Fix GIL, re-benchmark to show improvement
Path 3: Staged Approach¶
- Fix GIL release first (2-3 hours) - standalone improvement
- Add benchmarks later (5-7 hours) - shows payoff of GIL fix
- Value: Clean separation, incremental improvements
Recommended Action¶
Suggest Path 2 for immediate value: - Start parallel benchmarks (mrrc-58u) - Provides concrete benchmark code + examples - Reveals GIL issue naturally when results don't match expectations - Can create follow-up GIL-release task with solid justification
Then pursue GIL fix as separate work: - Dedicated task (mrrc-XYZ: "Enable GIL release in pymrrc I/O operations") - Priority 2 (improves threading performance significantly) - Quick win after parallel benchmarks show current limitation
Benchmark Examples Included¶
The feasibility study includes ready-to-use code patterns:
Rust (Rayon)¶
files.par_iter().map(|data| {
let mut reader = MarcReader::new(Cursor::new(data));
let mut count = 0;
while let Ok(Some(_record)) = reader.read_record() {
count += 1;
}
count
}).sum()
Python (Threading)¶
Python (Multiprocessing)¶
What This Tells the Story¶
With parallel benchmarks, users will see:
| Scenario | Sequential | 4-Core Parallel | Story |
|---|---|---|---|
| Rust | 93 ms | 24 ms | "3.8x faster with rayon" |
| pymrrc | 42 ms | 30 ms (today) | "Threading opportunity here" |
| pymarc | 1.4s | 350 ms | "3.5x faster but still slow" |
This naturally leads to: 1. "Rust is excellent for parallelism" 2. "pymrrc has threading potential (GIL fix)" 3. "pymarc is slow even in parallel"
Files Created¶
design/PARALLEL_BENCHMARKING_FEASIBILITY.md- Full technical analysis- Implementation approaches
- Code examples ready to use
- Risk assessment
- Performance expectations
- Next steps for each phase
Next Steps¶
- Decide on path (1 or 2 or 3 above)
- If Path 2 (recommended):
- Implement Rust parallel benchmarks first (simplest, immediate wins)
- Then Python parallel benchmarks
- Create follow-up GIL-release task
- If Path 1:
- Create GIL-release task first
- Then implement parallel benchmarks
- If Path 3:
- Create GIL-release task
- Schedule parallel benchmarks for follow-up
Effort Estimate¶
- Path 1 (Complete): 7-10 hours total
- Path 2 (Benchmarks first): 5-7 hours initial, 2-3 hours GIL later
- Path 3 (Staged): 2-3 hours GIL + 5-7 hours benchmarks = 7-10 hours total
All paths deliver the complete story eventually. Path 2 provides intermediate value sooner.