GIL Release Strategy for pymrrc Threading Performance¶
Status: Design Proposal
Date: January 2026
Related Issue: mrrc-gyk
Goal: Unlock threading parallelism in pymrrc by enabling proper GIL release during I/O operations
Problem Statement¶
Python's Global Interpreter Lock (GIL) prevents concurrent Python threads from executing Python code simultaneously. The current pymrrc implementation holds the GIL during all file I/O operations, which blocks other Python threads from running and eliminates potential parallelism benefits.
Current State¶
- Python::attach() holds the GIL during the entire I/O flow
- Other Python threads are blocked during record reading/writing
- Expected threading speedup (1.41x → 3.4x for concurrent reads) is not achieved
Root Cause¶
The architecture couples I/O operations to Python object management:
- PyFileWrapper holds Py<PyAny> (a Python file object reference)
- PyO3 requires the GIL to be held whenever the object behind a Py<T> is accessed
- I/O methods (read_record, write_record) directly access this Python reference within their logic
- The GIL therefore cannot be released around any code path that still touches the Python file object
Approaches Investigated¶
Approach 1: Naive #[pyo3(allow_threads)] Decorator¶
Why it failed:
- Compilation errors due to incorrect syntax for PyO3 0.27
- Decorator alone doesn't solve the fundamental problem of holding Python references
Approach 2: py.allow_threads() Wrapper¶
Why it failed:
- The closure captures slf (a PyRefMut), which holds a reference to the Python wrapper object
- allow_threads requires the closure to be Send (thread-safe)
- PyRefMut cannot cross thread boundaries while maintaining the GIL guarantee
- Fundamental blocker: You cannot access Python objects without the GIL held
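The Send requirement can be illustrated with the standard library alone. The sketch below is not PyO3 code: `run_without_gil` is a hypothetical stand-in for `allow_threads` that accepts only `Send` closures, and `Rc` stands in for a reference that is tied to one thread, loosely analogous to a GIL-bound `PyRefMut`.

```rust
use std::thread;

/// Hypothetical stand-in for `allow_threads`: accepts only `Send` closures.
/// Running the closure on a scoped worker thread makes the bound meaningful.
fn run_without_gil<F, T>(f: F) -> T
where
    F: FnOnce() -> T + Send,
    T: Send,
{
    thread::scope(|scope| scope.spawn(f).join().expect("worker thread panicked"))
}

/// Pure computation on plain bytes; no thread-bound state involved.
fn sum_bytes(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

/// Compiles: the closure captures only `&Vec<u8>`, which is `Send`.
fn parse_owned_buffer() -> u32 {
    let buffer: Vec<u8> = vec![1, 2, 3];
    run_without_gil(|| sum_bytes(&buffer))
}

// Does NOT compile if uncommented: `Rc<T>` is not `Send`, so the closure
// capturing it is rejected by the `Send` bound on `run_without_gil`.
//
// fn parse_with_thread_bound_ref() -> u32 {
//     let shared = std::rc::Rc::new(vec![1u8, 2, 3]);
//     run_without_gil(|| sum_bytes(&shared))
// }
```

The compiler rejects the commented-out variant for the same kind of reason Approach 2 failed: the closure drags along a value that must stay on its original thread.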
Approach 3: Python::detach() Pattern¶
Why it failed:
- Same underlying issue as Approach 2
- The closure still captures slf, remaining tied to the GIL
- Detach cannot make Python references Send
Recommended Solution: Intermediate Buffer Pattern¶
Architecture Overview¶
The solution separates I/O logic from Python object management into three distinct phases:
Phase 1 (GIL held)     Phase 2 (GIL released)     Phase 3 (GIL held)
──────────────────     ──────────────────────     ──────────────────
Read from Python    →  Parse/Process Bytes     →  Return to Python
file object            (pure Rust, no refs)       file object
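The three phases can be sketched without PyO3 by using a plain mutex as a stand-in for the GIL. Everything here is illustrative: `FAKE_GIL`, the `phase*` functions, and the field-counting "parser" are assumptions made for this example, not pymrrc code. The only MARC fact relied on is that 0x1E is the field terminator.

```rust
use std::sync::Mutex;
use std::thread;

/// Stand-in for the GIL: a single global lock that "Python-side" work must hold.
static FAKE_GIL: Mutex<()> = Mutex::new(());

/// Phase 1 (lock held): copy bytes out of the lock-protected source.
fn phase1_read(source: &[u8]) -> Vec<u8> {
    let _gil = FAKE_GIL.lock().unwrap();
    source.to_vec()
}

/// Phase 2 (lock NOT held): pure computation on owned bytes.
/// Here "parsing" just counts MARC field terminators (0x1E).
fn phase2_parse(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b == 0x1E).count()
}

/// Phase 3 (lock held): convert the result into its "Python-facing" form.
fn phase3_convert(field_count: usize) -> String {
    let _gil = FAKE_GIL.lock().unwrap();
    format!("fields={field_count}")
}

/// One full read: the lock is taken twice, briefly, and never during parsing.
fn read_one(source: &[u8]) -> String {
    let bytes = phase1_read(source);
    let parsed = phase2_parse(&bytes);
    phase3_convert(parsed)
}

/// Several threads can overlap in phase 2 because none of them holds the lock there.
fn read_on_threads(source: &'static [u8], threads: usize) -> Vec<String> {
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(move || read_one(source)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Because the lock guards are dropped at the end of phases 1 and 3, the parsing step is the only part that can run concurrently, which is the property the proposal depends on.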
Implementation Details¶
Phase 1: Python-Bound I/O (GIL held)¶
Create a thin wrapper method that reads from the Python file object and returns raw bytes:
// In PyFileWrapper
fn read_bytes(&self, py: Python<'_>, bytes_to_read: usize) -> PyResult<Vec<u8>> {
    // This method MUST hold the GIL because it accesses self.file (Py<PyAny>)
    let file_obj = self.file.bind(py);
    let read_method = file_obj.getattr("read")?;
    // A binary-mode file returns `bytes`; extracting into Vec<u8> copies the data
    // into an owned buffer that carries no Python reference.
    let data: Vec<u8> = read_method.call1((bytes_to_read,))?.extract()?;
    Ok(data)
}
Phase 2: Pure Rust Processing (GIL released)¶
The core parsing logic operates on owned bytes only, with no Python references, so __next__ can release the GIL around it:
// In src-python/src/readers.rs
#[pymethods]
impl PyMarcReader {
    fn __next__(&mut self, py: Python<'_>) -> PyResult<PyObject> {
        // Phase 1: Read bytes (GIL held, fast)
        let bytes = self.file_wrapper.read_bytes(py, 65536)?;
        // Phase 2: Parse bytes (GIL released, allows other threads).
        // The closure borrows only self.reader and the owned byte buffer.
        let record = py.allow_threads(|| {
            self.reader.read_record(&bytes)
        })?;
        // Phase 3: Convert to Python (GIL held).
        // into_pyobject yields a Bound value; unbind it to return a PyObject.
        Ok(record.into_pyobject(py)?.into_any().unbind())
    }
}
Phase 3: Result Conversion (GIL held)¶
Convert processed Rust data back to Python objects while holding the GIL.
Why This Works¶
- Separation of concerns: Python object access is isolated to a thin wrapper layer
- No dangling references: The closure in allow_threads captures only &mut self.reader, which holds no Python references
- GIL released during expensive work: CPU-intensive parsing runs without the GIL, allowing other Python threads to execute
- API compatibility: End users don't see internal changes; the API remains pymarc-compatible
- Performance: Threading speedup becomes achievable because only the short read and conversion steps hold the GIL; parsing no longer monopolizes it
Alternative: Thread Pool Pattern¶
A more complex but potentially higher-throughput approach:
- Batch multiple I/O operations
- Release GIL once per batch instead of per-record
- Better for bulk processing workloads
- More complex API and state management
- Deferred as secondary optimization
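A rough sketch of the batching idea, in pure Rust and under stated assumptions: the buffer is split on the MARC record terminator (0x1D), every complete record is processed in one pass (the part that would run inside a single GIL-released section), and the incomplete tail is carried over to the next read. The function names and the field-counting "parse" step are hypothetical placeholders, not pymrrc APIs.

```rust
/// MARC record terminator byte.
const RECORD_TERMINATOR: u8 = 0x1D;
/// MARC field terminator byte.
const FIELD_TERMINATOR: u8 = 0x1E;

/// Split a buffer into complete records (terminator included) plus the
/// unconsumed tail that does not yet end in a record terminator.
fn split_complete_records(buffer: &[u8]) -> (Vec<&[u8]>, &[u8]) {
    let mut records = Vec::new();
    let mut start = 0;
    for (i, &byte) in buffer.iter().enumerate() {
        if byte == RECORD_TERMINATOR {
            records.push(&buffer[start..=i]);
            start = i + 1;
        }
    }
    (records, &buffer[start..])
}

/// Process a whole batch in one pass. In the real design this is the work
/// that would sit inside a single GIL-released section per batch.
/// Returns one field count per record and the leftover bytes to keep.
fn field_counts_for_batch(buffer: &[u8]) -> (Vec<usize>, Vec<u8>) {
    let (records, tail) = split_complete_records(buffer);
    let counts = records
        .iter()
        .map(|record| record.iter().filter(|&&b| b == FIELD_TERMINATOR).count())
        .collect();
    (counts, tail.to_vec())
}
```

The trade-off named in the list shows up directly: the caller now has to keep the leftover tail between reads, which is the extra state management a per-record design avoids.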
Expected Outcomes¶
With the Intermediate Buffer Pattern:
- Threading speedup achieved: Expected 1.41x → 3.4x for concurrent reads (previously blocked)
- Rust performance parity: pymrrc threading efficiency matches pure Rust parallelism
- Minimal API changes: Transparent to end users
- Backward compatible: Existing code continues to work
Implementation Steps¶
- Create PyFileWrapper::read_bytes() method (Phase 1)
- Create PyFileWrapper::peek_bytes() for record boundary detection (Phase 1)
- Refactor PyMarcReader::__next__() to use three-phase pattern (Phase 2/3)
- Refactor PyMarcReader::read_record() to use three-phase pattern (Phase 2/3)
- Refactor PyMarcWriter::write_record() similarly
- Add benchmarking to verify threading speedup
- Verify pymrrc matches Rust parallelism efficiency
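For the record boundary detection step, one plausible building block is decoding the record length from the MARC leader, whose first five bytes (positions 00-04) hold the total record length as ASCII digits. The helper below is a hypothetical sketch of what peek_bytes() might feed into; its name and validation rules are assumptions, not part of the existing codebase.

```rust
/// The MARC leader is a fixed 24 bytes at the start of every record.
const LEADER_LEN: usize = 24;

/// Decode the record length from leader positions 00-04.
/// Returns None if fewer than five bytes are available, if any of them is
/// not an ASCII digit, or if the length is shorter than the leader itself.
fn record_length(leader_prefix: &[u8]) -> Option<usize> {
    let digits = leader_prefix.get(..5)?;
    if !digits.iter().all(u8::is_ascii_digit) {
        return None;
    }
    let length: usize = std::str::from_utf8(digits).ok()?.parse().ok()?;
    (length >= LEADER_LEN).then_some(length)
}
```

With the length known up front, Phase 1 could request exactly one record's worth of bytes rather than a fixed-size chunk that may end mid-record.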
Risk Analysis¶
| Risk | Mitigation |
|---|---|
| Increased memory copies (Phase 1 reads into Vec) | Minor: I/O buffer sizes are already large; CPU savings from GIL release far outweigh memory cost |
| Complexity of three-phase pattern | Manageable: Pattern is localized to reader/writer methods |
| Edge cases in byte boundary handling | Covered: Existing record parsing logic already handles byte sequences |
| Binary compatibility | None: This is internal refactoring; API unchanged |
Success Criteria¶
- Threading benchmarks show 2x+ speedup for concurrent operations (currently 1.41x)
- pymrrc threading performance within 90% of pure Rust performance
- All existing tests pass without modification
- No data loss or corruption in record processing
- Backward compatibility maintained for all public APIs
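The two numeric criteria can be checked mechanically once timings exist. The sketch below assumes three wall-clock measurements of the same workload (single-threaded pymrrc, multi-threaded pymrrc, multi-threaded pure Rust) and reads "within 90% of pure Rust performance" as a throughput ratio of at least 0.9. Both the function names and that interpretation are assumptions made for illustration.

```rust
use std::time::Duration;

/// Speedup of the threaded run over the serial run (higher is better).
fn speedup(serial: Duration, threaded: Duration) -> f64 {
    serial.as_secs_f64() / threaded.as_secs_f64()
}

/// Throughput of pymrrc relative to pure Rust for the same threaded workload.
/// A value of 1.0 means parity; 0.9 means pymrrc takes about 11% longer.
fn relative_to_rust(pymrrc_threaded: Duration, rust_threaded: Duration) -> f64 {
    rust_threaded.as_secs_f64() / pymrrc_threaded.as_secs_f64()
}

/// True when both numeric success criteria hold:
/// at least 2x threading speedup and at least 90% of pure Rust throughput.
fn meets_criteria(serial: Duration, pymrrc_threaded: Duration, rust_threaded: Duration) -> bool {
    speedup(serial, pymrrc_threaded) >= 2.0
        && relative_to_rust(pymrrc_threaded, rust_threaded) >= 0.9
}
```

The timings themselves would have to come from the benchmarking step listed under Implementation Steps; no measured values are implied here.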
Related Review¶
See GIL_RELEASE_REVIEW.md for detailed technical feedback on this proposal, including:
- Critical implementation issues (record boundary detection, borrow checker interactions)
- Design improvements and optimization opportunities
- Testing recommendations for edge cases