GIL Release Investigation Addendum¶

Date: January 1, 2026
Investigation Thread: T-019b7ba5-e7c0-750d-af6c-268f8566b912
Related Plan: @docs/design/GIL_RELEASE_IMPLEMENTATION_PLAN.md
Related Punchlist: @docs/design/GIL_RELEASE_PUNCHLIST.md

Executive Summary¶

Investigation revealed that while the GIL is being released (confirmed by diagnostic tests), performance remains poor (0.85-0.86x speedup vs 2.0x target) due to error handling code still running inside the allow_threads() closure without the GIL, creating a GIL safety violation that prevents parallelism benefits.

Root Cause: Phase 2 (GIL released) is attempting to create PyErr objects without the GIL, which is illegal and causes the runtime to re-acquire the GIL implicitly, negating the performance benefits.

Solution: Defer all error conversion to Phase 3 (after GIL re-acquisition). The closure must return a custom ParseError enum and convert it to PyErr only after the GIL is re-acquired.

Critical Findings from Investigation¶

Finding 1: GIL is Actually Released¶

Evidence: Diagnostic test in test_gil_diagnostic.py shows background thread successfully runs while main thread is in parsing phase.

Implication: The basic GIL release mechanism works. The problem is not the allow_threads() call itself, but what happens inside the closure.

Finding 2: Performance is Blocked by Error Handling¶

Measured: 2-thread speedup = 0.85x (worse than sequential)
Expected: 2-thread speedup ≥ 2.0x

Root Cause: Phase 2 parsing is attempting to create PyErr inside the allow_threads() closure:

// WRONG - This happens in allow_threads() closure without GIL:
let parsed_record = reader.read_from_bytes(&bytes)
    .map_err(|e| PyErr::new::<PyValueError, _>(e.to_string()))?  // ← GIL VIOLATION

Creating a PyErr without the GIL is a GIL safety violation. The runtime must implicitly re-acquire the GIL to create the exception, which blocks all other threads and negates the parallel benefit.

Finding 3: Three Critical Issues Identified¶

The investigation of the implementation plan revealed three critical issues that were never fully addressed in Phase B:

Issue #1: Borrow Checker Violation (Phase 2 closure)¶

Description: Original code held a borrow from self.buffer across the allow_threads() boundary
Impact: Code won't compile with proper type safety
Solution: Use SmallVec<[u8; 4096]> to own the record bytes before Phase 2, breaking the borrow
Status: Implemented in Phase A, needs re-verification

Issue #2: Nested Python::attach() Panic (Phase 1 methods)¶

Description: Phase 1 I/O logic was implicitly calling Python::attach() while GIL already held by PyRefMut in __next__()
Impact: Would panic at runtime on concurrent calls
Solution: Refactor Phase 1 methods to accept &Python and use it directly, never call Python::attach()
Status: Partially fixed via Python::assume_gil_acquired(), needs comprehensive review

Issue #3: Phase 2 Error Conversion Outside GIL (CRITICAL BLOCKER)¶

Description: Code attempts to create PyErr inside the allow_threads() closure without the GIL
Impact: Violates GIL safety, causes implicit re-acquisition, blocks parallelism benefits
Solution: Return custom ParseError enum from closure (GIL-safe), convert to PyErr after GIL re-acquisition
Status: NOT FIXED - this is the current blocker for performance

New Beads Created to Address Issues¶

Priority 1 (Blocking Critical Path)¶

1. mrrc-18s - BufferedMarcReader SmallVec Verification¶

Relates to: Issue #1 (borrow checker)
Purpose: Verify that SmallVec implementation in Phase A correctly solves borrow checker constraints
Acceptance: Cargo fmt, clippy, and all tests pass; no borrow checker warnings
Blocks: mrrc-69r (must have clean SmallVec before error fix)

2. mrrc-69r - ParseError Wrapper + Defer Error Conversion [★ CRITICAL ★]¶

Relates to: Issue #3 (error handling in Phase 2)
Purpose: Implement complete fix for error conversion
Create ParseError enum for GIL-free error representation
Modify Phase 2 to return Result<MarcRecord, ParseError> instead of PyResult
Modify Phase 3 to convert ParseError → PyErr only after GIL re-acquisition
Acceptance: Performance test shows 2-thread speedup ≥ 1.8x after fix
Dependencies: mrrc-18s (verify SmallVec is correct first)
Blocks: mrrc-kzw and Phase F benchmarks

3. mrrc-kzw - Thread Safety Tests + Concurrency Validation¶

Purpose: Validate that mrrc-69r fix actually works in real threading scenarios
Tests:
test_threading_concurrent_speedup (2 & 4 threads)
test_threading_contention (speedup curve)
test_eof_semantics (idempotent behavior)
test_exception_propagation (no deadlocks)
Expected: 2-thread speedup ≥ 1.8x with mrrc-69r fix
Blocks: Final validation before Phase F benchmarks

Priority 2 (Deferred/Conditional)¶

4. mrrc-5ph - Phase C Optimizations (Batch Reading + Method Caching)¶

Status: CONDITIONAL - only implement if Phase F shows speedup < 2.0x
Purpose: Additional optimizations if mrrc-69r alone doesn't achieve 2.0x target
Gate: Depends on Phase F benchmark results
If speedup ≥ 2.0x: This task becomes optional and is deferred
If speedup < 2.0x: This task becomes required before release

5. mrrc-4rm - Document Investigation Findings¶

Purpose: Permanent record of root cause analysis and solutions applied
Contents:
Root cause analysis (Phase 2 error conversion)
Three critical issues identified
Diagnostic test results
Performance bottleneck explanation
Solution roadmap applied
Timing: After all fixes validated (mrrc-18s, mrrc-69r, mrrc-kzw)

Critical Path Sequence¶

Step 1: mrrc-18s (SmallVec verification)
          ↓
Step 2: mrrc-69r (Error conversion fix) ← BLOCKS ALL PERFORMANCE GAINS
          ↓
Step 3: mrrc-kzw (Threading tests) + Phase F re-benchmark
          ↓
Step 4: Conditional mrrc-5ph (only if speedup < 2.0x)
          ↓
Step 5: mrrc-4rm (Documentation)

Expected Performance Improvement¶

Before mrrc-69r Fix¶

2-thread speedup: 0.85x (worse than sequential)
Problem: PyErr creation inside allow_threads() closure
GIL is released but not usable for parallelism

After mrrc-69r Fix¶

Expected 2-thread speedup: ≥ 1.8x (acceptable), ideally ≥ 2.0x (goal)
4-thread speedup: ≥ 3.2x
Mechanism: Deferred error conversion allows threads to run in parallel during Phase 2

After Phase F Re-Benchmarks¶

If speedup ≥ 2.0x: mrrc-5ph becomes optional (Phase C deferred)
If speedup < 2.0x: mrrc-5ph becomes required before release

Phase Relationships¶

GIL Release Epic (mrrc-9wi) [OPEN]
├── Phase A: BufferedMarcReader (mrrc-9wi.1) [CLOSED]
│   └─ Investigation Task: mrrc-18s (verify SmallVec correctness)
│
├── Phase B: GIL Integration (mrrc-9wi.2) [CLOSED - INCOMPLETE]
│   └─ Investigation Task: mrrc-69r (apply critical fix #3)
│
├── Phase D: Writer (mrrc-9wi.3) [CLOSED]
│
├── Phase E: Validation (mrrc-9wi.4) [OPEN]
│   └─ Investigation Task: mrrc-kzw (threading tests validate fix)
│
├── Phase F: Benchmarks (mrrc-9wi.5) [OPEN]
│   └─ Dependent on: mrrc-69r completion
│   └─ Conditional on: Phase F speedup results for mrrc-5ph decision
│
└── Phase G: Documentation (mrrc-9wi.6) [OPEN]
    └─ Investigation Task: mrrc-4rm (document findings)

Diagnostic Evidence¶

Diagnostic Test Results¶

File: test_gil_diagnostic.py
Result: ✅ GIL IS being released

Proof: Background thread successfully executed while main thread was in parsing phase. This confirms the basic allow_threads() mechanism works.

Performance Test Results¶

File: test_gil_release_verification.py
Result: ❌ Performance NOT improved

Single-thread: baseline
2-thread: 0.85x speedup (should be ≥2.0x)
4-thread: 0.86x speedup (should be ≥3.0x)

Conclusion: GIL is released but parallelism is blocked by error handling overhead.

Implementation Plan Reference¶

The investigation used information from: - GIL_RELEASE_IMPLEMENTATION_PLAN.md (see sections referenced below) - GIL_RELEASE_PUNCHLIST.md (updated with new tasks)

Key Design Doc References¶

Part 1 (lines 52-66): Three-phase pattern overview
Part 2 (lines 88-283): Critical Fixes 1, 2, 3 (all three identified here)
Part 5 Phase B (lines 382-412): Phase B delivery requirements
Part 10 (lines 1364-1373): Phase C deferral criteria

Next Steps¶

Immediate: Review and implement mrrc-69r (error conversion fix)
This is the highest-impact, highest-priority fix
Everything else depends on this working
After mrrc-69r: Execute mrrc-kzw (threading tests)
Verify the fix works in real threading scenarios
Confirm performance improvement
After mrrc-kzw: Re-run Phase F benchmarks (mrrc-9wi.5.1)
Measure actual speedup vs Phase B baseline
Decide on mrrc-5ph activation
Final: Complete mrrc-4rm (documentation)
Record findings for future reference
Explain why solution was needed

Questions Answered¶

Q: Is the GIL actually being released?
A: Yes. Diagnostic tests confirm other threads can run during Phase 2. The issue is not the release mechanism itself.

Q: Why is performance still poor if GIL is released?
A: Error handling code is creating PyErr objects inside the closure without the GIL, forcing implicit re-acquisition.

Q: Can we fix this with optimization?
A: No. The root cause is a GIL safety violation. The fix is architectural: defer error conversion to Phase 3.

Q: Will this fix achieve 2x speedup?
A: Likely yes. If not, Phase C optimizations (mrrc-5ph) are available as a fallback.

Previous: Phase B marked complete but performance tests failing
Current: Investigation identifies root cause and solutions
Follow-up: Three new P1 tasks (mrrc-18s, mrrc-69r, mrrc-kzw) to fix issues
Conditional: mrrc-5ph deferred until Phase F results available