Skip to content

GIL Release Investigation Addendum

Date: January 1, 2026
Investigation Thread: T-019b7ba5-e7c0-750d-af6c-268f8566b912
Related Plan: @docs/design/GIL_RELEASE_IMPLEMENTATION_PLAN.md
Related Punchlist: @docs/design/GIL_RELEASE_PUNCHLIST.md


Executive Summary

Investigation revealed that while the GIL is being released (confirmed by diagnostic tests), performance remains poor (0.85-0.86x speedup vs 2.0x target) due to error handling code still running inside the allow_threads() closure without the GIL, creating a GIL safety violation that prevents parallelism benefits.

Root Cause: Phase 2 (GIL released) is attempting to create PyErr objects without the GIL, which is illegal and causes the runtime to re-acquire the GIL implicitly, negating the performance benefits.

Solution: Defer all error conversion to Phase 3 (after GIL re-acquisition). The closure must return a custom ParseError enum and convert it to PyErr only after the GIL is re-acquired.


Critical Findings from Investigation

Finding 1: GIL is Actually Released

Evidence: Diagnostic test in test_gil_diagnostic.py shows background thread successfully runs while main thread is in parsing phase.

Implication: The basic GIL release mechanism works. The problem is not the allow_threads() call itself, but what happens inside the closure.

Finding 2: Performance is Blocked by Error Handling

Measured: 2-thread speedup = 0.85x (worse than sequential)
Expected: 2-thread speedup ≥ 2.0x

Root Cause: Phase 2 parsing is attempting to create PyErr inside the allow_threads() closure:

// WRONG - This happens in allow_threads() closure without GIL:
let parsed_record = reader.read_from_bytes(&bytes)
    .map_err(|e| PyErr::new::<PyValueError, _>(e.to_string()))?  // ← GIL VIOLATION

Creating a PyErr without the GIL is a GIL safety violation. The runtime must implicitly re-acquire the GIL to create the exception, which blocks all other threads and negates the parallel benefit.

Finding 3: Three Critical Issues Identified

The investigation of the implementation plan revealed three critical issues that were never fully addressed in Phase B:

Issue #1: Borrow Checker Violation (Phase 2 closure)

  • Description: Original code held a borrow from self.buffer across the allow_threads() boundary
  • Impact: Code won't compile with proper type safety
  • Solution: Use SmallVec<[u8; 4096]> to own the record bytes before Phase 2, breaking the borrow
  • Status: Implemented in Phase A, needs re-verification

Issue #2: Nested Python::attach() Panic (Phase 1 methods)

  • Description: Phase 1 I/O logic was implicitly calling Python::attach() while GIL already held by PyRefMut in __next__()
  • Impact: Would panic at runtime on concurrent calls
  • Solution: Refactor Phase 1 methods to accept &Python and use it directly, never call Python::attach()
  • Status: Partially fixed via Python::assume_gil_acquired(), needs comprehensive review

Issue #3: Phase 2 Error Conversion Outside GIL (CRITICAL BLOCKER)

  • Description: Code attempts to create PyErr inside the allow_threads() closure without the GIL
  • Impact: Violates GIL safety, causes implicit re-acquisition, blocks parallelism benefits
  • Solution: Return custom ParseError enum from closure (GIL-safe), convert to PyErr after GIL re-acquisition
  • Status: NOT FIXED - this is the current blocker for performance

New Beads Created to Address Issues

Priority 1 (Blocking Critical Path)

1. mrrc-18s - BufferedMarcReader SmallVec Verification

  • Relates to: Issue #1 (borrow checker)
  • Purpose: Verify that SmallVec implementation in Phase A correctly solves borrow checker constraints
  • Acceptance: Cargo fmt, clippy, and all tests pass; no borrow checker warnings
  • Blocks: mrrc-69r (must have clean SmallVec before error fix)

2. mrrc-69r - ParseError Wrapper + Defer Error Conversion [★ CRITICAL ★]

  • Relates to: Issue #3 (error handling in Phase 2)
  • Purpose: Implement complete fix for error conversion
  • Create ParseError enum for GIL-free error representation
  • Modify Phase 2 to return Result<MarcRecord, ParseError> instead of PyResult
  • Modify Phase 3 to convert ParseErrorPyErr only after GIL re-acquisition
  • Acceptance: Performance test shows 2-thread speedup ≥ 1.8x after fix
  • Dependencies: mrrc-18s (verify SmallVec is correct first)
  • Blocks: mrrc-kzw and Phase F benchmarks

3. mrrc-kzw - Thread Safety Tests + Concurrency Validation

  • Purpose: Validate that mrrc-69r fix actually works in real threading scenarios
  • Tests:
  • test_threading_concurrent_speedup (2 & 4 threads)
  • test_threading_contention (speedup curve)
  • test_eof_semantics (idempotent behavior)
  • test_exception_propagation (no deadlocks)
  • Expected: 2-thread speedup ≥ 1.8x with mrrc-69r fix
  • Blocks: Final validation before Phase F benchmarks

Priority 2 (Deferred/Conditional)

4. mrrc-5ph - Phase C Optimizations (Batch Reading + Method Caching)

  • Status: CONDITIONAL - only implement if Phase F shows speedup < 2.0x
  • Purpose: Additional optimizations if mrrc-69r alone doesn't achieve 2.0x target
  • Gate: Depends on Phase F benchmark results
  • If speedup ≥ 2.0x: This task becomes optional and is deferred
  • If speedup < 2.0x: This task becomes required before release

5. mrrc-4rm - Document Investigation Findings

  • Purpose: Permanent record of root cause analysis and solutions applied
  • Contents:
  • Root cause analysis (Phase 2 error conversion)
  • Three critical issues identified
  • Diagnostic test results
  • Performance bottleneck explanation
  • Solution roadmap applied
  • Timing: After all fixes validated (mrrc-18s, mrrc-69r, mrrc-kzw)

Critical Path Sequence

Step 1: mrrc-18s (SmallVec verification)
Step 2: mrrc-69r (Error conversion fix) ← BLOCKS ALL PERFORMANCE GAINS
Step 3: mrrc-kzw (Threading tests) + Phase F re-benchmark
Step 4: Conditional mrrc-5ph (only if speedup < 2.0x)
Step 5: mrrc-4rm (Documentation)

Expected Performance Improvement

Before mrrc-69r Fix

  • 2-thread speedup: 0.85x (worse than sequential)
  • Problem: PyErr creation inside allow_threads() closure
  • GIL is released but not usable for parallelism

After mrrc-69r Fix

  • Expected 2-thread speedup: ≥ 1.8x (acceptable), ideally ≥ 2.0x (goal)
  • 4-thread speedup: ≥ 3.2x
  • Mechanism: Deferred error conversion allows threads to run in parallel during Phase 2

After Phase F Re-Benchmarks

  • If speedup ≥ 2.0x: mrrc-5ph becomes optional (Phase C deferred)
  • If speedup < 2.0x: mrrc-5ph becomes required before release

Phase Relationships

GIL Release Epic (mrrc-9wi) [OPEN]
├── Phase A: BufferedMarcReader (mrrc-9wi.1) [CLOSED]
│   └─ Investigation Task: mrrc-18s (verify SmallVec correctness)
├── Phase B: GIL Integration (mrrc-9wi.2) [CLOSED - INCOMPLETE]
│   └─ Investigation Task: mrrc-69r (apply critical fix #3)
├── Phase D: Writer (mrrc-9wi.3) [CLOSED]
├── Phase E: Validation (mrrc-9wi.4) [OPEN]
│   └─ Investigation Task: mrrc-kzw (threading tests validate fix)
├── Phase F: Benchmarks (mrrc-9wi.5) [OPEN]
│   └─ Dependent on: mrrc-69r completion
│   └─ Conditional on: Phase F speedup results for mrrc-5ph decision
└── Phase G: Documentation (mrrc-9wi.6) [OPEN]
    └─ Investigation Task: mrrc-4rm (document findings)

Diagnostic Evidence

Diagnostic Test Results

File: test_gil_diagnostic.py
Result: ✅ GIL IS being released

Proof: Background thread successfully executed while main thread was in parsing phase. This confirms the basic allow_threads() mechanism works.

Performance Test Results

File: test_gil_release_verification.py
Result: ❌ Performance NOT improved

  • Single-thread: baseline
  • 2-thread: 0.85x speedup (should be ≥2.0x)
  • 4-thread: 0.86x speedup (should be ≥3.0x)

Conclusion: GIL is released but parallelism is blocked by error handling overhead.


Implementation Plan Reference

The investigation used information from: - GIL_RELEASE_IMPLEMENTATION_PLAN.md (see sections referenced below) - GIL_RELEASE_PUNCHLIST.md (updated with new tasks)

Key Design Doc References

  • Part 1 (lines 52-66): Three-phase pattern overview
  • Part 2 (lines 88-283): Critical Fixes 1, 2, 3 (all three identified here)
  • Part 5 Phase B (lines 382-412): Phase B delivery requirements
  • Part 10 (lines 1364-1373): Phase C deferral criteria

Next Steps

  1. Immediate: Review and implement mrrc-69r (error conversion fix)
  2. This is the highest-impact, highest-priority fix
  3. Everything else depends on this working

  4. After mrrc-69r: Execute mrrc-kzw (threading tests)

  5. Verify the fix works in real threading scenarios
  6. Confirm performance improvement

  7. After mrrc-kzw: Re-run Phase F benchmarks (mrrc-9wi.5.1)

  8. Measure actual speedup vs Phase B baseline
  9. Decide on mrrc-5ph activation

  10. Final: Complete mrrc-4rm (documentation)

  11. Record findings for future reference
  12. Explain why solution was needed

Questions Answered

Q: Is the GIL actually being released?
A: Yes. Diagnostic tests confirm other threads can run during Phase 2. The issue is not the release mechanism itself.

Q: Why is performance still poor if GIL is released?
A: Error handling code is creating PyErr objects inside the closure without the GIL, forcing implicit re-acquisition.

Q: Can we fix this with optimization?
A: No. The root cause is a GIL safety violation. The fix is architectural: defer error conversion to Phase 3.

Q: Will this fix achieve 2x speedup?
A: Likely yes. If not, Phase C optimizations (mrrc-5ph) are available as a fallback.


  • Previous: Phase B marked complete but performance tests failing
  • Current: Investigation identifies root cause and solutions
  • Follow-up: Three new P1 tasks (mrrc-18s, mrrc-69r, mrrc-kzw) to fix issues
  • Conditional: mrrc-5ph deferred until Phase F results available