Hybrid Implementation Plan - Beads Mapping & Revisions¶

Date: January 2, 2026
Status: Ready for Beads Integration
Purpose: Cross-reference detailed implementation plan against existing bd (beads) epics/tasks and identify gaps for creation

Executive Summary¶

The GIL_RELEASE_HYBRID_IMPLEMENTATION_PLAN_REVISIONS.md document specifies two major phases: - Phase C: Batch Reading (Compatibility Path) — Amortize GIL acquire/release overhead - Phase H: Pure Rust I/O & Rayon Parallelism (Performance Path) — Full parallelism with RustFile backend

Current Beads Status: - Phase C epics exist (mrrc-ppp, mrrc-d0m) but lack detailed subtasks - Phase H epic does not exist — must be created - Related infrastructure (diagnostic tests, benchmarks) is scattered or missing - Phase E (Validation), Phase F (Benchmarks), Phase G (Docs) exist but lack Phase H context

Action Required: 1. Delete duplicate Phase C epic (mrrc-d0m) 2. Create Phase H epic with full task breakdown 3. Populate Phase C with detailed subtasks (C.0 through C.Gate) 4. Create diagnostic/infrastructure tasks (GIL verification tests, batch size benchmarking utilities) 5. Update Phase E/F/G to explicitly depend on Phase C completion

1. Detailed Beads Mapping¶

1.1 Phase C: Batch Reading (Compatibility Path)¶

Plan Reference: §3 (lines 48-243)
Existing Beads: - mrrc-ppp (Epic) - Title correct; no subtasks - mrrc-d0m (Epic) - Duplicate; recommend deletion - mrrc-br3 (Task) - "GIL Release: Retest Phase B" - Related but not Phase C subtask - mrrc-18s (Task) - "GIL Release: Implement BufferedMarcReader with SmallVec" - Phase A task, prerequisite to C - mrrc-kzw (Task) - "GIL Release: Add thread safety tests" - Should be Phase C.Gate - mrrc-5ph (Task) - "GIL Release: Optimize batch reading" - Partial Phase C task

Gaps Identified: | Task ID | Plan §6 | Title | Status | Gap | |---------|---------|-------|--------|-----| | MISSING | C.0 | Data Structure, State Machine, Diagnostic Tests | - | Critical: Foundation for all C subtasks | | MISSING | C.1 | Implement read_batch() Method | - | Critical: Core reading batching | | MISSING | C.2 | Update next() to Use Queue | - | Critical: FSM integration | | MISSING | C.3 | Iterator Semantics & Idempotence | - | High: EOF behavior | | MISSING | C.4 | Memory Profiling | - | Medium: Memory safety gate | | MISSING | C.Gate | Benchmark Batch Sizes (10-500 sweep) | - | Critical: Performance gate (≥1.8x speedup target) |

Required Actions:

Create Phase C subtasks under mrrc-ppp:

# C.0: Foundation (prerequisite to C.1)
bd create "C.0: Data Structure, State Machine & GIL Verification Test" \
  --parent mrrc-ppp -t task -p 1 \
  --deps discovered-from:mrrc-18s \
  --json

# Returned ID: <C0-id>
# C.0 Acceptance: 
# - PyMarcReaderState struct defined with VecDeque<SmallVec<[u8; 4096]>>
# - EOF state machine (3-state logic) implemented and code-reviewed
# - test_gil_release_verification.rs created (threaded reader, measures wall-clock vs sequential)
# - Scripts: benchmark_batch_sizes.py, test_python_file_overhead.py created in scripts/

# C.1: read_batch() Implementation (depends on C.0)
bd create "C.1: Implement read_batch() Method with Single GIL Cycle" \
  --parent mrrc-ppp -t task -p 1 \
  --deps discovered-from:<C0-id> \
  --json

# C.2: Queue FSM Integration (depends on C.1)
bd create "C.2: Update __next__() to Use Queue & EOF State Machine" \
  --parent mrrc-ppp -t task -p 1 \
  --deps discovered-from:<C1-id> \
  --json

# C.3: Iterator Semantics (depends on C.2)
bd create "C.3: Iterator Semantics & Idempotence Verification" \
  --parent mrrc-ppp -t task -p 1 \
  --deps discovered-from:<C2-id> \
  --json

# C.4: Memory Profiling (parallel with C.2-C.3)
bd create "C.4: Memory Profiling & Bounds Validation" \
  --parent mrrc-ppp -t task -p 2 \
  --json

# C.Gate: Benchmarking Gate (depends on C.3, C.4)
bd create "C.Gate: Benchmark Batch Sizes (10-500 sweep) - ≥1.8x Speedup Gate" \
  --parent mrrc-ppp -t task -p 1 \
  --deps discovered-from:<C3-id>,discovered-from:<C4-id> \
  --json

Phase C Dependency Graph:

mrrc-18s (SmallVec impl)
    ↓
C.0 (State machine + diagnostics)
    ↓
C.1 (read_batch)
    ↓
C.2 (Queue FSM)
    ↓
C.3 (Idempotence)    C.4 (Memory profiling)
    ↓                 ↓
    └─────→ C.Gate ←─┘
            (≥1.8x speedup)

1.2 Phase H: Pure Rust I/O & Rayon Parallelism (Performance Path)¶

Plan Reference: §4 & §6 (lines 246-645)
Existing Beads: NONE

Status: Phase H epic and all subtasks must be created

Critical Path Blockers: - C.Gate must pass (≥1.8x speedup) before proceeding to H.3-H.4 - H.0 (Rayon PoC) can run in parallel with Phase C (low dependency) - H.3 sequential baseline requires C.Gate completion for accurate measurement context

Recommended Creation Order:

# Phase H Epic (depends on C.Gate)
bd create "Phase H: Pure Rust I/O & Rayon Parallelism" \
  -t epic -p 1 \
  --deps discovered-from:<C-Gate-id> \
  --json
# Returned ID: <H-epic-id>

# H.0: Rayon PoC (can start immediately; independent)
bd create "H.0: Rayon PoC - Thread Pool & Channel Pipeline Validation" \
  --parent <H-epic-id> -t task -p 1 \
  --json
# Success: >1.5x speedup, no panics, <5% overhead

# H.1: Type Detection (depends on H.0 completing or parallel)
bd create "H.1: ReaderBackend Enum & Type Detection Algorithm" \
  --parent <H-epic-id> -t task -p 1 \
  --deps discovered-from:<H0-id> \
  --json

# H.2: RustFile Backend (depends on H.1)
bd create "H.2: RustFile Backend Implementation - Sequential Read" \
  --parent <H-epic-id> -t task -p 1 \
  --deps discovered-from:<H1-id> \
  --json

# H.2b: CursorBackend (depends on H.1)
bd create "H.2b: CursorBackend Implementation - Memory-Mapped & Bytes" \
  --parent <H-epic-id> -t task -p 1 \
  --deps discovered-from:<H1-id> \
  --json

# H.3: Sequential Baseline (depends on C.Gate, H.2, H.2b)
bd create "H.3: Sequential Baseline & Parity Tests (RustFile + CursorBackend)" \
  --parent <H-epic-id> -t task -p 1 \
  --deps discovered-from:<C-Gate-id>,discovered-from:<H2-id>,discovered-from:<H2b-id> \
  --json

# H.4a: Boundary Scanner (parallel with H.3)
bd create "H.4a: Record Boundary Scanner (0x1E delimiter + multi-threaded)" \
  --parent <H-epic-id> -t task -p 1 \
  --json

# H.4b: Rayon Parser Pool (depends on H.4a)
bd create "H.4b: Rayon Parser Pool - Parallel Batch Processing" \
  --parent <H-epic-id> -t task -p 1 \
  --deps discovered-from:<H4a-id> \
  --json

# H.4c: Producer-Consumer Pipeline (depends on H.4b)
bd create "H.4c: Producer-Consumer Pipeline - Backpressure & Channels" \
  --parent <H-epic-id> -t task -p 1 \
  --deps discovered-from:<H4b-id> \
  --json

# H.5: Integration & Error Handling (depends on H.4c)
bd create "H.5: Integration Tests & Error Propagation Validation" \
  --parent <H-epic-id> -t task -p 1 \
  --deps discovered-from:<H4c-id> \
  --json

# H.Gate: Parallel Benchmarking Gate (depends on H.5)
bd create "H.Gate: Parallel Benchmarking - ≥2.5x Speedup (4-thread) Gate" \
  --parent <H-epic-id> -t task -p 1 \
  --deps discovered-from:<H5-id> \
  --json

Phase H Dependency Graph:

 H.0  H.1  H.2 (RustFile)     ↓

name="__codelineno-3-1" href="#__codelineno-3-1">C.Gate (batch reading ≥1.8x) ↓ (Rayon PoC) ↓ (Type detection) ↓ H.2b (CursorBackend) ↓ └─────→ H.3 ←─┘ (Sequential baseline) ↓ H.4a (Boundary scanner) ↓ H.4b (Rayon parser pool) ↓ H.4c (Producer-consumer pipeline) ↓ H.5 (Integration) ↓ H.Gate (≥2.5x speedup)

1.3 Diagnostic & Infrastructure Tasks¶

Plan Reference: §7 (lines 850-1045)

Existing Beads: - None dedicated to diagnostic infrastructure

Required Creation (support tasks for C.0 & C.Gate):

# Diagnostic Test Suite Setup (C.0 dependency)
bd create "Infrastructure: Diagnostic Test Suite (GIL verification, benchmarking utilities)" \
  -t task -p 1 \
  --json

# This task encompasses:
# 1. tests/concurrent_gil_tests.rs - test_gil_release_verification (§7.1)
# 2. scripts/benchmark_batch_sizes.py - Batch size sweep (10, 25, 50, 100, 200, 500)
# 3. scripts/test_python_file_overhead.py - I/O profiling
# 4. .cargo/check.sh integration for ASAN on Linux (§5.2)

# Memory Safety Validation (C.4 gate)
bd create "Infrastructure: Memory Safety - ASAN/Valgrind CI Integration" \
  -t task -p 1 \
  --json

# This task encompasses:
# 1. Linux ASAN: RUSTFLAGS=-Zsanitizer=address cargo test (§5.2)
# 2. macOS Leaks: leaks -atExit (§5.2)
# 3. Valgrind suppressions for PyO3 (§5.2)
# 4. CI configuration for pre-push checks

1.4 Phases E, F, G (Existing) - Dependency Updates¶

Existing Beads: - mrrc-9wi.4 (Epic) - Phase E: Comprehensive Validation & Testing - mrrc-9wi.5 (Epic) - Phase F: Benchmark Refresh & Performance Analysis - mrrc-9wi.6 (Epic) - Phase G: Documentation Refresh

Current Dependency: E → F → G

Required Updates: Add Phase H as explicit blocker for Phase G

# Update Phase F description to reference Phase C decision gate
# Update Phase G description to reference Phase H results
# Update Phase G to depend on Phase H.Gate

Rationale: - Phase F (post-B benchmarking) happens before Phase C decision - Phase G documentation must cover Phase H threading model and performance tuning - Phase H.Gate must complete before Phase G release documentation finalized

2. Summary of Required Beads Operations¶

2.1 Deletions (Duplicates)¶

ID	Title	Reason
`mrrc-d0m`	Phase C (duplicate)	Remove duplicate epic; keep `mrrc-ppp`

# Note: bd doesn't have delete command; close with reason "Duplicate of mrrc-ppp"

2.2 New Epics to Create¶

Epic Title	Depends On	Notes
Phase H: Pure Rust I/O & Rayon Parallelism	C.Gate (mrrc-ppp.Gate)	10 subtasks total

2.3 New Tasks to Create (under mrrc-ppp - Phase C)¶

Task	Depends On	Priority
C.0: Data Structure, State Machine & Diagnostics	mrrc-18s	1
C.1: Implement read_batch()	C.0	1
C.2: Update next() to Queue FSM	C.1	1
C.3: Iterator Semantics & Idempotence	C.2	1
C.4: Memory Profiling	(parallel)	1
C.Gate: Batch Size Benchmarking (≥1.8x gate)	C.3, C.4	1

2.4 New Tasks to Create (supporting)¶

Task	Category	Priority
Infrastructure: Diagnostic Test Suite	Support (C)	1
Infrastructure: Memory Safety - ASAN/Valgrind CI	Support (C)	2

2.5 Existing Task Updates¶

ID	Update	Reason
`mrrc-9wi.6` (Phase G)	Add depends-on: Phase H.Gate	G docs must cover H threading model

3. Refined Task Breakdown with Plan References¶

Phase C: Batch Reading (Detailed from Plan §6)¶

C.0: Data Structure, State Machine & GIL Verification Test¶

Plan §6 lines 440-448; Plan §3.1-3.3

Deliverables: 1. struct PyMarcReaderState in src-python/src/readers.rs - Fields: buffered_reader, record_queue, queue_capacity_bytes, eof_reached, batch_size - Plan Fig: §3.1 (lines 54-71)

EOF state machine implementation (3-state logic)
Code structure matches Plan Fig: §3.3 (lines 135-172)
Test: All transitions verified in unit tests
Diagnostic tests created:
tests/concurrent_gil_tests.rs with test_gil_release_verification (Plan §7.1 lines 941-944)
scripts/benchmark_batch_sizes.py (Plan §7.1 lines 946-950)
scripts/test_python_file_overhead.py (Plan §7.1 lines 952-955)
Acceptance criteria:
State diagram verified in code review
All 3 diagnostic tests created (not yet run/passed)
Memory bounds calculated: 200 records max, 300KB max (Plan §3.2 lines 89-90)

C.1: Implement read_batch() Method¶

Plan §6 lines 450-454; Plan §3.4

Specification:

fn read_batch(&mut self, py: Python) -> Result<Vec<SmallVec<[u8; 4096]>>> {
    // Plan §3.4 lines 202-227
    // Single GIL acquire/release cycle
    // Returns Vec of up to batch_size=100 records
    // Respects hard limits: 200 records OR 300KB (Plan §3.4 lines 230-236)
}

Acceptance criteria: - Unit test passes: test_read_batch_returns_100_records - GIL contract verified (§3.4 lines 197-199) - Capacity limits enforced - SmallVec<4KB> fallback to heap for larger records

C.2: Update next() to Use Queue¶

Plan §6 lines 456-460; Plan §3.3

Changes: 1. Replace inline read_next_record_bytes() calls with queue-based FSM 2. Implement state transitions per Plan §3.3 diagram (lines 135-172) 3. Call read_batch() once when queue empty and not EOF

Acceptance criteria: - Existing test suite passes unchanged - Queue pop removes records in FIFO order - read_batch() called exactly once per batch (not per record)

C.3: Iterator Semantics & Idempotence¶

Plan §6 lines 462-466; Plan §3.3 lines 175-193

Tests: - test_iterator_idempotence: Repeated __next__ calls after EOF return None (Plan §3.3 lines 177-185) - test_partial_batch_at_eof: 217-record file → batch reads of 100, 100, 17 → all delivered (Plan §3.3 lines 188-193) - test_stopiteration_semantics: Python StopIteration behavior matches expectations

Acceptance criteria: - All 3 tests pass - EOF state idempotent (repeated calls safe, no I/O) - Partial batches at EOF handled correctly

C.4: Memory Profiling Sub-task¶

Plan §6 lines 468-472; Plan §3.2 lines 82-98

Measurements: 1. Memory high watermark during 100k-record read 2. Queue capacity limits enforced (hard stops at 200 records or 300KB) 3. Valgrind/ASAN validation: zero definite leaks

Acceptance criteria: - Memory stable over 100k-record fixture - Capacity limits respected under stress - No memory leaks reported

C.Gate: Batch Size Benchmarking¶

Plan §6 lines 474-478; Plan §3.2 lines 92-131

Benchmark Parameters (Plan §3.2 Decision Tree lines 100-124): - Test batch sizes: 10, 25, 50, 100, 200, 500 - Measure: GIL frequency, memory, per-record latency, speedup curve

Decision Tree (Plan §3.2 lines 100-124): 1. Speedup curve flattens? - YES (peak at 50-200): Accept that batch size - NO (still climbing at 500): Extend to 1000, 2000 - FLAT (all similar): Accept 100; no bottleneck

Best batch size achieves ≥1.8x speedup?
YES: ✓ Gate passes → proceed to Phase H
NO: Diagnose bottleneck (Plan §3.2 lines 119-123)
- GIL releasing? (threading test)
- Memory limits issue? (increase to 600KB)
- Python I/O bottleneck? (Phase H RustFile required)

Outputs: - speedup_curve.csv (batch_size, 2-thread_speedup) - Decision document: PHASE_C_GATE_DECISION.txt

Acceptance criteria: - ≥1.8x speedup on 2-thread concurrent read with optimal batch size - If <1.8x: Root cause documented, Phase H marked as prerequisite

Phase H: Pure Rust I/O & Rayon Parallelism (Detailed from Plan §4 & §6)¶

H.0: Rayon PoC¶

Plan §5.1 lines 349-369; Plan §6 lines 482-486

Scope: Minimal example validating thread-safety approach before full H.4 implementation

Implementation:

// 1. Spawn Rayon task pool
// 2. Populate crossbeam::channel from Rayon
// 3. Consume from channel in main thread
// 4. Verify no panic propagation issues
// 5. Measure overhead vs single-thread

Success Criteria (Plan §5.1 lines 363-366): - No panic leaks; channel closes cleanly - Overhead <5% for small batches - Speedup >1.5x for CPU-bound workload - No memory leaks (valgrind)

H.1: ReaderBackend Enum & Type Detection¶

Plan §6 lines 488-501; Plan §4.1-4.2 lines 248-363

Deliverables:

pub enum ReaderBackend {
    PythonFile(BufferedMarcReader),
    RustFile(std::fs::File),
    CursorBackend(std::io::Cursor<Vec<u8>>),
}

Type Detection Algorithm (Plan §4.1 lines 250-363): - Input: PyAny (Python object) - Routing: 8 supported types → backends - str, pathlib.Path → RustFile - bytes, bytearray, BytesIO → CursorBackend - file object, socket.socket → PythonFile - Fallback: Unknown types → TypeError (fail-fast, no silent fallback)

Test Coverage (Plan §6 lines 493-501): - 8 type detection tests (one per supported type) - Unknown type error test - Type routing correctness

Acceptance Criteria: - All 8 supported types route correctly - Unknown types raise TypeError with descriptive message - Type detection unit tests pass

H.2: RustFile Backend (Sequential)¶

Plan §6 lines 502-506; Plan §4.2 lines 365-430

Implementation:

impl ReaderBackend {
    pub fn read_next_record_bytes(&mut self) -> Result<Option<Vec<u8>>> {
        // Use std::fs::File for direct Rust I/O
        // No GIL required (pure Rust)
        // Respects MARC record boundaries (0x1E delimiters)
    }
}

Acceptance Criteria: - File reads execute without GIL - Record boundaries detected correctly (0x1E terminators) - Parity test: output identical to PythonFile backend (byte-for-byte)

H.2b: CursorBackend (In-Memory)¶

Plan §6 lines 507-511; Plan §4.3 lines 432-463

Implementation:

impl ReaderBackend {
    // For bytes/bytearray/BytesIO: std::io::Cursor<Vec<u8>>
    // For in-memory batch processing
    // No file I/O, pure CPU parsing
}

Acceptance Criteria: - Bytes input processed correctly - BytesIO seekable state preserved - Parity test: output identical to RustFile - No Python references retained

H.3: Sequential Baseline & Parity Tests¶

Plan §6 lines 513-525; Plan §7.2 lines 1001-1018

Baseline Strategy (Plan §7.2 lines 1001-1018): - Run after Phase C complete (H.3 depends on C.Gate) - Establish sequential performance without parallelism - Measure: RustFile vs PythonFile vs CursorBackend - Compare against Phase B baseline established in Phase F

Tests (Plan §6 lines 513-525): - test_rust_file_parity: RustFile output = PythonFile (byte-for-byte) - test_cursor_backend_parity: CursorBackend output = RustFile - test_type_detection_correctness: All 8 input types route correctly - Coverage target: ≥95% line coverage (H.1, H.2, H.2b)

Benchmarks: - Command: cargo bench --bench baseline --release -- --save-baseline=phase_h3 - Measure: Single-thread RustFile speedup vs Phase B - Acceptance: Establishes new sequential baseline for H.4 comparison

H.4a: Boundary Scanner (Multi-threaded)¶

Plan §6 lines 527-533; Plan §4.4a lines 465-519

Function Signature:

pub fn scan_record_boundaries(
    buffer: &[u8],
    limit: usize,  // max records to find
) -> Result<Vec<(usize, usize)>> {
    // Returns: [(offset, length), ...]
    // Finds all 0x1E delimiters; returns boundary pairs
}

Implementation Details (Plan §4.4a lines 465-519): - Scan for 0x1E (0x1D 0x1E) record terminators - Extract length from MARC leader (bytes 0-4) - Verify boundaries don't exceed buffer - Thread-safe: Cursor over immutable buffer

Test Cases (Plan §5.3 lines 431): - test_boundary_scanner_correctness: All record boundaries identified - Truncated files, malformed records

Acceptance Criteria: - Boundary pairs accurate - No missed or duplicate boundaries - Handles edge cases (truncated records, eof)

H.4b: Rayon Parser Pool (Parallel)¶

Plan §6 lines 535-541; Plan §4.4b lines 521-541

Implementation:

fn parse_batch_parallel(
    record_boundaries: Vec<(usize, usize)>,
    buffer: &[u8],
) -> Result<Vec<Record>> {
    record_boundaries
        .par_iter()  // Rayon parallel iterator
        .map(|&(offset, len)| {
            let record_bytes = &buffer[offset..offset + len];
            let cursor = std::io::Cursor::new(record_bytes);
            MarcReader::new(cursor).read_record()
        })
        .collect::<Result<Vec<_>>>()
}

Thread Configuration (Plan §4.5 lines 322-343): - Hidden from Python API - Respects RAYON_NUM_THREADS env var - Default: rayon::current_num_threads() (all cores) - Test suite: Run with RAYON_NUM_THREADS=2

Acceptance Criteria: - Parser pool scales linearly with thread count (up to CPU limit) - No panics; errors propagate correctly - No Python references in Rayon tasks

H.4c: Producer-Consumer Pipeline (Backpressure)¶

Plan §6 lines 543-549; Plan §4.4c lines 543-613

Implementation:

// Producer task (background)
fn producer_task(file: File, sender: Sender<Record>) {
    let mut buffer = vec![0u8; 512 * 1024];  // 512 KB
    loop {
        let n = file.read(&mut buffer).expect("read failed");
        if n == 0 { break; }
        let boundaries = scan_record_boundaries(&file, &buffer[..n], 100);
        let records = parse_batch_parallel(boundaries, &buffer[..n]);
        for record in records {
            sender.send(record).expect("send failed");  // Blocks if channel full
        }
    }
}

// Consumer: __next__()
fn __next__(&mut self) -> PyResult<Option<PyRecord>> {
    match self.receiver.try_recv() {
        Ok(record) => Ok(Some(PyRecord { inner: record })),
        Err(TryRecvError::Empty) => {
            match self.receiver.recv() {
                Ok(record) => Ok(Some(PyRecord { inner: record })),
                Err(_) => Ok(None),  // Channel closed
            }
        },
        Err(TryRecvError::Disconnected) => Ok(None),
    }
}

Backpressure Behavior (Plan §4.4c lines 615-620): - Channel capacity: 1000 records - Producer blocks when full (prevents OOM) - Consumer unblocks producer when queue drained

Test Cases (Plan §5.3 lines 426-432): - test_rayon_channel_backpressure: Producer blocks correctly - test_rayon_panic_propagation: Errors bubble up correctly

Acceptance Criteria: - Backpressure works as designed - No deadlocks - OOM prevented

H.5: Integration Tests & Error Handling¶

Plan §6 lines 551-565; Plan §5.3 lines 426-432

Test Categories: 1. Panic safety: test_rayon_pool_safety (no panics, clean shutdown) 2. Error propagation: test_rayon_panic_propagation 3. Concurrency: test_concurrent_readers (multiple readers in pool) 4. Coverage: ≥95% line coverage (H.4a, H.4b, H.4c)

Acceptance Criteria: - All integration tests pass - Error handling proven correct - Coverage target met

H.Gate: Parallel Benchmarking Gate¶

Plan §6 lines 567-575; Plan §7.2 lines 1034-1037

Gate Criteria: - 4-thread concurrent read speedup ≥2.5x vs Phase B baseline - Memory overhead <10% vs Phase H.3 sequential - Rayon overhead <5%

Diagnostic Tools (Plan §7.2 lines 962-997): If speedup <2.5x:

# 1. CPU utilization check
cargo test test_rayon_4threads --release -- --nocapture
# Expected: 4 cores at 80%+ utilization

# 2. Channel contention analysis
cargo build --release --features="profiling"
./target/release/test_rayon_profile 4 threads

# 3. Parser performance isolation
cargo bench --bench rayon_parse_only

# 4. Memory pressure check
valgrind --tool=massif target/release/test_rayon_memory_profile

Benchmarking Command:

cargo bench --bench rayon_parallel --release -- --baseline=phase_b
# Expected: ≥2.5x speedup vs phase_b

Acceptance Criteria: - ≥2.5x speedup achieved - Diagnostic data collected and analyzed - Performance characteristics documented

4. Existing Epics & Their Phase H Dependencies¶

Phase E: Comprehensive Validation & Testing (`mrrc-9wi.4`)¶

Status: Open
Current Dependency: Phase D (Phase D complete)
Plan Reference: Plan §5.3 lines 408-432

Subtasks (sample from existing beads): - Thread safety tests - Concurrency validation - Regression tests (pymarc compatibility)

UPDATE REQUIRED: Add note that Phase E runs independently; Phase H has its own test coverage (H.5). No explicit blocker, but Phase H.Gate happens after Phase E completion.

Phase F: Benchmark Refresh & Performance Analysis (`mrrc-9wi.5`)¶

Status: Open
Current Dependency: Phase E
Plan Reference: Plan §7.2 lines 1001-1018

Key Decision Gate: Phase C Deferral (mrrc-pfw) - If F shows ≥2.0x speedup on 2-thread: Phase C is optional - If F shows <2.0x speedup: Phase C is required

UPDATE REQUIRED: Reference updated to Plan Revisions: Phase C speedup target is ≥1.8x (not 2.0x). Phase F happens before Phase C implementation decision.

Phase G: Documentation Refresh (`mrrc-9wi.6`)¶

Status: Open
Current Dependency: Phase F
Plan Reference: Plan §6.3 lines 573-596; Plan §7 (Documentation section)

Documentation Sections (from existing description): - README.md - Performance section - PERFORMANCE.md - Benchmark results - ARCHITECTURE.md - Three-phase pattern - CHANGELOG.md - examples/ - Concurrent reading/writing

UPDATE REQUIRED: - Add Phase H.Gate as blocker (G cannot release until H.Gate passes) - Add Phase H threading model to ARCHITECTURE.md - Add RAYON_NUM_THREADS tuning to API docs - Add backpressure explanation to threading guide

5. Priority & Sequencing Recommendations¶

Critical Path (Must-Have Order)¶

1. C.0 (Data structure + diagnostics) [Day 1-2]
   ↓
2. C.1 (read_batch) [Day 2-3]
   ↓
3. C.2-C.3 (Queue FSM + Idempotence) [Day 3-4]
   ↓
4. C.Gate (Batch benchmarking, ≥1.8x gate) [Day 5] ← BLOCKER for H.3
   ↓
5. H.0 (Rayon PoC) [Can start Day 1 independently]
   ↓
6. H.1-H.2b (Type detection + backends) [Day 2-4, parallel with C.2-C.3]
   ↓
7. H.3 (Sequential baseline) [Day 5+1, AFTER C.Gate passes]
   ↓
8. H.4a-H.4c (Pipeline implementation) [Day 2-4 after H.3]
   ↓
9. H.Gate (Parallel benchmarking, ≥2.5x gate) [Day 5+2]

Parallelizable Tasks¶

C.0-C.3 must be sequential (phase C foundation)
C.4 (memory profiling) can run parallel with C.2-C.3
H.0 (Rayon PoC) can start immediately, independent of Phase C
H.1-H.2b (type detection + backends) can progress while C.0-C.3 in flight
H.4a-H.4c implementation must follow H.3 baseline

Timeline Impact¶

Phase C: 5 days (critical path dependency)
Phase H.0 (PoC): 1 day (early risk mitigation)
Phase H.1-H.2b: 3 days (backend implementation)
Phase H.3: 1 day (BLOCKED until C.Gate passes, +1 day from Phase C completion)
Phase H.4a-H.4c: 3 days
Phase H.Gate: 1 day
Total (serial): 14 days
Total (parallel C + H.0-H.2b): 8 days + 1 day buffer for C.Gate slowness

6. Metrics & Gate Criteria Summary¶

Gate	Condition	Pass Value	Current Status
C.Gate	2-thread speedup (batch read)	≥1.8x	Not yet run (C not started)
H.3.Gate	Sequential baseline parity	Identical output vs PythonFile	Not applicable (H not started)
H.Gate	4-thread speedup (Rayon)	≥2.5x	Not applicable (H not started)

7. Next Steps: Immediate Actions¶

7.1 Beads Cleanup¶

# 1. Close duplicate Phase C epic
bd list | grep "mrrc-d0m"  # Verify it's the duplicate
# (Note: Create corresponding note/close reason in manual update)

# 2. Verify Phase C epic mrrc-ppp has no existing children
bd list | grep -A5 "mrrc-ppp"

7.2 Create Phase C Subtasks (Priority 1)¶

Use script from §1.1 above; creates C.0 through C.Gate with dependencies

7.3 Create Phase H Epic + Subtasks (Priority 1)¶

Use script from §1.2 above; creates H epic with 10 subtasks and dependency chain

7.4 Create Diagnostic Infrastructure Tasks (Priority 1)¶

Use script from §1.3 above; creates test suite and memory safety CI tasks

7.5 Update Existing Epics (Priority 2)¶

Phase E: No changes required
Phase F: Reference Phase C decision gate (≥1.8x, not 2.0x)
Phase G: Add Phase H.Gate as blocker; update documentation sections

7.6 Create Transition Epics (Optional, Priority 3)¶

"Phase C → H Transition" task to verify C.Gate completion before H.3 starts
"Phase H → Release Prep" task to verify H.Gate before final release

8. Beads Command Reference¶

Create Phase C Epic (if not exists) or Verify¶

# List Phase C epic
bd list --json | jq '.[] | select(.id == "mrrc-ppp")'

Create All Phase H Tasks (batch command)¶

# Save this as create_phase_h.sh
#!/bin/bash
H_EPIC=$(bd create "Phase H: Pure Rust I/O & Rayon Parallelism" \
  -t epic -p 1 --deps discovered-from:mrrc-ppp.Gate --json | jq -r '.id')

# Then create subtasks with returned $H_EPIC ID...

Validate Dependency Chain¶

bd dep show mrrc-ppp    # Phase C dependencies
bd dep show $H_EPIC     # Phase H dependencies

9. Key Differences from Original Plan¶

Aspect	Original Plan	Revisions (This Doc)
Phase C Epics	Abstract epic (mrrc-ppp, mrrc-d0m)	Detailed task breakdown (C.0-C.Gate) with explicit plan references
Phase H	No beads existed	Full epic + 10 subtasks with dependency chain
Diagnostic Tools	Mentioned in §7	Created as explicit infrastructure tasks
Dependency Clarity	Implicit (C before H.3)	Explicit: C.Gate → H.3, H.Gate → Phase G release docs
Benchmark Baselines	Phase B, Phase C vs B, Phase H vs B	Clarified in Plan §7.2; baseline cascade strategy documented
Batch Size Gate	≥1.8x speedup criterion	Decision tree (3-tier: pass/investigate/fail) with diagnostic workflow

10. Risk Mitigation (Beads-Integrated)¶

During Implementation: - Create "Risk Register" task under Phase C/H to track known risks - Use bd dep to verify blockers are cleared before proceeding - Gate tasks force decision points (C.Gate, H.Gate must be explicitly completed)

Rollback Contingency: - If C.Gate fails (<1.8x speedup): Document root cause; decide whether Phase H RustFile is prerequisite - If H.Gate fails (<2.5x speedup): Document bottleneck (CPU vs I/O); decide release timeline

11. Success Criteria (Overall)¶

✓ All Phase C subtasks completed (C.0-C.Gate)
✓ C.Gate speedup ≥1.8x on 2-thread concurrent read
✓ All Phase H subtasks completed (H.0-H.Gate)
✓ H.Gate speedup ≥2.5x on 4-thread concurrent read
✓ Phase G documentation updated with Phase H threading model
✓ All edge case tests passing (memory safety, panic handling, type detection)
✓ Zero definite memory leaks (ASAN/Valgrind)
✓ Backward compatible API (no breaking changes)

Document Status: Ready for Beads Implementation
Prepared by: Amp AI
Date: January 2, 2026
Final Review: Cross-referenced with GIL_RELEASE_HYBRID_IMPLEMENTATION_PLAN_REVISIONS.md (all §, lines, code samples verified)

Hybrid Implementation Plan - Beads Mapping & Revisions¶

Executive Summary¶

1. Detailed Beads Mapping¶

1.1 Phase C: Batch Reading (Compatibility Path)¶

1.2 Phase H: Pure Rust I/O & Rayon Parallelism (Performance Path)¶

1.3 Diagnostic & Infrastructure Tasks¶

1.4 Phases E, F, G (Existing) - Dependency Updates¶

2. Summary of Required Beads Operations¶

2.1 Deletions (Duplicates)¶

2.2 New Epics to Create¶

2.3 New Tasks to Create (under mrrc-ppp - Phase C)¶

2.4 New Tasks to Create (supporting)¶

2.5 Existing Task Updates¶

3. Refined Task Breakdown with Plan References¶

Phase C: Batch Reading (Detailed from Plan §6)¶

C.0: Data Structure, State Machine & GIL Verification Test¶

C.1: Implement read_batch() Method¶

C.2: Update next() to Use Queue¶

C.3: Iterator Semantics & Idempotence¶

C.4: Memory Profiling Sub-task¶

C.Gate: Batch Size Benchmarking¶

Phase H: Pure Rust I/O & Rayon Parallelism (Detailed from Plan §4 & §6)¶

H.0: Rayon PoC¶

H.1: ReaderBackend Enum & Type Detection¶

H.2: RustFile Backend (Sequential)¶

H.2b: CursorBackend (In-Memory)¶

H.3: Sequential Baseline & Parity Tests¶

H.4a: Boundary Scanner (Multi-threaded)¶

H.4b: Rayon Parser Pool (Parallel)¶

H.4c: Producer-Consumer Pipeline (Backpressure)¶

H.5: Integration Tests & Error Handling¶

H.Gate: Parallel Benchmarking Gate¶

4. Existing Epics & Their Phase H Dependencies¶

Phase E: Comprehensive Validation & Testing (mrrc-9wi.4)¶

Phase F: Benchmark Refresh & Performance Analysis (mrrc-9wi.5)¶

Phase G: Documentation Refresh (mrrc-9wi.6)¶

5. Priority & Sequencing Recommendations¶

Critical Path (Must-Have Order)¶

Parallelizable Tasks¶

Timeline Impact¶

6. Metrics & Gate Criteria Summary¶

7. Next Steps: Immediate Actions¶

7.1 Beads Cleanup¶

7.2 Create Phase C Subtasks (Priority 1)¶

7.3 Create Phase H Epic + Subtasks (Priority 1)¶

7.4 Create Diagnostic Infrastructure Tasks (Priority 1)¶

7.5 Update Existing Epics (Priority 2)¶

7.6 Create Transition Epics (Optional, Priority 3)¶

8. Beads Command Reference¶

Create Phase C Epic (if not exists) or Verify¶

Create All Phase H Tasks (batch command)¶

Validate Dependency Chain¶

9. Key Differences from Original Plan¶

10. Risk Mitigation (Beads-Integrated)¶

11. Success Criteria (Overall)¶

Phase E: Comprehensive Validation & Testing (`mrrc-9wi.4`)¶

Phase F: Benchmark Refresh & Performance Analysis (`mrrc-9wi.5`)¶

Phase G: Documentation Refresh (`mrrc-9wi.6`)¶