Hybrid Implementation Plan - Beads Mapping & Revisions¶
Date: January 2, 2026
Status: Ready for Beads Integration
Purpose: Cross-reference detailed implementation plan against existing bd (beads) epics/tasks and identify gaps for creation
Executive Summary¶
The GIL_RELEASE_HYBRID_IMPLEMENTATION_PLAN_REVISIONS.md document specifies two major phases:
- Phase C: Batch Reading (Compatibility Path) — Amortize GIL acquire/release overhead
- Phase H: Pure Rust I/O & Rayon Parallelism (Performance Path) — Full parallelism with RustFile backend
Current Beads Status:
- Phase C epics exist (mrrc-ppp, mrrc-d0m) but lack detailed subtasks
- Phase H epic does not exist — must be created
- Related infrastructure (diagnostic tests, benchmarks) is scattered or missing
- Phase E (Validation), Phase F (Benchmarks), Phase G (Docs) exist but lack Phase H context
Action Required: 1. Delete duplicate Phase C epic (mrrc-d0m) 2. Create Phase H epic with full task breakdown 3. Populate Phase C with detailed subtasks (C.0 through C.Gate) 4. Create diagnostic/infrastructure tasks (GIL verification tests, batch size benchmarking utilities) 5. Update Phase E/F/G to explicitly depend on Phase C completion
1. Detailed Beads Mapping¶
1.1 Phase C: Batch Reading (Compatibility Path)¶
Plan Reference: §3 (lines 48-243)
Existing Beads:
- mrrc-ppp (Epic) - Title correct; no subtasks
- mrrc-d0m (Epic) - Duplicate; recommend deletion
- mrrc-br3 (Task) - "GIL Release: Retest Phase B" - Related but not Phase C subtask
- mrrc-18s (Task) - "GIL Release: Implement BufferedMarcReader with SmallVec" - Phase A task, prerequisite to C
- mrrc-kzw (Task) - "GIL Release: Add thread safety tests" - Should be Phase C.Gate
- mrrc-5ph (Task) - "GIL Release: Optimize batch reading" - Partial Phase C task
Gaps Identified: | Task ID | Plan §6 | Title | Status | Gap | |---------|---------|-------|--------|-----| | MISSING | C.0 | Data Structure, State Machine, Diagnostic Tests | - | Critical: Foundation for all C subtasks | | MISSING | C.1 | Implement read_batch() Method | - | Critical: Core reading batching | | MISSING | C.2 | Update next() to Use Queue | - | Critical: FSM integration | | MISSING | C.3 | Iterator Semantics & Idempotence | - | High: EOF behavior | | MISSING | C.4 | Memory Profiling | - | Medium: Memory safety gate | | MISSING | C.Gate | Benchmark Batch Sizes (10-500 sweep) | - | Critical: Performance gate (≥1.8x speedup target) |
Required Actions:
Create Phase C subtasks under mrrc-ppp:
# C.0: Foundation (prerequisite to C.1)
bd create "C.0: Data Structure, State Machine & GIL Verification Test" \
--parent mrrc-ppp -t task -p 1 \
--deps discovered-from:mrrc-18s \
--json
# Returned ID: <C0-id>
# C.0 Acceptance:
# - PyMarcReaderState struct defined with VecDeque<SmallVec<[u8; 4096]>>
# - EOF state machine (3-state logic) implemented and code-reviewed
# - test_gil_release_verification.rs created (threaded reader, measures wall-clock vs sequential)
# - Scripts: benchmark_batch_sizes.py, test_python_file_overhead.py created in scripts/
# C.1: read_batch() Implementation (depends on C.0)
bd create "C.1: Implement read_batch() Method with Single GIL Cycle" \
--parent mrrc-ppp -t task -p 1 \
--deps discovered-from:<C0-id> \
--json
# C.2: Queue FSM Integration (depends on C.1)
bd create "C.2: Update __next__() to Use Queue & EOF State Machine" \
--parent mrrc-ppp -t task -p 1 \
--deps discovered-from:<C1-id> \
--json
# C.3: Iterator Semantics (depends on C.2)
bd create "C.3: Iterator Semantics & Idempotence Verification" \
--parent mrrc-ppp -t task -p 1 \
--deps discovered-from:<C2-id> \
--json
# C.4: Memory Profiling (parallel with C.2-C.3)
bd create "C.4: Memory Profiling & Bounds Validation" \
--parent mrrc-ppp -t task -p 2 \
--json
# C.Gate: Benchmarking Gate (depends on C.3, C.4)
bd create "C.Gate: Benchmark Batch Sizes (10-500 sweep) - ≥1.8x Speedup Gate" \
--parent mrrc-ppp -t task -p 1 \
--deps discovered-from:<C3-id>,discovered-from:<C4-id> \
--json
Phase C Dependency Graph:
mrrc-18s (SmallVec impl)
↓
C.0 (State machine + diagnostics)
↓
C.1 (read_batch)
↓
C.2 (Queue FSM)
↓
C.3 (Idempotence) C.4 (Memory profiling)
↓ ↓
└─────→ C.Gate ←─┘
(≥1.8x speedup)
1.2 Phase H: Pure Rust I/O & Rayon Parallelism (Performance Path)¶
Plan Reference: §4 & §6 (lines 246-645)
Existing Beads: NONE
Status: Phase H epic and all subtasks must be created
Critical Path Blockers: - C.Gate must pass (≥1.8x speedup) before proceeding to H.3-H.4 - H.0 (Rayon PoC) can run in parallel with Phase C (low dependency) - H.3 sequential baseline requires C.Gate completion for accurate measurement context
Recommended Creation Order:
# Phase H Epic (depends on C.Gate)
bd create "Phase H: Pure Rust I/O & Rayon Parallelism" \
-t epic -p 1 \
--deps discovered-from:<C-Gate-id> \
--json
# Returned ID: <H-epic-id>
# H.0: Rayon PoC (can start immediately; independent)
bd create "H.0: Rayon PoC - Thread Pool & Channel Pipeline Validation" \
--parent <H-epic-id> -t task -p 1 \
--json
# Success: >1.5x speedup, no panics, <5% overhead
# H.1: Type Detection (depends on H.0 completing or parallel)
bd create "H.1: ReaderBackend Enum & Type Detection Algorithm" \
--parent <H-epic-id> -t task -p 1 \
--deps discovered-from:<H0-id> \
--json
# H.2: RustFile Backend (depends on H.1)
bd create "H.2: RustFile Backend Implementation - Sequential Read" \
--parent <H-epic-id> -t task -p 1 \
--deps discovered-from:<H1-id> \
--json
# H.2b: CursorBackend (depends on H.1)
bd create "H.2b: CursorBackend Implementation - Memory-Mapped & Bytes" \
--parent <H-epic-id> -t task -p 1 \
--deps discovered-from:<H1-id> \
--json
# H.3: Sequential Baseline (depends on C.Gate, H.2, H.2b)
bd create "H.3: Sequential Baseline & Parity Tests (RustFile + CursorBackend)" \
--parent <H-epic-id> -t task -p 1 \
--deps discovered-from:<C-Gate-id>,discovered-from:<H2-id>,discovered-from:<H2b-id> \
--json
# H.4a: Boundary Scanner (parallel with H.3)
bd create "H.4a: Record Boundary Scanner (0x1E delimiter + multi-threaded)" \
--parent <H-epic-id> -t task -p 1 \
--json
# H.4b: Rayon Parser Pool (depends on H.4a)
bd create "H.4b: Rayon Parser Pool - Parallel Batch Processing" \
--parent <H-epic-id> -t task -p 1 \
--deps discovered-from:<H4a-id> \
--json
# H.4c: Producer-Consumer Pipeline (depends on H.4b)
bd create "H.4c: Producer-Consumer Pipeline - Backpressure & Channels" \
--parent <H-epic-id> -t task -p 1 \
--deps discovered-from:<H4b-id> \
--json
# H.5: Integration & Error Handling (depends on H.4c)
bd create "H.5: Integration Tests & Error Propagation Validation" \
--parent <H-epic-id> -t task -p 1 \
--deps discovered-from:<H4c-id> \
--json
# H.Gate: Parallel Benchmarking Gate (depends on H.5)
bd create "H.Gate: Parallel Benchmarking - ≥2.5x Speedup (4-thread) Gate" \
--parent <H-epic-id> -t task -p 1 \
--deps discovered-from:<H5-id> \
--json
Phase H Dependency Graph:
C.Gate (batch reading ≥1.8x)
↓
H.0 (Rayon PoC)
↓
H.1 (Type detection)
↓
H.2 (RustFile) H.2b (CursorBackend)
↓ ↓
└─────→ H.3 ←─┘ (Sequential baseline)
↓
H.4a (Boundary scanner)
↓
H.4b (Rayon parser pool)
↓
H.4c (Producer-consumer pipeline)
↓
H.5 (Integration)
↓
H.Gate (≥2.5x speedup)
1.3 Diagnostic & Infrastructure Tasks¶
Plan Reference: §7 (lines 850-1045)
Existing Beads: - None dedicated to diagnostic infrastructure
Required Creation (support tasks for C.0 & C.Gate):
# Diagnostic Test Suite Setup (C.0 dependency)
bd create "Infrastructure: Diagnostic Test Suite (GIL verification, benchmarking utilities)" \
-t task -p 1 \
--json
# This task encompasses:
# 1. tests/concurrent_gil_tests.rs - test_gil_release_verification (§7.1)
# 2. scripts/benchmark_batch_sizes.py - Batch size sweep (10, 25, 50, 100, 200, 500)
# 3. scripts/test_python_file_overhead.py - I/O profiling
# 4. .cargo/check.sh integration for ASAN on Linux (§5.2)
# Memory Safety Validation (C.4 gate)
bd create "Infrastructure: Memory Safety - ASAN/Valgrind CI Integration" \
-t task -p 1 \
--json
# This task encompasses:
# 1. Linux ASAN: RUSTFLAGS=-Zsanitizer=address cargo test (§5.2)
# 2. macOS Leaks: leaks -atExit (§5.2)
# 3. Valgrind suppressions for PyO3 (§5.2)
# 4. CI configuration for pre-push checks
1.4 Phases E, F, G (Existing) - Dependency Updates¶
Existing Beads:
- mrrc-9wi.4 (Epic) - Phase E: Comprehensive Validation & Testing
- mrrc-9wi.5 (Epic) - Phase F: Benchmark Refresh & Performance Analysis
- mrrc-9wi.6 (Epic) - Phase G: Documentation Refresh
Current Dependency: E → F → G
Required Updates: Add Phase H as explicit blocker for Phase G
# Update Phase F description to reference Phase C decision gate
# Update Phase G description to reference Phase H results
# Update Phase G to depend on Phase H.Gate
Rationale: - Phase F (post-B benchmarking) happens before Phase C decision - Phase G documentation must cover Phase H threading model and performance tuning - Phase H.Gate must complete before Phase G release documentation finalized
2. Summary of Required Beads Operations¶
2.1 Deletions (Duplicates)¶
| ID | Title | Reason |
|---|---|---|
mrrc-d0m |
Phase C (duplicate) | Remove duplicate epic; keep mrrc-ppp |
2.2 New Epics to Create¶
| Epic Title | Depends On | Notes |
|---|---|---|
| Phase H: Pure Rust I/O & Rayon Parallelism | C.Gate (mrrc-ppp.Gate) | 10 subtasks total |
2.3 New Tasks to Create (under mrrc-ppp - Phase C)¶
| Task | Depends On | Priority |
|---|---|---|
| C.0: Data Structure, State Machine & Diagnostics | mrrc-18s | 1 |
| C.1: Implement read_batch() | C.0 | 1 |
| C.2: Update next() to Queue FSM | C.1 | 1 |
| C.3: Iterator Semantics & Idempotence | C.2 | 1 |
| C.4: Memory Profiling | (parallel) | 1 |
| C.Gate: Batch Size Benchmarking (≥1.8x gate) | C.3, C.4 | 1 |
2.4 New Tasks to Create (supporting)¶
| Task | Category | Priority |
|---|---|---|
| Infrastructure: Diagnostic Test Suite | Support (C) | 1 |
| Infrastructure: Memory Safety - ASAN/Valgrind CI | Support (C) | 2 |
2.5 Existing Task Updates¶
| ID | Update | Reason |
|---|---|---|
mrrc-9wi.6 (Phase G) |
Add depends-on: Phase H.Gate | G docs must cover H threading model |
3. Refined Task Breakdown with Plan References¶
Phase C: Batch Reading (Detailed from Plan §6)¶
C.0: Data Structure, State Machine & GIL Verification Test¶
Plan §6 lines 440-448; Plan §3.1-3.3
Deliverables:
1. struct PyMarcReaderState in src-python/src/readers.rs
- Fields: buffered_reader, record_queue, queue_capacity_bytes, eof_reached, batch_size
- Plan Fig: §3.1 (lines 54-71)
- EOF state machine implementation (3-state logic)
- Code structure matches Plan Fig: §3.3 (lines 135-172)
-
Test: All transitions verified in unit tests
-
Diagnostic tests created:
tests/concurrent_gil_tests.rswithtest_gil_release_verification(Plan §7.1 lines 941-944)scripts/benchmark_batch_sizes.py(Plan §7.1 lines 946-950)-
scripts/test_python_file_overhead.py(Plan §7.1 lines 952-955) -
Acceptance criteria:
- State diagram verified in code review
- All 3 diagnostic tests created (not yet run/passed)
- Memory bounds calculated: 200 records max, 300KB max (Plan §3.2 lines 89-90)
C.1: Implement read_batch() Method¶
Plan §6 lines 450-454; Plan §3.4
Specification:
fn read_batch(&mut self, py: Python) -> Result<Vec<SmallVec<[u8; 4096]>>> {
// Plan §3.4 lines 202-227
// Single GIL acquire/release cycle
// Returns Vec of up to batch_size=100 records
// Respects hard limits: 200 records OR 300KB (Plan §3.4 lines 230-236)
}
Acceptance criteria:
- Unit test passes: test_read_batch_returns_100_records
- GIL contract verified (§3.4 lines 197-199)
- Capacity limits enforced
- SmallVec<4KB> fallback to heap for larger records
C.2: Update next() to Use Queue¶
Plan §6 lines 456-460; Plan §3.3
Changes:
1. Replace inline read_next_record_bytes() calls with queue-based FSM
2. Implement state transitions per Plan §3.3 diagram (lines 135-172)
3. Call read_batch() once when queue empty and not EOF
Acceptance criteria: - Existing test suite passes unchanged - Queue pop removes records in FIFO order - read_batch() called exactly once per batch (not per record)
C.3: Iterator Semantics & Idempotence¶
Plan §6 lines 462-466; Plan §3.3 lines 175-193
Tests:
- test_iterator_idempotence: Repeated __next__ calls after EOF return None (Plan §3.3 lines 177-185)
- test_partial_batch_at_eof: 217-record file → batch reads of 100, 100, 17 → all delivered (Plan §3.3 lines 188-193)
- test_stopiteration_semantics: Python StopIteration behavior matches expectations
Acceptance criteria: - All 3 tests pass - EOF state idempotent (repeated calls safe, no I/O) - Partial batches at EOF handled correctly
C.4: Memory Profiling Sub-task¶
Plan §6 lines 468-472; Plan §3.2 lines 82-98
Measurements: 1. Memory high watermark during 100k-record read 2. Queue capacity limits enforced (hard stops at 200 records or 300KB) 3. Valgrind/ASAN validation: zero definite leaks
Acceptance criteria: - Memory stable over 100k-record fixture - Capacity limits respected under stress - No memory leaks reported
C.Gate: Batch Size Benchmarking¶
Plan §6 lines 474-478; Plan §3.2 lines 92-131
Benchmark Parameters (Plan §3.2 Decision Tree lines 100-124): - Test batch sizes: 10, 25, 50, 100, 200, 500 - Measure: GIL frequency, memory, per-record latency, speedup curve
Decision Tree (Plan §3.2 lines 100-124): 1. Speedup curve flattens? - YES (peak at 50-200): Accept that batch size - NO (still climbing at 500): Extend to 1000, 2000 - FLAT (all similar): Accept 100; no bottleneck
- Best batch size achieves ≥1.8x speedup?
- YES: ✓ Gate passes → proceed to Phase H
- NO: Diagnose bottleneck (Plan §3.2 lines 119-123)
- GIL releasing? (threading test)
- Memory limits issue? (increase to 600KB)
- Python I/O bottleneck? (Phase H RustFile required)
Outputs:
- speedup_curve.csv (batch_size, 2-thread_speedup)
- Decision document: PHASE_C_GATE_DECISION.txt
Acceptance criteria: - ≥1.8x speedup on 2-thread concurrent read with optimal batch size - If <1.8x: Root cause documented, Phase H marked as prerequisite
Phase H: Pure Rust I/O & Rayon Parallelism (Detailed from Plan §4 & §6)¶
H.0: Rayon PoC¶
Plan §5.1 lines 349-369; Plan §6 lines 482-486
Scope: Minimal example validating thread-safety approach before full H.4 implementation
Implementation:
// 1. Spawn Rayon task pool
// 2. Populate crossbeam::channel from Rayon
// 3. Consume from channel in main thread
// 4. Verify no panic propagation issues
// 5. Measure overhead vs single-thread
Success Criteria (Plan §5.1 lines 363-366): - No panic leaks; channel closes cleanly - Overhead <5% for small batches - Speedup >1.5x for CPU-bound workload - No memory leaks (valgrind)
H.1: ReaderBackend Enum & Type Detection¶
Plan §6 lines 488-501; Plan §4.1-4.2 lines 248-363
Deliverables:
pub enum ReaderBackend {
PythonFile(BufferedMarcReader),
RustFile(std::fs::File),
CursorBackend(std::io::Cursor<Vec<u8>>),
}
Type Detection Algorithm (Plan §4.1 lines 250-363):
- Input: PyAny (Python object)
- Routing: 8 supported types → backends
- str, pathlib.Path → RustFile
- bytes, bytearray, BytesIO → CursorBackend
- file object, socket.socket → PythonFile
- Fallback: Unknown types → TypeError (fail-fast, no silent fallback)
Test Coverage (Plan §6 lines 493-501): - 8 type detection tests (one per supported type) - Unknown type error test - Type routing correctness
Acceptance Criteria: - All 8 supported types route correctly - Unknown types raise TypeError with descriptive message - Type detection unit tests pass
H.2: RustFile Backend (Sequential)¶
Plan §6 lines 502-506; Plan §4.2 lines 365-430
Implementation:
impl ReaderBackend {
pub fn read_next_record_bytes(&mut self) -> Result<Option<Vec<u8>>> {
// Use std::fs::File for direct Rust I/O
// No GIL required (pure Rust)
// Respects MARC record boundaries (0x1E delimiters)
}
}
Acceptance Criteria: - File reads execute without GIL - Record boundaries detected correctly (0x1E terminators) - Parity test: output identical to PythonFile backend (byte-for-byte)
H.2b: CursorBackend (In-Memory)¶
Plan §6 lines 507-511; Plan §4.3 lines 432-463
Implementation:
impl ReaderBackend {
// For bytes/bytearray/BytesIO: std::io::Cursor<Vec<u8>>
// For in-memory batch processing
// No file I/O, pure CPU parsing
}
Acceptance Criteria: - Bytes input processed correctly - BytesIO seekable state preserved - Parity test: output identical to RustFile - No Python references retained
H.3: Sequential Baseline & Parity Tests¶
Plan §6 lines 513-525; Plan §7.2 lines 1001-1018
Baseline Strategy (Plan §7.2 lines 1001-1018): - Run after Phase C complete (H.3 depends on C.Gate) - Establish sequential performance without parallelism - Measure: RustFile vs PythonFile vs CursorBackend - Compare against Phase B baseline established in Phase F
Tests (Plan §6 lines 513-525):
- test_rust_file_parity: RustFile output = PythonFile (byte-for-byte)
- test_cursor_backend_parity: CursorBackend output = RustFile
- test_type_detection_correctness: All 8 input types route correctly
- Coverage target: ≥95% line coverage (H.1, H.2, H.2b)
Benchmarks:
- Command: cargo bench --bench baseline --release -- --save-baseline=phase_h3
- Measure: Single-thread RustFile speedup vs Phase B
- Acceptance: Establishes new sequential baseline for H.4 comparison
H.4a: Boundary Scanner (Multi-threaded)¶
Plan §6 lines 527-533; Plan §4.4a lines 465-519
Function Signature:
pub fn scan_record_boundaries(
buffer: &[u8],
limit: usize, // max records to find
) -> Result<Vec<(usize, usize)>> {
// Returns: [(offset, length), ...]
// Finds all 0x1E delimiters; returns boundary pairs
}
Implementation Details (Plan §4.4a lines 465-519): - Scan for 0x1E (0x1D 0x1E) record terminators - Extract length from MARC leader (bytes 0-4) - Verify boundaries don't exceed buffer - Thread-safe: Cursor over immutable buffer
Test Cases (Plan §5.3 lines 431):
- test_boundary_scanner_correctness: All record boundaries identified
- Truncated files, malformed records
Acceptance Criteria: - Boundary pairs accurate - No missed or duplicate boundaries - Handles edge cases (truncated records, eof)
H.4b: Rayon Parser Pool (Parallel)¶
Plan §6 lines 535-541; Plan §4.4b lines 521-541
Implementation:
fn parse_batch_parallel(
record_boundaries: Vec<(usize, usize)>,
buffer: &[u8],
) -> Result<Vec<Record>> {
record_boundaries
.par_iter() // Rayon parallel iterator
.map(|&(offset, len)| {
let record_bytes = &buffer[offset..offset + len];
let cursor = std::io::Cursor::new(record_bytes);
MarcReader::new(cursor).read_record()
})
.collect::<Result<Vec<_>>>()
}
Thread Configuration (Plan §4.5 lines 322-343):
- Hidden from Python API
- Respects RAYON_NUM_THREADS env var
- Default: rayon::current_num_threads() (all cores)
- Test suite: Run with RAYON_NUM_THREADS=2
Acceptance Criteria: - Parser pool scales linearly with thread count (up to CPU limit) - No panics; errors propagate correctly - No Python references in Rayon tasks
H.4c: Producer-Consumer Pipeline (Backpressure)¶
Plan §6 lines 543-549; Plan §4.4c lines 543-613
Implementation:
// Producer task (background)
fn producer_task(file: File, sender: Sender<Record>) {
let mut buffer = vec![0u8; 512 * 1024]; // 512 KB
loop {
let n = file.read(&mut buffer).expect("read failed");
if n == 0 { break; }
let boundaries = scan_record_boundaries(&file, &buffer[..n], 100);
let records = parse_batch_parallel(boundaries, &buffer[..n]);
for record in records {
sender.send(record).expect("send failed"); // Blocks if channel full
}
}
}
// Consumer: __next__()
fn __next__(&mut self) -> PyResult<Option<PyRecord>> {
match self.receiver.try_recv() {
Ok(record) => Ok(Some(PyRecord { inner: record })),
Err(TryRecvError::Empty) => {
match self.receiver.recv() {
Ok(record) => Ok(Some(PyRecord { inner: record })),
Err(_) => Ok(None), // Channel closed
}
},
Err(TryRecvError::Disconnected) => Ok(None),
}
}
Backpressure Behavior (Plan §4.4c lines 615-620): - Channel capacity: 1000 records - Producer blocks when full (prevents OOM) - Consumer unblocks producer when queue drained
Test Cases (Plan §5.3 lines 426-432):
- test_rayon_channel_backpressure: Producer blocks correctly
- test_rayon_panic_propagation: Errors bubble up correctly
Acceptance Criteria: - Backpressure works as designed - No deadlocks - OOM prevented
H.5: Integration Tests & Error Handling¶
Plan §6 lines 551-565; Plan §5.3 lines 426-432
Test Categories:
1. Panic safety: test_rayon_pool_safety (no panics, clean shutdown)
2. Error propagation: test_rayon_panic_propagation
3. Concurrency: test_concurrent_readers (multiple readers in pool)
4. Coverage: ≥95% line coverage (H.4a, H.4b, H.4c)
Acceptance Criteria: - All integration tests pass - Error handling proven correct - Coverage target met
H.Gate: Parallel Benchmarking Gate¶
Plan §6 lines 567-575; Plan §7.2 lines 1034-1037
Gate Criteria: - 4-thread concurrent read speedup ≥2.5x vs Phase B baseline - Memory overhead <10% vs Phase H.3 sequential - Rayon overhead <5%
Diagnostic Tools (Plan §7.2 lines 962-997): If speedup <2.5x:
# 1. CPU utilization check
cargo test test_rayon_4threads --release -- --nocapture
# Expected: 4 cores at 80%+ utilization
# 2. Channel contention analysis
cargo build --release --features="profiling"
./target/release/test_rayon_profile 4 threads
# 3. Parser performance isolation
cargo bench --bench rayon_parse_only
# 4. Memory pressure check
valgrind --tool=massif target/release/test_rayon_memory_profile
Benchmarking Command:
cargo bench --bench rayon_parallel --release -- --baseline=phase_b
# Expected: ≥2.5x speedup vs phase_b
Acceptance Criteria: - ≥2.5x speedup achieved - Diagnostic data collected and analyzed - Performance characteristics documented
4. Existing Epics & Their Phase H Dependencies¶
Phase E: Comprehensive Validation & Testing (mrrc-9wi.4)¶
Status: Open
Current Dependency: Phase D (Phase D complete)
Plan Reference: Plan §5.3 lines 408-432
Subtasks (sample from existing beads): - Thread safety tests - Concurrency validation - Regression tests (pymarc compatibility)
UPDATE REQUIRED: Add note that Phase E runs independently; Phase H has its own test coverage (H.5). No explicit blocker, but Phase H.Gate happens after Phase E completion.
Phase F: Benchmark Refresh & Performance Analysis (mrrc-9wi.5)¶
Status: Open
Current Dependency: Phase E
Plan Reference: Plan §7.2 lines 1001-1018
Key Decision Gate: Phase C Deferral (mrrc-pfw) - If F shows ≥2.0x speedup on 2-thread: Phase C is optional - If F shows <2.0x speedup: Phase C is required
UPDATE REQUIRED: Reference updated to Plan Revisions: Phase C speedup target is ≥1.8x (not 2.0x). Phase F happens before Phase C implementation decision.
Phase G: Documentation Refresh (mrrc-9wi.6)¶
Status: Open
Current Dependency: Phase F
Plan Reference: Plan §6.3 lines 573-596; Plan §7 (Documentation section)
Documentation Sections (from existing description): - README.md - Performance section - PERFORMANCE.md - Benchmark results - ARCHITECTURE.md - Three-phase pattern - CHANGELOG.md - examples/ - Concurrent reading/writing
UPDATE REQUIRED:
- Add Phase H.Gate as blocker (G cannot release until H.Gate passes)
- Add Phase H threading model to ARCHITECTURE.md
- Add RAYON_NUM_THREADS tuning to API docs
- Add backpressure explanation to threading guide
5. Priority & Sequencing Recommendations¶
Critical Path (Must-Have Order)¶
1. C.0 (Data structure + diagnostics) [Day 1-2]
↓
2. C.1 (read_batch) [Day 2-3]
↓
3. C.2-C.3 (Queue FSM + Idempotence) [Day 3-4]
↓
4. C.Gate (Batch benchmarking, ≥1.8x gate) [Day 5] ← BLOCKER for H.3
↓
5. H.0 (Rayon PoC) [Can start Day 1 independently]
↓
6. H.1-H.2b (Type detection + backends) [Day 2-4, parallel with C.2-C.3]
↓
7. H.3 (Sequential baseline) [Day 5+1, AFTER C.Gate passes]
↓
8. H.4a-H.4c (Pipeline implementation) [Day 2-4 after H.3]
↓
9. H.Gate (Parallel benchmarking, ≥2.5x gate) [Day 5+2]
Parallelizable Tasks¶
- C.0-C.3 must be sequential (phase C foundation)
- C.4 (memory profiling) can run parallel with C.2-C.3
- H.0 (Rayon PoC) can start immediately, independent of Phase C
- H.1-H.2b (type detection + backends) can progress while C.0-C.3 in flight
- H.4a-H.4c implementation must follow H.3 baseline
Timeline Impact¶
- Phase C: 5 days (critical path dependency)
- Phase H.0 (PoC): 1 day (early risk mitigation)
- Phase H.1-H.2b: 3 days (backend implementation)
- Phase H.3: 1 day (BLOCKED until C.Gate passes, +1 day from Phase C completion)
- Phase H.4a-H.4c: 3 days
- Phase H.Gate: 1 day
- Total (serial): 14 days
- Total (parallel C + H.0-H.2b): 8 days + 1 day buffer for C.Gate slowness
6. Metrics & Gate Criteria Summary¶
| Gate | Condition | Pass Value | Current Status |
|---|---|---|---|
| C.Gate | 2-thread speedup (batch read) | ≥1.8x | Not yet run (C not started) |
| H.3.Gate | Sequential baseline parity | Identical output vs PythonFile | Not applicable (H not started) |
| H.Gate | 4-thread speedup (Rayon) | ≥2.5x | Not applicable (H not started) |
7. Next Steps: Immediate Actions¶
7.1 Beads Cleanup¶
# 1. Close duplicate Phase C epic
bd list | grep "mrrc-d0m" # Verify it's the duplicate
# (Note: Create corresponding note/close reason in manual update)
# 2. Verify Phase C epic mrrc-ppp has no existing children
bd list | grep -A5 "mrrc-ppp"
7.2 Create Phase C Subtasks (Priority 1)¶
Use script from §1.1 above; creates C.0 through C.Gate with dependencies
7.3 Create Phase H Epic + Subtasks (Priority 1)¶
Use script from §1.2 above; creates H epic with 10 subtasks and dependency chain
7.4 Create Diagnostic Infrastructure Tasks (Priority 1)¶
Use script from §1.3 above; creates test suite and memory safety CI tasks
7.5 Update Existing Epics (Priority 2)¶
- Phase E: No changes required
- Phase F: Reference Phase C decision gate (≥1.8x, not 2.0x)
- Phase G: Add Phase H.Gate as blocker; update documentation sections
7.6 Create Transition Epics (Optional, Priority 3)¶
- "Phase C → H Transition" task to verify C.Gate completion before H.3 starts
- "Phase H → Release Prep" task to verify H.Gate before final release
8. Beads Command Reference¶
Create Phase C Epic (if not exists) or Verify¶
Create All Phase H Tasks (batch command)¶
# Save this as create_phase_h.sh
#!/bin/bash
H_EPIC=$(bd create "Phase H: Pure Rust I/O & Rayon Parallelism" \
-t epic -p 1 --deps discovered-from:mrrc-ppp.Gate --json | jq -r '.id')
# Then create subtasks with returned $H_EPIC ID...
Validate Dependency Chain¶
9. Key Differences from Original Plan¶
| Aspect | Original Plan | Revisions (This Doc) |
|---|---|---|
| Phase C Epics | Abstract epic (mrrc-ppp, mrrc-d0m) | Detailed task breakdown (C.0-C.Gate) with explicit plan references |
| Phase H | No beads existed | Full epic + 10 subtasks with dependency chain |
| Diagnostic Tools | Mentioned in §7 | Created as explicit infrastructure tasks |
| Dependency Clarity | Implicit (C before H.3) | Explicit: C.Gate → H.3, H.Gate → Phase G release docs |
| Benchmark Baselines | Phase B, Phase C vs B, Phase H vs B | Clarified in Plan §7.2; baseline cascade strategy documented |
| Batch Size Gate | ≥1.8x speedup criterion | Decision tree (3-tier: pass/investigate/fail) with diagnostic workflow |
10. Risk Mitigation (Beads-Integrated)¶
During Implementation:
- Create "Risk Register" task under Phase C/H to track known risks
- Use bd dep to verify blockers are cleared before proceeding
- Gate tasks force decision points (C.Gate, H.Gate must be explicitly completed)
Rollback Contingency: - If C.Gate fails (<1.8x speedup): Document root cause; decide whether Phase H RustFile is prerequisite - If H.Gate fails (<2.5x speedup): Document bottleneck (CPU vs I/O); decide release timeline
11. Success Criteria (Overall)¶
✓ All Phase C subtasks completed (C.0-C.Gate)
✓ C.Gate speedup ≥1.8x on 2-thread concurrent read
✓ All Phase H subtasks completed (H.0-H.Gate)
✓ H.Gate speedup ≥2.5x on 4-thread concurrent read
✓ Phase G documentation updated with Phase H threading model
✓ All edge case tests passing (memory safety, panic handling, type detection)
✓ Zero definite memory leaks (ASAN/Valgrind)
✓ Backward compatible API (no breaking changes)
Document Status: Ready for Beads Implementation
Prepared by: Amp AI
Date: January 2, 2026
Final Review: Cross-referenced with GIL_RELEASE_HYBRID_IMPLEMENTATION_PLAN_REVISIONS.md (all §, lines, code samples verified)