Pure Rust Single-Threaded Performance Profile Summary¶
Objective: Within-mode bottleneck analysis for pure Rust sequential implementation
Date: 2026-01-08
Status: Complete
Quick Links¶
| Document | Purpose |
|---|---|
| PROFILING_PLAN.md | Methodology and tools |
| RUST_SINGLE_THREADED_PROFILING_RESULTS.md | Detailed baseline results |
| PHASE_2_DETAILED_ANALYSIS.md | CPU intensity and memory analysis |
Executive Summary¶
Pure Rust (mrrc) single-threaded implementation shows excellent performance with well-understood bottlenecks.
Performance Baseline: Excellent ✓¶
- Throughput: 1.06M rec/s (0.94 µs/record latency)
- Field access overhead: 2-5% minimal
- Consistency: Linear scaling across 1k-10k record files
- Classification: CPU-bound, not I/O or memory-bound
No Algorithmic Bottlenecks Found¶
- Parsing logic is efficient
- nom parser performs well for MARC format
- Field lookups are fast
- Record structure is sound
Memory Efficiency: Improvable ⚠️¶
- Heap usage: 39.4 MB for 10k records
- Allocation overhead: 73% of heap is metadata
- Primary issue: String headers (24B each) for fixed-size data
- Opportunity: Compact encoding for tags and indicators
Performance Analysis¶
Phase 1: Baseline Measurements¶
Benchmark Results
| Operation | Throughput | Latency | Overhead |
|---|---|---|---|
| Read only (1k) | 1.06M rec/s | 0.94 µs | baseline |
| Read only (10k) | 1.06M rec/s | 0.94 µs | +0.2% |
| Parse + field access (1k) | 1.03M rec/s | 0.96 µs | +4.6% |
| Parse + field access (10k) | 1.04M rec/s | 0.96 µs | +2.6% |
| Parse + JSON serialize | 299k rec/s | 3.3 ms | +254% (expected) |
| Parse + XML serialize | 249k rec/s | 4.0 ms | +326% (expected) |
Interpretation: - Read operation alone: 1.06M rec/s (excellent) - Field access adds minimal overhead (2-5%) - Serialization is expensive, but expected (not a parsing issue) - Linear scaling indicates predictable performance
Phase 2: CPU and Memory Analysis¶
CPU Intensity¶
- Estimated cycles per record: 3,340 (at 3 GHz)
- Classification: Compute-bound
- Implication: Should parallelize well to multiple cores
Memory Allocation Hotspots (per 10k records)¶
| Allocation Type | Count | Total | Avg | % of Heap |
|---|---|---|---|---|
| Subfield data Strings | 500k | 25 MB | 50 B | 63% |
| Field Vec | 10k | 6.4 MB | 640 B | 16% |
| Tag Strings | 200k | 600 KB | 3 B | 1.5% |
| Indicator Strings | 200k | 400 KB | 2 B | 1% |
| Other overhead | — | 7 MB | — | 18% |
| TOTAL | 910k | 39.4 MB | — | 100% |
Memory Inefficiency Breakdown¶
Per-Record Memory Usage
Actual data content: ~2.6 KB
String headers (24B×70): ~1.7 KB
Vec overhead: ~0.3 KB
Other pointers/metadata: ~0.7 KB
─────────────────────────────────
Total per record: ~5.3 KB (ideal)
~10.2 KB (actual, with overhead)
Key Issue: String Header Overhead
Every string uses 24-byte header, including fixed-size data:
Current approach:
Tag "245" → String { ptr, len, cap } + "245" = 27 bytes
Indicator "10" → String { ptr, len, cap } + "10" = 26 bytes
Optimal approach:
Tag "245" → u16 (tag number) = 2 bytes
Indicator "10" → [u8; 2] = 2 bytes
Identified Bottlenecks¶
Bottleneck #1: Memory Allocation Metadata (High Impact)¶
- Cost: 73% of heap is metadata, only 27% is actual data
- Root cause: String headers for small, fixed-size data (tags, indicators)
- Type: Memory efficiency, not performance-critical
- Opportunity: Encode tags as u16, indicators as [u8; 2]
Bottleneck #2: Vec Allocations for Fields (Medium Impact)¶
- Cost: 16% of heap for field Vecs
- Root cause: Every record allocates Vec, even for typical 20-field records
- Opportunity: Use SmallVec<[Field; 20]> to avoid heap for common case
Bottleneck #3: Serialization Cost (Expected, Low Priority)¶
- Cost: +254-326% overhead for JSON/XML
- Root cause: serde_json and quick-xml are general-purpose serializers
- Note: This is expected, not a parsing bottleneck
- Resolution: Users should avoid serializing every record if possible
What Limits Performance in This Mode?¶
In pure Rust single-threaded: 1. CPU throughput - The algorithm uses 3,340 cycles per record, which is reasonable for MARC parsing 2. Memory bandwidth - Reading 264-byte records from file and heap 3. Serialization - If outputting to JSON/XML (avoidable)
What does NOT limit performance: - I/O (buffering is adequate) - Parsing logic (nom is efficient) - Field access (minimal overhead) - Allocation frequency (expected patterns)
Notes on File Sizes¶
Results are consistent across: - 1k records (257 KB) - 10k records (2.5 MB)
Scaling is linear, suggesting: - No pathological behavior at larger scales - Buffer cache hits for smaller files - Consistent algorithm performance
Recommendations for Optimization¶
For optimization proposals based on these profiling results, see docs/design/OPTIMIZATION_PROPOSAL.md.
Key findings that inform optimization decisions: - Memory overhead is fixable (73% metadata) - CPU efficiency is excellent (no algorithmic bottlenecks) - Concurrency gap in Rust likely comes from work distribution, not parsing - SmallVec and compact tag encoding have low risk and high ROI
Confidence Levels¶
| Finding | Confidence | Basis |
|---|---|---|
| Throughput is 1.06M rec/s | Very High | Criterion.rs consistent measurements |
| No algorithm bottleneck | Very High | Detailed analysis + CPU profiling |
| Memory overhead is 73% metadata | High | Allocation analysis |
| Field access overhead is 2-5% | High | Consistent measurements |
References¶
- Profiling methodology:
docs/design/profiling/PROFILING_PLAN.md - Detailed results:
docs/design/profiling/RUST_SINGLE_THREADED_PROFILING_RESULTS.md - CPU/memory details:
docs/design/profiling/PHASE_2_DETAILED_ANALYSIS.md - Optimization proposals:
docs/design/OPTIMIZATION_PROPOSAL.md - Related issues: mrrc-u33.2, mrrc-u33.1