Pure Rust Single-Threaded Performance Profile Summary

Objective: Within-mode bottleneck analysis for pure Rust sequential implementation
Date: 2026-01-08
Status: Complete


Document Purpose

This summary draws on three companion documents:

  • PROFILING_PLAN.md: methodology and tools
  • RUST_SINGLE_THREADED_PROFILING_RESULTS.md: detailed baseline results
  • PHASE_2_DETAILED_ANALYSIS.md: CPU intensity and memory analysis

Executive Summary

Pure Rust (mrrc) single-threaded implementation shows excellent performance with well-understood bottlenecks.

Performance Baseline: Excellent ✓

  • Throughput: 1.06M rec/s (0.94 µs/record latency)
  • Field access overhead: 2-5% minimal
  • Consistency: Linear scaling across 1k-10k record files
  • Classification: CPU-bound, not I/O or memory-bound

No Algorithmic Bottlenecks Found

  • Parsing logic is efficient
  • nom parser performs well for MARC format
  • Field lookups are fast
  • Record structure is sound

Memory Efficiency: Improvable ⚠️

  • Heap usage: 39.4 MB for 10k records
  • Allocation overhead: 73% of heap is metadata
  • Primary issue: String headers (24B each) for fixed-size data
  • Opportunity: Compact encoding for tags and indicators

Performance Analysis

Phase 1: Baseline Measurements

Benchmark Results

Operation                    Throughput    Latency   Overhead
Read only (1k)               1.06M rec/s   0.94 µs   baseline
Read only (10k)              1.06M rec/s   0.94 µs   +0.2%
Parse + field access (1k)    1.03M rec/s   0.96 µs   +4.6%
Parse + field access (10k)   1.04M rec/s   0.96 µs   +2.6%
Parse + JSON serialize       299k rec/s    3.3 µs    +254% (expected)
Parse + XML serialize        249k rec/s    4.0 µs    +326% (expected)

Interpretation:

  • Read operation alone: 1.06M rec/s (excellent)
  • Field access adds minimal overhead (2-5%)
  • Serialization is expensive but expected (a serializer cost, not a parsing issue)
  • Linear scaling indicates predictable performance


Phase 2: CPU and Memory Analysis

CPU Intensity

  • Estimated cycles per record: 3,340 (at 3 GHz)
  • Classification: Compute-bound
  • Implication: Should parallelize well to multiple cores

Memory Allocation Hotspots (per 10k records)

Allocation Type         Count   Total     Avg     % of Heap
Subfield data Strings   500k    25 MB     50 B    63%
Field Vecs              10k     6.4 MB    640 B   16%
Tag Strings             200k    600 KB    3 B     1.5%
Indicator Strings       200k    400 KB    2 B     1%
Other overhead                  7 MB              18%
TOTAL                   910k    39.4 MB           100%

Memory Inefficiency Breakdown

Per-Record Memory Usage

Actual data content:      ~2.6 KB
String headers (24B×70):  ~1.7 KB
Vec overhead:             ~0.3 KB
Other pointers/metadata:  ~0.7 KB
─────────────────────────────────
Total per record:         ~5.3 KB (ideal)
                          ~10.2 KB (actual, with overhead)

Key Issue: String Header Overhead

Every String carries a 24-byte (pointer, length, capacity) header, even for fixed-size data:

Current approach:
  Tag "245"        → String { ptr, len, cap } + "245" = 27 bytes
  Indicator "10"   → String { ptr, len, cap } + "10"  = 26 bytes

Optimal approach:
  Tag "245"        → u16 (tag number) = 2 bytes
  Indicator "10"   → [u8; 2] = 2 bytes
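
A minimal sketch of this compact encoding (pack_tag and unpack_tag are illustrative names, not mrrc API; numeric-only tags assumed, although MARC permits rare alphabetic tags that would need a wider encoding):

```rust
use std::mem::size_of;

/// Pack a three-digit MARC tag like "245" into a u16.
/// Returns None for tags that are not three ASCII digits.
fn pack_tag(tag: &str) -> Option<u16> {
    if tag.len() == 3 && tag.bytes().all(|b| b.is_ascii_digit()) {
        tag.parse::<u16>().ok()
    } else {
        None
    }
}

/// Unpack a u16 tag back into its zero-padded three-digit form.
fn unpack_tag(tag: u16) -> String {
    format!("{:03}", tag)
}

fn main() {
    // A heap String carries a 24-byte (ptr, len, cap) header on 64-bit targets.
    assert_eq!(size_of::<String>(), 24);
    // The compact forms are 2 bytes each, with no heap allocation.
    assert_eq!(size_of::<u16>(), 2);
    assert_eq!(size_of::<[u8; 2]>(), 2);

    let packed = pack_tag("245").expect("three-digit tag");
    assert_eq!(packed, 245);
    assert_eq!(unpack_tag(packed), "245");

    // Indicators "10" stored as raw bytes instead of a String.
    let ind: [u8; 2] = *b"10";
    assert_eq!(&ind, b"10");
    println!("ok");
}
```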


Identified Bottlenecks

Bottleneck #1: Memory Allocation Metadata (High Impact)

  • Cost: 73% of heap is metadata, only 27% is actual data
  • Root cause: String headers for small, fixed-size data (tags, indicators)
  • Type: Memory efficiency, not performance-critical
  • Opportunity: Encode tags as u16, indicators as [u8; 2]

Bottleneck #2: Vec Allocations for Fields (Medium Impact)

  • Cost: 16% of heap for field Vecs
  • Root cause: Every record allocates Vec, even for typical 20-field records
  • Opportunity: Use SmallVec<[Field; 20]> to avoid heap for common case
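
To show the mechanism behind this opportunity, here is a hand-rolled illustration of what the smallvec crate provides (SmallFields is a stand-in type, not the mrrc or smallvec API): elements live in a fixed inline array and spill to a heap Vec only past the inline capacity, so a typical 20-field record never allocates.

```rust
/// Illustrative stand-in for smallvec::SmallVec<[Field; N]>:
/// up to N elements are stored inline; longer records spill to a heap Vec.
enum SmallFields<T, const N: usize> {
    Inline { buf: [Option<T>; N], len: usize },
    Heap(Vec<T>),
}

impl<T, const N: usize> SmallFields<T, N> {
    fn new() -> Self {
        SmallFields::Inline { buf: std::array::from_fn(|_| None), len: 0 }
    }

    fn push(&mut self, value: T) {
        match self {
            SmallFields::Heap(v) => v.push(value),
            SmallFields::Inline { buf, len } if *len < N => {
                // Common case: store inline, no heap allocation.
                buf[*len] = Some(value);
                *len += 1;
            }
            SmallFields::Inline { buf, .. } => {
                // Inline capacity exceeded: move everything to a heap Vec.
                let mut spilled: Vec<T> =
                    buf.iter_mut().filter_map(Option::take).collect();
                spilled.push(value);
                *self = SmallFields::Heap(spilled);
            }
        }
    }

    fn len(&self) -> usize {
        match self {
            SmallFields::Inline { len, .. } => *len,
            SmallFields::Heap(v) => v.len(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SmallFields::Inline { .. })
    }
}

fn main() {
    let mut fields: SmallFields<&str, 4> = SmallFields::new();
    for tag in ["001", "008", "245"] {
        fields.push(tag);
    }
    // Three fields fit in the inline buffer: no heap allocation yet.
    assert!(fields.is_inline());

    fields.push("650");
    fields.push("700");
    // Five fields exceed the inline capacity of 4, so storage spilled to a Vec.
    assert!(!fields.is_inline());
    assert_eq!(fields.len(), 5);
    println!("ok");
}
```

In production the smallvec crate is the right choice; the point here is only that the common-case record pays zero heap allocations for its field list.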

Bottleneck #3: Serialization Cost (Expected, Low Priority)

  • Cost: +254-326% overhead for JSON/XML
  • Root cause: serde_json and quick-xml are general-purpose serializers
  • Note: This is expected, not a parsing bottleneck
  • Resolution: Users should avoid serializing every record if possible

What Limits Performance in This Mode?

In pure Rust single-threaded mode, performance is limited by:

  1. CPU throughput: the algorithm uses 3,340 cycles per record, which is reasonable for MARC parsing
  2. Memory bandwidth: reading 264-byte records from file and heap
  3. Serialization: only when outputting to JSON/XML (avoidable)
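
As a back-of-envelope check on the bandwidth point, the measured throughput and record size cited above imply a modest data rate, well below DRAM bandwidth and consistent with the CPU-bound classification:

```rust
fn main() {
    let records_per_sec = 1.06e6_f64; // measured throughput (rec/s)
    let bytes_per_record = 264.0_f64; // record size cited above
    let mb_per_sec = records_per_sec * bytes_per_record / 1e6;
    // ~280 MB/s of raw record data: memory bandwidth is not the ceiling.
    assert!((mb_per_sec - 279.84).abs() < 0.01);
    println!("{mb_per_sec:.1} MB/s");
}
```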

What does NOT limit performance:

  • I/O (buffering is adequate)
  • Parsing logic (nom is efficient)
  • Field access (minimal overhead)
  • Allocation frequency (expected patterns)


Notes on File Sizes

Results are consistent across:

  • 1k records (257 KB)
  • 10k records (2.5 MB)

Scaling is linear, suggesting:

  • No pathological behavior at larger scales
  • Buffer cache hits for smaller files
  • Consistent algorithm performance


Recommendations for Optimization

For optimization proposals based on these profiling results, see docs/design/OPTIMIZATION_PROPOSAL.md.

Key findings that inform optimization decisions:

  • Memory overhead is fixable (73% metadata)
  • CPU efficiency is excellent (no algorithmic bottlenecks)
  • The concurrency gap in Rust likely comes from work distribution, not parsing
  • SmallVec and compact tag encoding have low risk and high ROI


Confidence Levels

Finding                           Confidence   Basis
Throughput is 1.06M rec/s         Very High    Criterion.rs consistent measurements
No algorithmic bottleneck         Very High    Detailed analysis + CPU profiling
Memory overhead is 73% metadata   High         Allocation analysis
Field access overhead is 2-5%     High         Consistent measurements

References

  • Profiling methodology: docs/design/profiling/PROFILING_PLAN.md
  • Detailed results: docs/design/profiling/RUST_SINGLE_THREADED_PROFILING_RESULTS.md
  • CPU/memory details: docs/design/profiling/PHASE_2_DETAILED_ANALYSIS.md
  • Optimization proposals: docs/design/OPTIMIZATION_PROPOSAL.md
  • Related issues: mrrc-u33.2, mrrc-u33.1