Updated: 2026-01-19
Status: 8/10 formats complete; 7 recommended
Framework: See EVALUATION_FRAMEWORK.md
This document aggregates results from all format evaluations for side-by-side comparison.
Executive Summary
| Dimension |
Winner |
Notes |
| Fidelity |
Protobuf, FlatBuffers, ISO 2709, Arrow Analytics |
All achieve 100% perfect round-trip |
| Read Speed (General) |
ISO 2709 |
3.5-9x faster than alternatives |
| Read Speed (Analytics) |
Arrow Analytics (Rust) |
1.77M rec/sec; 1.96× faster than ISO 2709 baseline |
| Compression |
FlatBuffers, Arrow Analytics (Parquet) |
FlatBuffers 98.08%; Arrow Parquet 30% smaller than ISO 2709 |
| Memory Efficiency (API) |
FlatBuffers |
64% lower peak memory than baseline |
| Memory Efficiency (Analytics) |
Arrow Analytics |
Columnar zero-copy; sub-microsecond column access |
| API Maturity |
Protobuf |
Widest ecosystem and language support |
| Zero-Copy |
FlatBuffers, Arrow Analytics |
FlatBuffers for serialization; Arrow for in-memory analysis |
| Schema Evolution |
Protobuf, Avro |
Full bi-directional compatibility |
| Analytics Performance |
Arrow Analytics (Rust-native) |
1.77M rec/sec; optimal for MARC discovery optimization |
Comprehensive Comparison Matrix
| Format |
Fidelity |
Read (rec/sec) |
Write (rec/sec) |
Roundtrip (rec/sec) |
vs ISO 2709 Read |
vs ISO 2709 Write |
vs ISO 2709 Roundtrip |
| ISO 2709 |
✅ 100% |
903,560 |
~789,405 |
421,556 |
Baseline |
Baseline |
Baseline |
| Protobuf |
✅ 100% |
100,000 |
100,000 |
~90,000 |
-88.9% |
-87.3% |
-78.6% |
| FlatBuffers |
✅ 100% |
259,240 |
108,932 |
69,767 |
-71.3% |
-86.2% |
-83.5% |
| Parquet |
✅ 100% |
729,002 |
767,889 |
430,000 |
-19.3% |
-2.7% |
+2.0% |
| Arrow |
✅ 100% |
865,331 |
712,407 |
405,000 |
-4.2% |
-9.8% |
-4.0% |
| MessagePack |
✅ 100% |
750,434 |
746,410 |
415,000 |
-17.0% |
-5.4% |
-1.5% |
| CBOR |
✅ 100% |
496,186 |
615,571 |
338,000 |
-45.1% |
-21.9% |
-19.8% |
| Avro |
✅ 100% |
215,887 |
338,228 |
184,000 |
-76.1% |
-57.1% |
-56.4% |
File Size Metrics
| Format |
Raw Size |
vs ISO 2709 (raw) |
Gzip Size |
vs ISO 2709 (gzip) |
Compression Ratio |
| ISO 2709 |
2,645,353 |
Baseline |
85,288 |
Baseline |
96.77% |
| Protobuf |
7,500,000–8,500,000 |
+184%–222% |
1,200,000–1,500,000 |
+1,306%–1,656% |
84–85% |
| FlatBuffers |
6,703,891 |
+153% |
129,045 |
+51% |
98.08% |
| Parquet |
4,609,288 |
+74% |
~80,000 |
~–6% |
98.3% (Snappy) |
| Arrow |
1,847,294 |
–30% |
74,156 |
–13% |
95.99% |
| MessagePack |
1,993,352 |
–25% |
83,747 |
–2% |
95.80% |
| CBOR |
4,800,701 |
+82% |
100,090 |
+17% |
97.60% |
| Avro |
6,291,376 |
+138% |
106,314 |
+25% |
98.31% |
Memory Efficiency
| Format |
Peak Memory |
vs Baseline |
Notes |
| Baseline ISO 2709 |
~45 MB |
Baseline |
Reference point |
| Protobuf |
~45–50 MB |
+0–11% |
Schema serialization overhead |
| FlatBuffers |
~16 MB |
–64% |
Streaming model; zero-copy capable |
| Parquet |
~30–35 MB |
–22–28% |
Arrow columnar; efficient denormalization |
| Arrow |
~30–35 MB |
–22–28% |
In-memory columnar; efficient representation |
| MessagePack |
~40–45 MB |
–0–11% |
Streaming model; serde overhead comparable to baseline |
| CBOR |
~40–45 MB |
–0–11% |
Serde overhead; comparable to MessagePack |
| Avro |
~45–50 MB |
+0–11% |
JSON serialization adds overhead |
Integration & Ecosystem Assessment
Rust Dependencies
| Format |
Direct Deps |
Transitive Deps |
Maturity |
Maintenance |
Notes |
| ISO 2709 |
0 |
0 |
⭐⭐⭐⭐⭐ |
mrrc native |
Built-in; no external deps |
| Protobuf |
2 |
Low |
⭐⭐⭐⭐⭐ |
Active |
Google-backed; widely adopted |
| FlatBuffers |
2 |
Low |
⭐⭐⭐⭐⭐ |
Active |
Google-backed; production-ready |
| Parquet |
0 |
0 |
⭐⭐⭐ |
mrrc custom |
Arrow-based impl; no new deps |
| Arrow |
1 |
Low |
⭐⭐⭐⭐⭐ |
Active |
Apache-backed; production-ready |
| MessagePack |
2 |
Low |
⭐⭐⭐⭐⭐ |
Active |
Schema-less; stable ecosystem |
| CBOR |
2 |
Low |
⭐⭐⭐⭐ |
Active |
RFC 7949 standard; proven |
| Avro |
1 |
Low |
⭐⭐⭐⭐⭐ |
Active |
Apache-backed; schema evolution |
Language Support
| Format |
Rust |
Python |
Java |
Go |
C++ |
Notes |
| ISO 2709 |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
⭐⭐ |
⭐⭐ |
⭐⭐ |
mrrc native; others custom |
| Protobuf |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
Official support all languages |
| FlatBuffers |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
Official support; good ecosystem |
| Parquet |
⭐⭐⭐⭐ |
⭐⭐ |
⭐⭐⭐⭐ |
⭐⭐⭐⭐ |
⭐⭐⭐⭐ |
Arrow-based impl |
| Arrow |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
Apache-backed; excellent support |
| MessagePack |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐ |
Universal 50+ language support |
| CBOR |
⭐⭐⭐⭐ |
⭐⭐⭐⭐ |
⭐⭐⭐⭐ |
⭐⭐⭐⭐ |
⭐⭐⭐⭐ |
RFC 7949 standard; excellent support |
| Avro |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐⭐ |
⭐⭐⭐⭐ |
Apache-backed; excellent ecosystem |
Schema Evolution Capability
| Format |
Score (1–5) |
Forward Compat |
Backward Compat |
Append-Only |
Notes |
| ISO 2709 |
1 |
❌ |
❌ |
❌ |
Fixed binary format; no evolution |
| Protobuf |
5 |
✅ |
✅ |
✅ |
Bi-directional; full compatibility |
| FlatBuffers |
4 |
⚠️ |
✅ |
✅ |
Append-only evolution constraint |
| Parquet |
2 |
⚠️ |
⚠️ |
❌ |
Limited field evolution |
| Arrow |
4 |
✅ |
✅ |
✅ |
Flexible schema; column addition/renaming |
| MessagePack |
2 |
⚠️ |
✅ |
✅ |
Schema-less; new optional fields compatible |
| CBOR |
3 |
✅ |
✅ |
✅ |
Semantic tags enable version metadata |
| Avro |
5 |
✅ |
✅ |
✅ |
Best-in-class schema evolution |
Error Handling & Robustness
| Format |
Graceful Errors |
No Panics |
Silent Corruption |
Notes |
| Protobuf |
✅ |
✅ |
✅ None |
Comprehensive error handling |
| FlatBuffers |
✅ |
✅ |
✅ None |
Comprehensive error handling |
| Parquet |
✅ |
✅ |
✅ None |
Arrow serialization is robust |
| Arrow |
✅ |
✅ |
✅ None |
Arrow validation is thorough |
| MessagePack |
✅ |
✅ |
✅ None |
rmp-serde validates all invalid input |
| CBOR |
✅ |
✅ |
✅ None |
ciborium validates CBOR strictly |
| Avro |
✅ |
✅ |
✅ None |
Schema validation + field constraints |
Use Case Fit Scoring (1-5 scale)
Binary Interchange (API/gRPC)
| Format |
Score |
Notes |
| Protobuf |
⭐⭐⭐⭐⭐ |
Native gRPC support; schema contracts; cross-language |
| FlatBuffers |
⭐⭐⭐⭐ |
Fast deserialization; zero-copy potential; good for REST |
| MessagePack |
⭐⭐⭐ |
Simple; compact; limited schema validation |
| ISO 2709 |
⭐⭐ |
Legacy standard; not designed for modern APIs |
| Format |
Score |
Notes |
| ISO 2709 |
⭐⭐⭐⭐⭐ |
900k+ rec/sec; highly optimized |
| Arrow |
⭐⭐⭐⭐ |
865k rec/sec; negligible overhead vs ISO 2709 |
| MessagePack |
⭐⭐⭐ |
Compact; reasonable throughput |
| Parquet (Arrow) |
⭐⭐⭐ |
729k rec/sec; columnar but denormalized design |
| Protobuf |
⭐⭐ |
100k rec/sec; acceptable but not ideal |
| FlatBuffers |
⭐⭐ |
259k rec/sec; memory efficient but slow |
Analytics & Big Data (Spark/Hadoop)
| Format |
Score |
Notes |
| Parquet |
⭐⭐⭐⭐⭐ |
Purpose-built for columnar analytics |
| Arrow |
⭐⭐⭐⭐⭐ |
In-memory columnar; zero-copy; GPU support |
| Avro |
⭐⭐⭐ |
Hadoop ecosystem integration; schema registry |
| All binary row formats |
⭐ |
Not suitable for columnar analytics |
Memory-Constrained Environments
| Format |
Score |
Notes |
| FlatBuffers |
⭐⭐⭐⭐⭐ |
64% lower memory; zero-copy; perfect for mobile |
| Arrow |
⭐⭐⭐⭐ |
In-memory columnar; efficient representation |
| MessagePack |
⭐⭐⭐⭐ |
Very compact serialization |
| Protobuf |
⭐⭐⭐ |
Reasonable; not specifically optimized |
| ISO 2709 |
⭐⭐ |
No built-in memory optimization |
Long-Term Archival (10+ years)
| Format |
Score |
Notes |
| ISO 2709 |
⭐⭐⭐⭐ |
Proven 50+ year track record; stable standard |
| Protobuf |
⭐⭐⭐⭐ |
Schema versioning; forward/backward compatible |
| Avro |
⭐⭐⭐⭐ |
Self-describing; schema registry support |
| Arrow |
⭐⭐⭐ |
Ecosystem growing; 30% file size advantage valuable |
| FlatBuffers |
⭐⭐⭐ |
Append-only evolution; less flexible |
| Parquet |
⭐⭐ |
File size overhead (3.8×) not justified |
| CBOR |
⭐⭐⭐ |
Standards-based; less mature than others |
Analytics Tier: Arrow (Rust-Native) as Specialized MARC Workload
Purpose & Architecture
Arrow Analytics is NOT a replacement for general binary formats, but a specialized columnar analytics tier for MARC data. It complements ISO 2709 by providing:
- Column-oriented analysis (field frequency, subject distribution, cataloging metrics)
- SQL-based discovery optimization via Arrow IPC format integration with DuckDB, Polars
- Zero-copy in-memory operations for bulk analytical queries
- Parquet archival for long-term analytical snapshots
| Metric |
Value |
Notes |
| Throughput (Rust) |
1.77M rec/sec |
Deserialization speed (5.7 ms for 10k records); 1.96× faster than ISO 2709 baseline |
| Round-Trip Fidelity |
100% |
All 100 test records achieve perfect preservation (field/subfield ordering, indicators, UTF-8 content) |
| Column Access Latency |
Sub-microsecond |
Per-column iteration via Arrow native arrays (zero-copy) |
| In-Memory Size (10k records) |
~5.8 MB |
Columnar normalization to long format (62,667 rows); overhead vs 2.6 MB ISO 2709 |
| Parquet Compression |
1.8 MB |
30% smaller than ISO 2709 (2.6 MB raw); recommended for analytical archives |
| Subject Analysis (650 fields) |
45 ms |
SQL GROUP BY across 10k records via DuckDB |
| Authority Reconciliation |
18 ms |
Efficient joins on 700/710/711 name fields |
| Date Range Filtering (008) |
22 ms |
Field-level extraction and filtering |
Use Cases & Scoring
| Use Case |
Score |
When to Use |
| Subject/authority frequency analysis |
⭐⭐⭐⭐⭐ |
SQL GROUP BY queries on repeating fields; fast aggregation |
| Discovery index optimization |
⭐⭐⭐⭐⭐ |
Field presence/absence analysis, cataloging workflows |
| Bulk field transformation |
⭐⭐⭐ |
Polars good for column-level ops; less ideal for complex record restructuring |
| Interactive single-record access |
⭐ |
80 ms latency unacceptable for APIs; use ISO 2709 cache instead |
| Streaming large files |
⭐ |
Requires full materialization; ISO 2709 streaming more efficient |
Integration Approach
| Layer |
Technology |
Rationale |
| Rust Core |
arrow-rs crate (v57.0) |
Native Arrow implementation; zero unsafe code; 1.77M rec/sec |
| Persistence |
Parquet format |
Long-term analytical archives; 30% compression vs ISO 2709 |
| Interoperability |
Arrow IPC format |
Readable by DuckDB, Polars, DataFusion (user responsibility for external queries) |
| SQL Analytics |
DuckDB (optional) |
External tool; read Arrow IPC format for SQL-based MARC analysis |
| Python Analytics |
Polars + PyArrow (optional) |
Via PyO3 FFI; data scientists can perform additional analysis on Arrow IPC exports |
| Aspect |
General Arrow |
Analytics Arrow |
| Purpose |
API serialization, ecosystem interchange |
MARC-specific column analysis |
| Schema |
Flexible application schema |
Normalized long format (record_id, field_tag, indicators, subfield_code, subfield_value) |
| Serialization |
Standard Arrow IPC/columnar |
Rust-native builders; 100% MARC fidelity preservation |
| Deployment |
External tools manage data flow |
Integrated into mrrc library; no external dependencies |
| Performance Target |
Balanced read/write |
Optimized for analytical throughput (1.77M rec/sec) |
| Persistence |
Flexible (Parquet, Arrow IPC) |
Parquet for archival; Arrow IPC for external tool integration |
Recommendation Context
✅ RECOMMENDED for organizations doing heavy MARC analytics (discovery optimization, cataloging metrics, authority reconciliation).
⚠️ NOT RECOMMENDED as primary MARC import/export format. Continue using ISO 2709 for general purposes.
Implementation tier: Medium priority (after core MARC format support is stable). Add as opt-in feature in mrrc library (Phase 2+).
Implementation Complexity
| Format |
LOC (Rust) |
Dev Time |
Maintenance |
Build Impact |
| FlatBuffers |
214 |
1-2 hrs |
Low (minimal schema) |
<5s full build |
| Protobuf |
415 |
2-3 hrs |
Low (code-gen) |
+5.88s full build |
| Parquet |
290 |
~2 hrs |
Low (JSON serialization) |
Negligible (<1s) |
| Arrow |
410 |
~3 hrs |
Low (Arrow library) |
+2-3s full build |
| MessagePack |
~150 |
1-2 days |
Very Low (serde handles) |
+1s incremental |
| CBOR |
~150 |
1-2 days |
Very Low (ciborium handles) |
+1s incremental |
| Avro |
~300 |
2-3 hrs |
Low (schema management) |
+2-3s incremental |
Summary Recommendations
| Format |
Primary Use Cases |
Rationale |
| Protobuf |
API serialization, REST/gRPC, cross-language interchange |
Mature ecosystem, schema evolution, excellent Rust support |
| FlatBuffers |
Streaming APIs, memory-constrained environments, zero-copy scenarios |
Memory efficient, fast deserialization, Apple production use |
| Arrow (Columnar) |
In-memory analytics, ecosystem integration (Polars/DuckDB), analytics interchange |
Industry-standard, 30% file size advantage, minimal performance overhead, excellent tooling |
| ISO 2709 |
High-throughput batch processing, archival baseline |
Proven standard, maximum performance, no dependencies |
| MessagePack |
Compact storage, inter-process communication, REST payloads |
25% file size reduction, 750K rec/sec throughput, universal language support |
| CBOR |
Standards-based archival, government/academic systems, preservation institutions |
RFC 7949 standard, diagnostic notation, semantic tagging, 45% size reduction after gzip |
| Avro |
Event streaming (Kafka), data lake integration, multi-language systems |
Best-in-class schema evolution, self-describing format, ecosystem integration |
✅ RECOMMENDED (Specialized Analytics Tier)
| Format |
Primary Use Cases |
Rationale |
| Arrow Analytics (Rust-native) |
MARC discovery analytics, subject/authority frequency analysis, cataloging workflow optimization, bulk field filtering |
1.77M rec/sec (1.96× faster than ISO 2709), 100% round-trip fidelity, zero-copy sub-microsecond column access, Parquet archival with 30% compression; NOT a general-purpose format; complements ISO 2709 |
⚠️ CONDITIONAL
(All recommended formats above are production-ready; no conditional recommendations at this time)
⚠️ CONDITIONAL (Special Use Cases Only)
| Format |
Use Case |
Condition |
| Parquet (Arrow columnar) |
Analytics on very large collections (100M+ records) |
Only recommended when columnar benefits (selective column access) justify 74% size overhead |
Evaluation Status
| Format |
Issue |
Status |
Date |
Fidelity |
Recommended |
| ISO 2709 |
Baseline |
✅ Complete |
2026-01-14 |
100% |
✅ Yes |
| Protobuf |
mrrc-fks.1 |
✅ Complete |
2026-01-14 |
100% |
✅ Yes |
| FlatBuffers |
mrrc-fks.2 |
✅ Complete |
2026-01-14 |
100% |
✅ Yes |
| Parquet |
mrrc-ks7 |
✅ Complete |
2026-01-17 |
100% |
❌ No |
| Arrow (Columnar) |
mrrc-fks.7 |
✅ Complete |
2026-01-15 |
100% |
✅ Yes |
| MessagePack |
mrrc-fks.5 |
✅ Complete |
2026-01-16 |
100% |
✅ Yes |
| CBOR |
mrrc-fks.6 |
✅ Complete |
2026-01-16 |
100% |
✅ Yes |
| Avro |
mrrc-fks.4 |
✅ Complete |
2026-01-16 |
100% |
⚠️ Conditional |
| Arrow Analytics (Rust-native) |
mrrc-fks.10 |
✅ Complete |
2026-01-17 |
100% |
✅ Yes* |
How to Update This Matrix
When completing a new format evaluation (e.g., mrrc-fks.3 for Parquet):
- Extract performance data from EVALUATION_PARQUET.md section 4
- Fill in metrics tables above (Read, Write, File Size, Memory)
- Update integration assessment (Dependencies, Language Support, Schema Evolution)
- Update use case fit scoring based on evaluation findings
- Add to evaluation status with issue ID and completion date
- Update summary recommendations if needed
Template for new row:
| **[Format]** | 100% ✅ | [read] | [write] | [roundtrip] | [delta %] |
Document History
| Date |
Version |
Changes |
| 2026-01-19 |
2.2 |
Added new "Analytics Tier: Arrow (Rust-Native)" section; integrated findings from EVALUATION_POLARS_ARROW_DUCKDB.md; clarified distinction between general-purpose formats and specialized analytics tier; split recommendations into two tables (general vs analytics); marked Arrow Analytics as separate entry in evaluation status |
| 2026-01-19 |
2.1 |
Filled in all performance numbers from evaluation docs; use consistent stars (ratings) vs checkmarks (pass/fail); removed mixed notation; improved readability |
| 2026-01-16 |
2.0 |
Added MessagePack, CBOR, Avro evaluations (7/10 formats complete); updated performance, file size, schema evolution, and recommendations |
| 2026-01-14 |
1.0 |
Initial comparison matrix with Protobuf and FlatBuffers complete; templates for remaining formats |