
Binary Format Evaluation Framework

This document establishes the standardized evaluation framework and methodology for assessing binary data formats as potential MARC import/export formats. All format-specific evaluations must follow this framework to ensure comparable, synthesizable results.

Overview

Objective: Evaluate modern binary formats for MARC data representation as candidates for the Rust mrrc core, measuring round-trip fidelity, Rust performance, and ecosystem fit against the ISO 2709 baseline.

Scope: Each format evaluation focuses on Rust implementation with secondary consideration for Python/multi-language support:

  • Primary: Rust implementation for mrrc core (via cargo dependency or internal implementation)
  • Secondary: Python convenience wrappers (via PyO3) for applicable formats
  • Tertiary: Broader language ecosystem (Java, Go, C++) noted but not required

Each format evaluation produces a structured report following the templates in this document.

Encoding & Normalization Assumptions

All evaluations operate on mrrc's internal MARC data model:

  • Input ISO 2709 (MARC) records MAY be in MARC-8 or UTF-8.
  • mrrc is responsible for decoding ISO 2709 and normalizing all record content to UTF-8 MarcRecord objects.
  • Candidate binary formats are evaluated only on their ability to faithfully represent and round-trip these UTF-8 MarcRecord objects.
  • Preservation of the original ISO 2709 byte encoding (e.g., MARC-8 escape sequences) is out of scope for binary format evaluation — that's tested separately in mrrc's import/export layer.

All fidelity comparisons in this framework are defined on the normalized MARC data model (fields, indicators, subfields, UTF-8 strings), not on raw ISO 2709 bytes.


1. Schema Design Requirements

Every format evaluation must include a complete schema design addressing:

1.1 Required MARC Components

| Component | Description | Schema Requirement |
|-----------|-------------|--------------------|
| Leader | 24-character leader string | Must preserve all 24 positions |
| Record Type | From leader position 06 | Must be queryable |
| Control Fields | Tags 001-009 | No subfields, single data value |
| Variable Fields | Tags 010-999 | Tag + indicators + subfields |
| Indicators | Two indicator positions | Must preserve blank vs missing |
| Subfields | Code + value pairs | Variable count, ordered |
| Text encoding | UTF-8 strings | Must preserve all Unicode content from mrrc's normalized MarcRecord |
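
The requirements above assume a data model roughly like the sketch below: a minimal, illustrative Rust representation of a normalized MarcRecord (type and field names are assumptions for illustration, not mrrc's actual API). The essential properties are ordered fields, ordered subfields, and indicators that distinguish blank from missing:

```rust
/// One subfield: code plus UTF-8 value. An empty value ("") is distinct
/// from the subfield being absent altogether.
struct Subfield {
    code: char,
    value: String,
}

/// A variable field (tags 010-999). Indicators may be blank (' ') or
/// missing (None); the candidate schema must preserve that distinction.
struct VariableField {
    tag: String,               // e.g. "245", kept as a string
    indicator1: Option<char>,  // Some(' ') = blank, None = missing
    indicator2: Option<char>,
    subfields: Vec<Subfield>,  // order-preserving, repeats allowed
}

/// A control field (tags 001-009): no indicators, no subfields.
struct ControlField {
    tag: String,
    data: String,
}

enum Field {
    Control(ControlField),
    Variable(VariableField),
}

/// The normalized record a candidate format must round-trip losslessly.
struct MarcRecord {
    leader: String,     // exactly 24 characters
    fields: Vec<Field>, // input order preserved exactly
}
```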

1.2 Edge Cases Checklist

Data Structure & Ordering (CRITICAL):

- [ ] Field ordering — 001, 245, 650, 001 sequence preserved exactly (not alphabetized or renumbered)
- [ ] Subfield code ordering — $d$c$a preserved as-is (not reordered to $a$c$d)
- [ ] Repeating fields (multiple 650s, etc.) — preserved in order
- [ ] Repeating subfields within a field (e.g., $a $a $a) — preserved in order
- [ ] Empty subfield values (distinguish $a "" from missing $a)

Text Content:

- [ ] UTF-8 multilingual content (CJK, Arabic, Hebrew, Cyrillic)
- [ ] Combining diacritics and special characters (NOT precomposed)
- [ ] Whitespace preservation (leading/trailing spaces in subfield values)
- [ ] Control characters in data (0x00-0x1F) — handled or rejected?

Field Structure:

- [ ] Control fields (001-009) vs variable fields (010+) distinction preserved
- [ ] Maximum field lengths (9999 bytes of data, not including tag/indicators)
- [ ] Blank vs missing indicators (space U+0020 vs null distinction)
- [ ] Records with hundreds of fields

MARC-Specific & Validation:

- [ ] Repeating control fields (001 should not repeat; test preservation/error handling)
- [ ] Invalid tag values (e.g., "99A", "0AB", empty) — rejected or preserved?
- [ ] Invalid subfield codes (e.g., uppercase "A", space, non-ASCII) — rejected or preserved?

1.3 Schema Notation

Provide the schema in:

1. Native format (proto file, Avro schema, etc.)
2. ASCII diagram showing structure
3. Example record serialized with annotations


2. Test Data Sets

2.1 Fidelity Test Set (100 records)

Location: tests/data/fixtures/fidelity_test_100.mrc

Status: Currently under development. See FIDELITY_TEST_SET.md for detailed composition requirements and creation checklist.

Composition:

- 50 bibliographic records (varied formats: books, serials, scores, maps)
- 25 authority records (personal, corporate, subject headings)
- 15 holdings records
- 10 edge case records (encoding, maximum sizes, special characters)

Selection criteria:

- Must include all common record types (BKS, SER, MUS, MAP, etc.)
- Must include multilingual records (CJK, Arabic, Cyrillic)
- Must include records with 100+ fields
- Must include records with repeating fields/subfields
- Must include synthetic worst-case records (maximum field sizes, control character boundaries)

Validation requirement: Before using in evaluations, must pass validation script verifying composition, encoding coverage, and edge case presence.

Note: The test set includes MARC-8 encoded source records to exercise mrrc's normalization pipeline; binary formats only see the resulting UTF-8 MarcRecord objects.

2.2 Performance Test Set (10,000 records)

Location: tests/data/fixtures/10k_records.mrc

Already available; used for standardized benchmark comparisons.

2.3 Stress Test Set (100,000 records)

Location: tests/data/fixtures/100k_records.mrc

For extended performance evaluation and memory profiling.


3. Round-Trip Testing Protocol

3.1 Test Procedure

The round-trip test validates that candidate formats can preserve MarcRecord data with perfect fidelity:

Step 1: Load ISO 2709 → MarcRecord (mrrc's import layer)
Step 2: MarcRecord → Candidate Format (serialize)
Step 3: Candidate Format → MarcRecord (deserialize)
Step 4: Compare Step 1 MarcRecord vs Step 3 MarcRecord (field-by-field)
Result: PASS if identical, FAIL if any mismatch

Important: Comparison is performed on normalized MarcRecord structures (leader, fields, indicators, subfields, UTF-8 strings). We do NOT compare original ISO 2709 bytes or MARC-8 escape sequences — those are handled by mrrc's import layer, not the candidate format.
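
A minimal sketch of this procedure in Rust, assuming a MarcRecord type like the one sketched in section 1.1 (with `#[derive(PartialEq)]`) and a hypothetical `Codec` trait standing in for the candidate format's serializer:

```rust
/// Placeholder trait for the candidate format under evaluation.
/// The concrete crate (prost, flatbuffers, etc.) sits behind it.
trait Codec {
    type Error: std::fmt::Debug;
    fn serialize(&self, record: &MarcRecord) -> Result<Vec<u8>, Self::Error>;
    fn deserialize(&self, bytes: &[u8]) -> Result<MarcRecord, Self::Error>;
}

/// Steps 2-4 of the round-trip test for a single record.
/// Step 1 (ISO 2709 -> MarcRecord) happens in mrrc's import layer.
fn round_trip_ok<C: Codec>(codec: &C, original: &MarcRecord) -> bool {
    // Step 2: MarcRecord -> candidate format
    let bytes = match codec.serialize(original) {
        Ok(b) => b,
        Err(_) => return false,
    };
    // Step 3: candidate format -> MarcRecord
    let decoded = match codec.deserialize(&bytes) {
        Ok(r) => r,
        Err(_) => return false,
    };
    // Step 4: field-by-field comparison of normalized structures,
    // never of the original ISO 2709 bytes.
    *original == decoded
}
```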

3.2 Validation Criteria

| Criterion | Weight | Pass Condition |
|-----------|--------|----------------|
| Leader preservation | Critical | Positions 0-4, 12-16 may be recalculated; all others match exactly |
| Field ordering | Critical | Fields in exact input sequence (not reordered or sorted) |
| Tag values | Critical | All tags present, matching baseline exactly (e.g., "001", "245") |
| Indicator values | Critical | Exact match including blank (space char) vs missing distinction |
| Subfield code ordering | Critical | All codes present and in exact input order (not reordered) |
| Subfield values | Critical | Exact UTF-8 string match, preserving empty strings vs null distinction |

3.2.1 Correctness Specification

MUST Match Exactly:

- Leader (24 chars): Positions 0-4 (logical record length) and 12-16 (base address) MAY be recalculated, but all OTHER positions (5-11, 17-23) MUST match exactly
- Tag values: String representation exactly (e.g., "001", "245") with numeric value preserved
- Field ordering: Fields in exact input sequence (001, 245, 650, 001 → NOT reordered alphabetically or numerically)
- Indicator values: Character values, where space (U+0020) is distinct from null/missing
- Subfield codes: Character values in correct order (e.g., $d$c$a → NOT reordered to $a$c$d)
- Subfield values: Exact UTF-8 byte-for-byte match, preserving empty strings ("") as distinct from missing values

MUST NOT Collapse or Transform:

- Whitespace within strings must be preserved exactly (no trim, no collapse, leading/trailing spaces matter)
- Empty subfield values ($a "") are distinct from absent subfields (no $a)
- Combining diacritical marks must preserve their UTF-8 encoding (do NOT normalize to precomposed forms)
- Field and subfield sequence must be preserved exactly (no reordering, no sorting, no deduplication)

MUST NOT Compare:

- Original ISO 2709 byte encoding (MARC-8 vs UTF-8) — that's mrrc's responsibility
- Internal serialization order (e.g., protobuf field order vs JSON property order)
- Memory representation or object IDs

Failure Threshold:

- Any mismatch in the "MUST Match" criteria → FAIL for that record
- Any crash/panic on invalid input → FAIL entire evaluation
- 100% perfect round-trip rate is MANDATORY for recommendation
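
The trickiest of these rules can be stated as plain Rust assertions. These are illustrations of the intended semantics, not mrrc's comparison code:

```rust
fn main() {
    // Empty subfield value ("") is distinct from a missing subfield.
    let present_but_empty: Option<String> = Some(String::new());
    let missing: Option<String> = None;
    assert_ne!(present_but_empty, missing);

    // Blank indicator (space, U+0020) is distinct from a missing indicator.
    let blank: Option<char> = Some(' ');
    let absent: Option<char> = None;
    assert_ne!(blank, absent);

    // Combining diacritics must not be normalized to precomposed forms:
    // "e" + COMBINING ACUTE ACCENT is not the same string as precomposed "é".
    assert_ne!("e\u{0301}", "\u{00e9}");

    // Leading/trailing whitespace is significant.
    assert_ne!("value ", "value");
}
```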

3.3 Fidelity Score

Calculate: (records_perfect / total_records) × 100

  • 100% = Required for recommendation
  • <100% = Document all failures with root cause

3.4 Failure Analysis Template

For any failed round-trip, provide:

| Record ID | Field/Tag | Criterion | Expected | Actual | Root Cause |
|-----------|-----------|-----------|----------|--------|------------|
| n/a | n/a | n/a | n/a | n/a | n/a |

Failure Investigation Checklist:

- [ ] Field ordering changed (001, 245, 650 reordered alphabetically or numerically)?
- [ ] Subfield code order changed ($d$c$a reordered to $a$c$d)?
- [ ] Encoding issue (UTF-8 normalization, combining diacritics)?
- [ ] Indicator handling (space vs null)?
- [ ] Subfields missing or miscounted (wrong count, missing codes)?
- [ ] Empty string handling (empty $a "" vs missing $a)?
- [ ] Whitespace trimmed (leading/trailing spaces lost)?
- [ ] Leader position recalculation (only 0-4, 12-16 expected to vary)?
- [ ] Type conversion error (string, integer, character)?
- [ ] Data truncation (field size limit)?

3.5 Reporting Format

## Round-Trip Fidelity Results

**Test Set:** fidelity_test_100.mrc
**Records Tested:** 100
**Perfect Round-Trips:** XX/100 (XX%)
**Test Date:** YYYY-MM-DD

### Failures (if any)

[Use failure analysis template above]

### Notes

[Any format-specific observations about data preservation]

4. Failure Modes Testing

Before running performance benchmarks, each format must be tested for robustness against invalid or edge-case input:

4.1 Error Handling Protocol

| Scenario | Test Input | Expected Behavior |
|----------|------------|-------------------|
| Truncated record | Incomplete serialized record | Graceful error with clear message, no panic |
| Invalid tag | Tag value "99A" or empty string | Rejected with validation error |
| Oversized field | Field >9999 bytes | Either preserved exactly OR rejected with clear error |
| Invalid indicator | Non-ASCII character as indicator | Rejected with validation error |
| Null subfield value | Subfield with null pointer | Handled consistently (error or serialize as empty) |
| Malformed UTF-8 | Invalid UTF-8 byte sequence in text | Rejected with clear error message |
| Missing leader | Record without 24-char leader | Rejected with validation error |
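
As a concrete example of the "no panic" expectation, a check along these lines (reusing the hypothetical `Codec` trait from section 3.1) can verify that truncated input produces an error rather than aborting the process:

```rust
/// Returns true only if deserializing truncated input yields a normal
/// error. A panic, or a silent "success" on garbage, counts as a failure.
fn truncated_input_is_graceful<C: Codec>(codec: &C, valid_bytes: &[u8]) -> bool {
    // Simulate truncation by cutting a known-good serialization in half.
    let truncated = &valid_bytes[..valid_bytes.len() / 2];

    // Catch panics so a panicking library is recorded as a test failure
    // instead of aborting the whole evaluation run.
    let outcome = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
        codec.deserialize(truncated)
    }));

    match outcome {
        Ok(Err(_)) => true,  // graceful error: expected behavior
        Ok(Ok(_)) => false,  // "succeeded" on truncated data: suspicious
        Err(_) => false,     // panicked: fails the evaluation
    }
}
```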

4.2 Reporting Format

Document in evaluation report:

## 3. Failure Modes

| Scenario | Result | Notes |
|----------|--------|-------|
| Truncated record | ✓ Graceful error | "Unexpected end of data" |
| Invalid tag | ✓ Validation error | Rejected at deserialization |
| Oversized field | ✓ Error | Rejected (limit enforced) |
| ... | ... | ... |

**Overall Assessment:** [Format handles errors gracefully / Format panics on invalid input / etc.]

5. Performance Benchmark Protocol

5.1 Test Environment

Document:

- CPU model and cores
- RAM
- Storage type (SSD/HDD)
- OS version
- Rust version (primary) and optimization level (release profile, e.g., opt-level = 3)
- Format library version (Rust crate)
- Python version and libraries (if evaluating multi-language support, secondary)

5.2 Metrics to Measure

| Metric | Unit | Description |
|--------|------|-------------|
| Read Throughput | records/sec | Deserialize from format to MarcRecord objects |
| Write Throughput | records/sec | Serialize from MarcRecord objects to format |
| File Size (raw) | bytes | Output file size without compression |
| File Size (gzip) | bytes | Compressed with gzip -9 |
| Compression Ratio | % | 1 - (format_size / iso2709_size) |
| Peak Memory | MB | Maximum RSS during operation |

Benchmarks should run single-threaded to ensure comparable results across formats.

5.3 Baseline Measurement (ISO 2709)

Before evaluating other formats, establish the ISO 2709 baseline:

Test Date: YYYY-MM-DD
Environment: [CPU, RAM, OS, Rust version]

Metric | Value
-------|-------
Read (rec/sec) | X
Write (rec/sec) | Y
File Size (10k records) | Z bytes
Gzip Compressed | W bytes
Peak Memory | V MB

Store this baseline result permanently in BASELINE_ISO2709.md for all subsequent format comparisons.

5.4 Benchmark Procedure (Rust Primary)

Rust benchmarks using Criterion.rs:

# Build release binary with optimizations
cargo build --release

# Run Criterion benchmarks
cargo bench --bench format_benchmarks -- [format_name] --baseline iso2709

# For manual timing:
# Warm-up: 3 iterations (discarded)
# Measured: 10 iterations (averaged)
hyperfine --warmup 3 --runs 10 'target/release/mrrc_bench_read --format {fmt}'
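
For reference, the Criterion benchmark behind a command like the one above typically has the shape below. `decode_candidate_format` and the fixture path are placeholders for the crate under evaluation; the harness structure itself is standard Criterion.rs:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

/// Placeholder for the candidate format's deserializer; a real benchmark
/// would decode all 10,000 records and return the count.
fn decode_candidate_format(bytes: &[u8]) -> usize {
    bytes.len()
}

fn bench_read(c: &mut Criterion) {
    // Hypothetical pre-serialized fixture in the candidate format.
    let data = std::fs::read("tests/data/fixtures/10k_records.bin")
        .expect("serialized fixture must exist");

    c.bench_function("candidate_format/read_10k", |b| {
        b.iter(|| decode_candidate_format(criterion::black_box(&data)))
    });
}

criterion_group!(benches, bench_read);
criterion_main!(benches);
```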

Python secondary (if applicable):

- Only after the Rust primary evaluation is complete
- Document with lower priority than Rust metrics

5.5 Baseline Comparison

All metrics compared against ISO 2709 baseline:

| Operation | ISO 2709 Baseline | Format Result | Delta |
|-----------|-------------------|---------------|-------|
| Read | X rec/sec | Y rec/sec | +/-Z% |
| Write | X rec/sec | Y rec/sec | +/-Z% |
| Size | X bytes | Y bytes | +/-Z% |

5.6 Reporting Table

## Performance Results

**Test Set:** 10k_records.mrc (10,000 records)
**Test Date:** YYYY-MM-DD
**Environment:** [specs]

| Metric | ISO 2709 | [Format] | Delta |
|--------|----------|----------|-------|
| Read (rec/sec) | 150,000 | TBD | TBD |
| Write (rec/sec) | 120,000 | TBD | TBD |
| File Size | 12.5 MB | TBD | TBD |
| Compressed Size | 4.2 MB | TBD | TBD |
| Peak Memory | 45 MB | TBD | TBD |

6. Integration Assessment Criteria

6.1 Dependency Analysis (Rust Focus)

Evaluate the cost of integrating the format library into mrrc's Rust implementation:

| Factor | Guidance |
|--------|----------|
| Rust external deps | Count Cargo crate dependencies (direct + transitive). Fewer is better. |
| Dep health | Rate each Rust dependency: actively maintained? Security advisories? Recent commits? |
| Build complexity | Simple cargo add (score 1) vs complex build scripts/native compilation (score 5) |
| License compatibility | Must be compatible with Apache 2.0 (mrrc's license) |
| Compile time impact | Measure incremental build time; Rust compile time matters for CI/iteration speed |

6.2 Language Support Matrix

Priority order for mrrc:

| Language | Library | Maturity | Priority | Notes |
|----------|---------|----------|----------|-------|
| Rust | crate_name | ⭐⭐⭐⭐⭐ | PRIMARY | Must have mature Rust implementation |
| Python | package_name | ⭐⭐⭐⭐ | Secondary | For PyO3 bindings (if format recommended) |
| Java | package_name | ⭐⭐⭐ | Tertiary | Ecosystem context only |
| Go | package_name | ⭐⭐ | Tertiary | Ecosystem context only |
| C++ | library_name | ⭐⭐⭐ | Tertiary | Ecosystem context only |

6.3 Schema Evolution Score

| Capability | Score |
|------------|-------|
| No schema evolution | 1 |
| Append-only evolution | 2 |
| Backward compatible | 3 |
| Forward compatible | 4 |
| Full bi-directional | 5 |

6.4 Ecosystem Maturity

  • [ ] Production use cases documented
  • [ ] Active maintenance (commits in last 6 months)
  • [ ] Security advisories process
  • [ ] Stable API (1.0+ release)
  • [ ] Good documentation
  • [ ] Community size / adoption

7. Use Case Fit Scoring

Rate each format 1-5 for each use case:

| Use Case | Score | Notes |
|----------|-------|-------|
| Simple data exchange | | API integration, file transfer |
| High-performance batch | | Large-scale processing |
| Analytics/big data | | Spark, Hadoop, Parquet ecosystem |
| API integration | | REST/gRPC services |
| Long-term archival | | 10+ year preservation |

8. Evaluation Report Template

Each format evaluation produces a report following this structure:

# [Format Name] Evaluation for MARC Data

**Issue:** mrrc-fks.N
**Date:** YYYY-MM-DD
**Author:** [name]
**Status:** Draft | Complete

## Executive Summary

[2-3 sentences: Is this format viable? Key findings?]

## 1. Schema Design

### 1.1 Schema Definition
[Native schema file]

### 1.2 Structure Diagram
[ASCII diagram]

### 1.3 Example Record
[Annotated example]

### 1.4 Edge Case Coverage
[Checklist from section 1.2]

## 2. Round-Trip Fidelity

[Results per section 3]

## 3. Failure Modes

[Results per section 4]

## 4. Performance Benchmarks

[Results per section 5]

## 5. Integration Assessment

[Analysis per section 6]

## 6. Use Case Fit

[Scoring per section 7]

## 7. Implementation Complexity

- Lines of code estimate
- Development time estimate
- Maintenance burden assessment

## 8. Strengths & Weaknesses

### Strengths
- Bullet points

### Weaknesses
- Bullet points

## 9. Recommendation

**Verdict:** Recommended | Conditional | Not Recommended

[Rationale paragraph]

## Appendix

### A. Test Commands
### B. Sample Code
### C. References

9. Comparison Matrix Template

After all evaluations complete, aggregate into:

| Format | Fidelity | Read (rec/s) | Write (rec/s) | Size | Deps | Evolution | Overall |
|--------|----------|--------------|---------------|------|------|-----------|---------|
| ISO 2709 | 100% | 150k | 120k | 1.0x | 0 | 1 | Baseline |
| Protobuf | TBD | TBD | TBD | TBD | TBD | TBD | TBD |
| FlatBuffers | TBD | TBD | TBD | TBD | TBD | TBD | TBD |
| ... | ... | ... | ... | ... | ... | ... | ... |

Document History

| Date | Version | Changes |
|------|---------|---------|
| 2026-01-12 | 1.0 | Initial framework definition |
| 2026-01-12 | 1.1 | Clarify encoding assumptions: mrrc normalizes to UTF-8, formats don't handle MARC-8; remove startup cost metric; simplify use cases |
| 2026-01-13 | 1.2 | Correctness improvements: add explicit equality semantics; add failure modes testing; establish ISO 2709 baseline requirement; add synthetic worst-case records to test set; clarify leader position handling; add failure investigation checklist |
| 2026-01-13 | 1.5 | Ordering emphasis: elevated field & subfield ordering to CRITICAL in validation table; reorganized edge cases checklist to highlight ordering; enhanced failure checklist with field/subfield reordering detection; clarified correctness spec that field/subfield sequence must be preserved exactly |
| 2026-01-13 | 2.0 | Rust-first focus: reframed evaluations to prioritize Rust implementation for mrrc core; updated benchmark procedure to use Rust (Criterion.rs); added Rust dependency analysis emphasis; clarified language support matrix with priority tiers (Rust primary, Python secondary, others tertiary); updated environment documentation to emphasize Rust version and optimization level |