Skip to content

Format Conversion Modules Consistency Audit - mrrc-aw5.3

Date: 2025-12-28
Files Reviewed: json.rs, xml.rs, marcjson.rs, csv.rs, dublin_core.rs, mods.rs
Total Lines: 2266 lines across 6 modules

Overview

Format conversion modules provide serialization/deserialization to/from various metadata formats: - JSON: Generic JSON representation - XML/MARCXML: Standard MARC XML format - MARCJSON: JSON-LD standard format - CSV: Tabular export for spreadsheets - Dublin Core: Simplified 15-element schema - MODS: Rich metadata schema from Library of Congress


1. API Naming Consistency

Current API Surface

Module Export Functions Pattern Accepts
json record_to_json() record_to_FORMAT Single Record
json_to_record() FORMAT_to_record -
xml record_to_xml() record_to_FORMAT Single Record
xml_to_record() FORMAT_to_record -
marcjson record_to_marcjson() record_to_FORMAT Single Record
marcjson_to_record() FORMAT_to_record -
csv records_to_csv() record(s)_to_FORMAT PLURAL Slice
dublin_core record_to_dublin_core() record_to_FORMAT Single Record
dublin_core_to_xml() FORMAT_to_FORMAT -
mods record_to_mods_xml() record_to_FORMAT_xml Single Record

Issues Identified

Issue #1: CSV Uses Plural Pattern

csv::records_to_csv(&[Record])              // Plural (batch)
json::record_to_json(&Record)               // Singular
xml::record_to_xml(&Record)                 // Singular
marcjson::record_to_marcjson(&Record)       // Singular
dublin_core::record_to_dublin_core(&Record) // Singular
mods::record_to_mods_xml(&Record)           // Singular

Status: ⚠️ INCONSISTENT
Rationale: CSV is semantically batch-oriented (tabular), but breaks naming symmetry
Recommendation: Keep as-is (semantic correctness) but document the exception, OR add record_to_csv() single-record variant with delegation

Issue #2: MODS Uses _xml Suffix

mods::record_to_mods_xml(&Record)     // Includes _xml suffix
dublin_core::record_to_dublin_core()  // No suffix (but also outputs XML)

Status: ⚠️ INCONSISTENT
Rationale: Dublin Core also outputs XML but doesn't include suffix in function name
Recommendation: Standardize - either: - Option A: Rename record_to_dublin_core()record_to_dublin_core_xml() for symmetry - Option B: Rename record_to_mods_xml()record_to_mods() to match others - Current: Acceptable as-is (MODS has _xml suffix to prevent confusion, as it's more technical)

Issue #3: Dublin Core Two-Step Pattern

// Dublin Core returns intermediate type, requires second call
let dc: DublinCoreRecord = dublin_core::record_to_dublin_core(&record)?;
let xml: String = dublin_core::dublin_core_to_xml(&dc)?;

// Others return final format directly
let json: Value = json::record_to_json(&record)?;
let xml: String = xml::record_to_xml(&record)?;
let csv: String = csv::records_to_csv(&[record])?;

Status: ⚠️ INCONSISTENT PATTERN
Rationale: DC returns intermediate struct, allowing partial processing
Recommendation: Add convenience function record_to_dublin_core_xml() for consistency, keep existing for flexibility


2. Error Handling Consistency

Error Handling Patterns

All modules follow consistent pattern:

pub fn record_to_FORMAT(record: &Record) -> Result<T> { ... }

Type Analysis: - json.rs: Returns Result<Value> (serde_json::Value) - xml.rs: Returns Result<String> (raw XML) - marcjson.rs: Returns Result<Value> (serde_json::Value) - csv.rs: Returns Result<String> (raw CSV) - dublin_core.rs: Returns Result<DublinCoreRecord> (intermediate struct) - mods.rs: Returns Result<String> (raw XML)

Return Type Consistency:

Format Direct XML Uses Serde Intermediate Type Status
JSON - ✓ serde_json::Value - ✓ Consistent
XML ✓ String - - ✓ Consistent
MARCJSON - ✓ serde_json::Value - ✓ Consistent
CSV ✓ String - - ✓ Consistent
Dublin Core ✓ String (second fn) - ✓ DublinCoreRecord ⚠️ Two-step
MODS ✓ String - - ✓ Consistent

Status: ✓ GOOD - Mostly consistent, Dublin Core is intentional design
Recommendation: Document the Dublin Core pattern as intentional (allows filtering/modification before serialization)


3. Code Organization Patterns

String Building Approaches

XML Modules (xml.rs, mods.rs, dublin_core.rs output): - Use String with write!() macro - Manual tag construction - Proper XML escaping

Example (mods.rs):

let mut xml = String::from("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
write!(xml, "<mods>...</mods>")?;

JSON Modules (json.rs, marcjson.rs): - Use serde_json json!() macro - Type-safe via serde - Automatic escaping

Example (json.rs):

fields.push(json!({
    "leader": String::from_utf8_lossy(&record.leader.as_bytes()?).to_string()
}));

CSV Module (csv.rs): - Use String with writeln!() macro - Custom escape function for CSV values

Status: ✓ GOOD - Each uses appropriate tool for format
Recommendation: KEEP - serde for structured formats, String for streaming


4. Field Mapping Consistency

MARC Field Crosswalks

All modules document their MARC field mappings in module-level doc comments:

Module Mappings Documented Approach Quality
JSON ✓ Yes Generic structure Good
XML ✓ Yes MARCXML standard Good
MARCJSON ✓ Yes Standard format Good
CSV ✓ Yes Tabular columns Good
Dublin Core ✓ Yes 15-element mapping Excellent
MODS ✓ Yes Library of Congress Excellent

Status: ✓ EXCELLENT - All document their mappings clearly
Recommendation: NONE


5. Helper Function Organization

Code Organization by Module

json.rs (288 lines): - 2 main functions (to/from) - No helper functions - Straightforward implementation

xml.rs (306 lines): - 2 main functions (to/from) - 3 helper types (MarcXmlRecord, MarcXmlControlField, MarcXmlDataField) - Uses serde for parsing

marcjson.rs (295 lines): - 2 main functions (to/from) - 2 helper functions (leader parsing) - Clean, focused

csv.rs (287 lines): - 2 main functions (batch to, filtered to) - 2 helper functions (escape_csv_value, etc.) - Good separation of concerns

dublin_core.rs (494 lines): - 1 struct (DublinCoreRecord - 15 fields) - 2 main functions (to DC, DC to XML) - 15+ internal helper functions for MARC→DC mapping - Most complex due to semantic mapping

mods.rs (596 lines): - 1 main function (record_to_mods_xml) - 15+ internal helper functions (write_titles, write_names, etc.) - Most complex due to rich metadata schema

Status: ✓ GOOD - Helper functions properly hidden, main functions clean
Recommendation: NONE


6. Round-Trip Conversion Support

Format to_FORMAT FORMAT_to_record Round-trip Status
JSON ✓ Yes Complete
XML ✓ Yes Complete
MARCJSON ✓ Yes Complete
CSV ✓ (→) ✗ No Export-only
Dublin Core ✗ No Export-only
MODS ✗ No Export-only

Status: ✓ GOOD - CSV, Dublin Core, MODS are intentionally export-only (lossy formats)
Recommendation: Document in module rustdoc that these are one-way conversions


7. Testing Coverage

Test Organization

  • json.rs: 6 tests (serialize, deserialize, roundtrip)
  • xml.rs: 8 tests (serialize, deserialize, roundtrip, element checking)
  • marcjson.rs: 6 tests (serialize, deserialize, roundtrip)
  • csv.rs: 5 tests (single field, multiple records, escaping)
  • dublin_core.rs: 10 tests (field mapping verification)
  • mods.rs: 8 tests (field mapping verification)

Status: ✓ GOOD - Reasonable coverage for each format
Recommendation: Consider adding integration tests for complete workflows (roundtrip with real data)


Summary of Inconsistencies

Critical (Breaking Changes):

None identified.

Medium Priority (Recommend standardization):

Issue Location Impact Recommendation
CSV plural naming csv.rs API symmetry Acceptable - semantic correctness (batch). Document exception.
MODS _xml suffix mods.rs Naming symmetry Acceptable - prevents confusion. Consider for Dublin Core too.
Dublin Core two-step dublin_core.rs API simplicity Add record_to_dublin_core_xml() convenience function.

Low Priority (Documentation):

  • Document that CSV, Dublin Core, MODS are one-way (export-only)
  • Clarify batch vs single-record semantics for CSV

Recommendations

Immediate (Next Review Cycle)

  1. Add Convenience Function: dublin_core::record_to_dublin_core_xml()
  2. Combines record_to_dublin_core() and dublin_core_to_xml()
  3. Improves API consistency with other formats

  4. Document CSV Batch Pattern: Add note in csv.rs module doc explaining plural naming is intentional

  5. CSV Single-Record Variant (Optional): record_to_csv(&Record) -> Result<String>

  6. Would improve consistency with other modules
  7. Could delegate to records_to_csv(&[record])

Long-term (Future Enhancement)

  • Consider trait-based design for format converters (e.g., Serializable trait)
  • Add reverse conversions for CSV (lossy but useful)
  • Consider generic serialization builder

Conclusion

Overall Assessment: Format conversion modules are WELL-DESIGNED and CONSISTENT

  • ✓ Error handling is uniform
  • ✓ Documentation is excellent
  • ✓ Field mappings are clear
  • ✓ Helper functions properly organized
  • ⚠️ Minor naming inconsistencies (acceptable with documentation)

Action Items: 1. Add record_to_dublin_core_xml() convenience function (low effort, improves consistency) 2. Document CSV batch pattern in module rustdoc (documentation only) 3. Consider adding CSV single-record variant (optional, low priority)

Audit Status: Ready for closure