Migration Guide: pymarc to mrrc

This guide helps existing pymarc users migrate to mrrc (MARC Rust Crate), a high-performance Rust-based MARC library with Python bindings.

Overview

mrrc provides:

  • High-performance Rust implementation with Python convenience
  • Near-complete pymarc API compatibility: close to a drop-in replacement for existing pymarc code (see Known Differences from pymarc)
  • Type-safe design with comprehensive error handling
  • Native Python integration through PyO3 bindings with familiar data structures
  • All standard MARC operations including reading, writing, and format conversions

Installation

pip install mrrc

Quick Start

Before (pymarc)

import pymarc

# Reading records
with open('records.mrc', 'rb') as f:
    reader = pymarc.MARCReader(f)
    for record in reader:
        print(record['245']['a'])

# Writing records
writer = pymarc.MARCWriter(open('output.mrc', 'wb'))
field = pymarc.Field('245', ['1', '0'], [('a', 'Title')])
record = pymarc.Record(to_utf8=True)
record.append(field)
writer.write(record)
writer.close()

After (mrrc) - pymarc-Compatible

mrrc supports syntax nearly identical to pymarc's:

import mrrc

# Reading records — pass a file path for best performance.
# Path input uses Rust-native file I/O, which releases the GIL
# and enables true multi-thread parallelism.
reader = mrrc.MARCReader('records.mrc')
for record in reader:
    print(record['245']['a'])  # pymarc dictionary syntax works!
    print(record.title())      # Also available as convenience method

# Writing records - inline construction (similar to pymarc)
with open('output.mrc', 'wb') as f:
    with mrrc.MARCWriter(f) as writer:
        record = mrrc.Record(fields=[
            mrrc.Field('245', indicators=['1', '0'], subfields=[
                mrrc.Subfield('a', 'Title'),
            ]),
        ])
        writer.write(record)

API Comparison

Record Creation

| Operation | pymarc | mrrc | Same? |
|---|---|---|---|
| Create empty record | pymarc.Record() | mrrc.Record() | Same |
| Create with leader | pymarc.Record(leader) | mrrc.Record(leader) | Same |
| Add control field | record.add_field(Field('001', data='value')) | record.add_control_field('001', 'value') | Different |
| Add data field | record.append(field) | record.add_field(field) | Different |

Field Creation

| Operation | pymarc | mrrc | Same? |
|---|---|---|---|
| Create field | Field('245', ['1','0'], [('a', 'Title')]) | Field('245', indicators=['1','0'], subfields=[Subfield('a', 'Title')]) | Similar |
| Add subfield | field.add_subfield('a', 'value') | field.add_subfield('a', 'value') | Same |
| Get subfields | field.get_subfields('a') | field.get_subfields('a') | Same |
| Access subfield | field['a'] | field['a'] | Same |

Reading/Writing

| Operation | pymarc | mrrc | Same? |
|---|---|---|---|
| Create reader | MARCReader(file_obj) | MARCReader('path.mrc') (recommended) or MARCReader(file_obj) | Enhanced |
| Read record | reader.next() or next(reader) | next(reader) | Same |
| Write record | writer.write(record) | writer.write(record) | Same |
| Iterate | for record in reader: | for record in reader: | Same |
| Context manager | Manual close required | with MARCWriter(f) as w: | Enhanced |
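
During a gradual migration, the same write loop may need to run against either library. The following is a minimal sketch, assuming the writer object exposes both write() and close(). pymarc's MARCWriter does, as in the Quick Start; for mrrc this is an assumption, since only the context manager form is documented here. write_all is a hypothetical helper name, not part of either library.

```python
from contextlib import closing


def write_all(writer, records):
    """Write every record, then close the writer even if a write fails."""
    count = 0
    # closing() calls writer.close() on exit, so writers without
    # context manager support are still cleaned up.
    with closing(writer):
        for record in records:
            writer.write(record)
            count += 1
    return count
```

Once all code runs on mrrc, the plain with mrrc.MARCWriter(f) as w: form is simpler and this helper can be dropped.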

Accessing Data

| Operation | pymarc | mrrc | Same? |
|---|---|---|---|
| Get title | record['245']['a'] | record['245']['a'] or record.title() | Same |
| Get field | record['650'] | record['650'] or record.fields_by_tag('650') | Same |
| Check if field exists | '245' in record | '245' in record | Same |
| Get all fields | for field in record: | for field in record: | Same |
| Control field | record['001'].value | record['001'].value or record.control_field('001') | Same |

API Compatibility

mrrc supports the major pymarc operations. The sections below walk through each group:

Record Field Access - Dictionary-Style (Identical to pymarc)

# Dictionary-style access works exactly like pymarc
field = record['245']                      # Get first 245 field (or None if missing)
all_fields = record.fields_by_tag('245')   # Get all 245 fields

# Missing fields return None (matching pymarc behavior)
field = record['999']                      # Returns None if field doesn't exist
                                           # (does NOT raise KeyError)

# Check if field exists (identical to pymarc)
if '245' in record:
    title_field = record['245']

# Dict-like .get() method (identical to pymarc)
field = record.get('245')                  # Get first field, None if missing
field = record.get('999', default_field)   # With default value

# Alternative method-based access also available
field = record.get_field('245')            # Get first field

Field Subfield Access - Dictionary-Style (Identical to pymarc)

# Dictionary-style access works exactly like pymarc
title = field['a']                          # Get first 'a' subfield
if 'a' in field:
    value = field['a']

# Missing subfields return None (matching pymarc behavior)
value = field['z']                          # Returns None if subfield doesn't exist
                                            # (does NOT raise KeyError)

# Get all values for a code
all_subfields = field.get_subfields('a')   # Get list of 'a' subfield values

# Iterate over all subfields
for subfield in field.subfields():
    print(f"{subfield.code}: {subfield.value}")

# Get subfields as dictionary
subfield_dict = field.subfields_as_dict()
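
Because a missing field comes back as None, a chained lookup such as record['999']['a'] fails on the second step with a TypeError. The sketch below is one possible guard. It uses only record.get() and field[code] as described above; first_subfield is a hypothetical helper, not part of mrrc.

```python
def first_subfield(record, tag, code, default=None):
    """Return the first `code` subfield of the first `tag` field, or default."""
    field = record.get(tag)      # None when the record has no such field
    if field is None:
        return default
    value = field[code]          # None when the field has no such subfield
    return default if value is None else value
```

For example, first_subfield(record, '245', 'a', default='[untitled]') avoids the chained lookup entirely.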

Field Operations (Identical to pymarc)

field.add_subfield('a', 'value')   # Identical to pymarc
field.get_subfields('a')           # Get list of values - identical to pymarc
field.delete_subfield('a')         # Delete subfield by code
field.subfields_as_dict()          # Get all subfields as dict
field.subfields()                  # Get all Subfield objects
field.is_control_field()           # False for data fields (identical to pymarc)

Record Operations (Identical to pymarc + Extensions)

# Standard pymarc operations
record.remove_field('245')         # Remove field(s) by tag
record.append(field)               # Add field (same as add_field for compatibility)
record.get_fields('650', '651')    # Get fields for multiple tags

# Convenience methods (identical to pymarc)
record.title()                     # Get title (245 $a)
record.author()                    # Get author (100/110/111 $a)
record.isbn()                      # Get ISBN (020 $a)
record.issn()                      # Get ISSN (022 $a)
record.subjects()                  # Get all subjects (6XX $a)
record.publisher()                 # Get publisher (260 $b)
record.physical_description()      # Get extent (300 $a)
record.series()                    # Get series (490 $a)

Leader Access - Property-Based and Position-Based

# Property-based access (recommended for clarity)
leader = record.leader()
leader.record_status = 'c'          # Set record status
leader.record_type = 'a'            # Set record type
leader.bibliographic_level = 'd'    # Set bibliographic level

# Position-based access (also available for pymarc compatibility)
leader[5] = 'c'                     # Set record status at position 5
leader[6] = 'a'                     # Set record type at position 6

# Slice access to get multiple positions
record_length = int(leader[0:5])    # Get first 5 chars (record length)
cataloging_form = leader[18]        # Get cataloging form char at position 18

# Position and property access are automatically synchronized
leader.record_status = 'd'
assert leader[5] == 'd'             # Position-based access reflects property change
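
The positions used above follow the MARC 21 leader layout. As an illustration that does not depend on mrrc, the sketch below slices a plain 24-character leader string at the same offsets. LEADER_POSITIONS and describe_leader are hypothetical names, and the sample leader in the usage note is made up for the example.

```python
# Offsets into the 24-character MARC 21 leader, keyed by the property
# names used in this guide.
LEADER_POSITIONS = {
    "record_length": slice(0, 5),
    "record_status": slice(5, 6),
    "record_type": slice(6, 7),
    "bibliographic_level": slice(7, 8),
    "character_coding": slice(9, 10),
    "cataloging_form": slice(18, 19),
}


def describe_leader(leader_text):
    """Map each named leader element to its raw character(s)."""
    if len(leader_text) != 24:
        raise ValueError("a MARC leader is exactly 24 characters long")
    return {name: leader_text[pos] for name, pos in LEADER_POSITIONS.items()}
```

Calling describe_leader("00714cam a2200205 a 4500") yields record_length "00714", record_status "c", record_type "a", and character_coding "a" (UTF-8).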

Reader/Writer Interface

# Reading — pass a path string or pathlib.Path for best performance.
# This uses Rust-native file I/O, which releases the Python GIL during
# parsing and enables true multi-thread parallelism.
reader = mrrc.MARCReader('records.mrc')
for record in reader:              # Standard iteration
    print(record.title())

# Python file objects and in-memory bytes also work, but hold the GIL
# during reads, so they won't benefit from multi-threading.
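with open('records.mrc', 'rb') as f:
    for record in mrrc.MARCReader(f):   # Works, but slower under threading
        print(record.title())

with open('records.mrc', 'rb') as f:
    marc_bytes = f.read()               # Any bytes holding raw MARC data
reader = mrrc.MARCReader(marc_bytes)    # Also works for in-memory data

# Writing (identical to pymarc, with context manager support)
with open('output.mrc', 'wb') as out:
    with mrrc.MARCWriter(out) as writer:
        writer.write(record)           # Same method name as pymarc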
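
Since path input releases the GIL, one way to take advantage of it is to give each worker thread its own file. The following is a minimal sketch, assuming each path is read by exactly one thread. count_records and count_all are hypothetical helpers; only mrrc.MARCReader(path) and plain iteration are taken from this guide.

```python
from concurrent.futures import ThreadPoolExecutor


def count_records(path):
    """Count the records in one MARC file using a path-based reader."""
    import mrrc  # imported here so count_all can be tried without mrrc installed

    return sum(1 for _ in mrrc.MARCReader(path))


def count_all(paths, counter=count_records, max_workers=4):
    """Run counter over each path in a thread pool; returns {path: result}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result.
        return dict(zip(paths, pool.map(counter, paths)))
```

Whether this yields a speedup depends on file sizes and on how much per-record Python work each thread does. Work done in Python between records still holds the GIL.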

Minimal API Differences

mrrc is nearly 100% compatible with pymarc. Two points are worth knowing, and the second is optional:

1. Record Constructor

Record() now works with no arguments (leader defaults to Leader()):

# pymarc
record = pymarc.Record()

# mrrc - both work
record = mrrc.Record()                  # Default leader
record = mrrc.Record(mrrc.Leader())     # Explicit leader

# Note: Once created, all field access works identically
print(record['245']['a'])  # Works exactly like pymarc

2. Optional: Extended Convenience Methods

mrrc extends pymarc with additional convenience methods:

# All pymarc methods work:
record.title()             # Get title
record.author()            # Get author
record.isbn()              # Get ISBN

# Plus many additional methods:
record.issn()              # Get ISSN
record.issn_title()        # Get ISSN title
record.sudoc()             # Get SuDoc classification
record.issnl()             # Get ISSN-L
record.pubyear()           # Get publication year
record.physical_description()  # Get extent/pages
record.is_book()           # Check if book
record.is_serial()         # Check if serial
record.is_music()          # Check if music

Migration Checklist

Minimal changes needed:

  • [ ] Replace import pymarc with import mrrc
  • [ ] Update record creation: pymarc.Record() → mrrc.Record() (or mrrc.Record(mrrc.Leader()))
  • [ ] Update field creation to use indicators= and subfields= kwargs if desired
  • [ ] Everything else works the same - dictionary access, method names, iteration all identical
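
For a staged rollout, the import swap can be made conditional so the same script runs wherever either library is installed. This is a sketch only: load_marc_backend is a hypothetical helper, and it does not paper over the method differences listed under API Comparison.

```python
import importlib
import importlib.util


def load_marc_backend(preferred=("mrrc", "pymarc")):
    """Return the first importable module from preferred, or raise ImportError."""
    for name in preferred:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    raise ImportError(f"none of {preferred!r} is installed")
```

With marc = load_marc_backend(), code that passes file objects to marc.MARCReader(f) stays portable, since both libraries accept them. Path strings are the mrrc-specific enhancement.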

Optional enhancements:

  • [ ] Pass file paths to MARCReader('file.mrc') instead of file objects (releases the GIL, enables multi-thread parallelism)
  • [ ] Use additional convenience methods like record.issn(), record.sudoc(), etc. for specialized use cases
  • [ ] Update writers to use context managers: with mrrc.MARCWriter(f) as w: (better resource management)

Known Differences from pymarc

  1. Record constructor: mrrc.Record() works (the leader defaults to Leader()), or pass an explicit leader with mrrc.Record(mrrc.Leader())
  2. UTF-8 encoding: Set leader.character_coding = 'a' for UTF-8 (mrrc uses UTF-8 by default internally)
  3. No field removal during iteration: Do not remove fields from a record while iterating over it. Collect the fields to remove first (for example with a list comprehension), then remove them in a separate pass
  4. Type safety: All data is validated at the Rust layer. This is intentional and helps prevent data corruption
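
The separate-pass approach from item 3 can be sketched as follows. remove_matching_fields is a hypothetical helper, and field.tag is an assumed attribute (pymarc fields have it; it is not shown for mrrc in this guide). Because remove_field() works by tag, every field sharing a matching tag is removed.

```python
def remove_matching_fields(record, should_remove):
    """Remove every tag that has at least one field matching should_remove."""
    # Pass 1: iterate without mutating and collect the tags to drop.
    tags = sorted({field.tag for field in record if should_remove(field)})
    # Pass 2: mutate only after iteration has finished.
    for tag in tags:
        record.remove_field(tag)
    return tags
```

For example, remove_matching_fields(record, lambda f: f.tag.startswith('9')) drops all 9XX local fields.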

Getting Help

  • Documentation: See class docstrings in Python (IDE autocomplete available)
  • Type hints: Full .pyi stub file provides IDE support
  • Examples: See test files for comprehensive examples
  • Issues: Report bugs at https://github.com/dchud/mrrc/issues

Contributing

We welcome contributions! The project is structured as:

  • src/: Core Rust MARC library
  • src-python/: Python wrapper with PyO3
  • tests/: Integration tests

To build locally:

uv sync
uv run maturin develop