Migration Guide: pymarc to mrrc¶

This guide helps existing pymarc users migrate to mrrc, a high-performance Rust-based MARC library with Python bindings.

Overview¶

mrrc is a Rust-based MARC library with Python bindings, providing:

High-performance Rust implementation with Python convenience
Near-complete pymarc API compatibility: a drop-in replacement for most existing pymarc code (see Known Differences from pymarc)
Type-safe design with comprehensive error handling
Native Python integration through PyO3 bindings with familiar data structures
All standard MARC operations including reading, writing, and format conversions

Installation¶

pip install mrrc

Quick Start¶

Before (pymarc)¶

import pymarc

# Reading records
with open('records.mrc', 'rb') as f:
    reader = pymarc.MARCReader(f)
    for record in reader:
        print(record['245']['a'])

# Writing records
writer = pymarc.MARCWriter(open('output.mrc', 'wb'))
field = pymarc.Field('245', ['1', '0'], [('a', 'Title')])
record = pymarc.Record()
record.add_field(field)
writer.write(record)
writer.close()

After (mrrc) - pymarc-Compatible¶

mrrc supports nearly identical pymarc syntax:

import mrrc

# Reading records — pass a file path for best performance.
# Path input uses Rust-native file I/O, which releases the GIL
# and enables true multi-thread parallelism.
reader = mrrc.MARCReader('records.mrc')
for record in reader:
    print(record['245']['a'])  # pymarc dictionary syntax works!
    print(record.title)        # Property access (same as pymarc)

# Writing records - inline construction (similar to pymarc)
with open('output.mrc', 'wb') as f:
    with mrrc.MARCWriter(f) as writer:
        record = mrrc.Record(fields=[
            mrrc.Field('245', indicators=['1', '0'], subfields=[
                mrrc.Subfield('a', 'Title'),
            ]),
        ])
        writer.write(record)

API Comparison¶

Record Creation¶

Operation	pymarc	mrrc	Same?
Create empty record	`pymarc.Record()`	`mrrc.Record()`	Same
Create with leader	`pymarc.Record(leader)`	`mrrc.Record(leader)`	Same
Add control field	`record.add_field(Field('001', data='value'))`	`record.add_control_field('001', 'value')` or `record.add_field(Field('001', data='value'))`	Similar
Add data field	`record.add_field(field)`	`record.add_field(field)`	Same

Field Creation¶

Operation	pymarc	mrrc	Same?
Create field	`Field('245', ['1','0'], [('a', 'Title')])`	`Field('245', indicators=['1','0'], subfields=[Subfield('a', 'Title')])`	Similar
Create control field	`Field('001', data='12345')`	`Field('001', data='12345')`	Same
Add subfield	`field.add_subfield('a', 'value')`	`field.add_subfield('a', 'value')`	Same
Add subfield at position	`field.add_subfield('a', 'value', pos=2)`	`field.add_subfield('a', 'value', pos=2)`	Same
Get subfields	`field.get_subfields('a')`	`field.get_subfields('a')`	Same
Access subfield	`field['a']`	`field['a']`	Same
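
Code that builds many fields from (code, value) pairs can route construction through one small helper while porting. The following is a minimal sketch, not part of mrrc: build_field is a hypothetical name, and the field and subfield classes are passed in as arguments, so the helper assumes only the keyword form shown in the table above.

```python
def build_field(field_cls, subfield_cls, tag, indicators, pairs):
    """Build a data field from (code, value) pairs using keyword arguments.

    field_cls and subfield_cls are injected by the caller (for example
    mrrc.Field and mrrc.Subfield); this helper is illustrative only.
    """
    return field_cls(
        tag,
        indicators=list(indicators),  # '10' becomes ['1', '0']
        subfields=[subfield_cls(code, value) for code, value in pairs],
    )


# With mrrc installed, a call might look like:
#     import mrrc
#     title = build_field(mrrc.Field, mrrc.Subfield, '245', '10', [('a', 'Title')])
```

Keeping the classes as parameters means the same helper can be exercised with stand-in classes in unit tests before the switch to mrrc is complete.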

Reading/Writing¶

Operation	pymarc	mrrc	Same?
Create reader	`MARCReader(file_obj)`	`MARCReader('path.mrc')` (recommended) or `MARCReader(file_obj)`	Enhanced
Permissive mode	`MARCReader(f, permissive=True)`	`MARCReader(f, permissive=True)`	Same
Unicode flag	`MARCReader(f, to_unicode=True)`	`MARCReader(f, to_unicode=True)`	Same
Read record	`reader.next()` or `next(reader)`	`next(reader)`	Same
Write record	`writer.write(record)`	`writer.write(record)`	Same
Iterate	`for record in reader:`	`for record in reader:`	Same
Context manager	Manual close required	`with MARCWriter(f) as w:`	Enhanced

Accessing Data¶

Operation	pymarc	mrrc	Same?
Get title	`record.title`	`record.title`	Same
Get field	`record['650']`	`record['650']` (first) or `record.get_fields('650')` (all)	Same
Check if field exists	`'245' in record`	`'245' in record`	Same
Get all fields	`for field in record:`	`for field in record:`	Same
Control field data	`record['001'].data`	`record['001'].data` or `record.control_field('001')`	Same
Missing field	`record['999']` raises KeyError	`record['999']` raises KeyError	Same
Safe field access	`record.get('999')` returns None	`record.get('999')` returns None	Same

API Compatibility¶

mrrc supports all major pymarc operations with the same call shapes:

Record Field Access - Dictionary-Style (Identical to pymarc)¶

# Dictionary-style access works exactly like pymarc
field = record['245']                      # Get first 245 field (raises KeyError if missing)
all_fields = record.get_fields('245')      # Get all 245 fields

# Safe access with .get() (returns None if missing)
field = record.get('245')                  # Get first field, None if missing
field = record.get('999', default_field)   # With default value

# Check if field exists (identical to pymarc)
if '245' in record:
    title_field = record['245']

Field Subfield Access - Dictionary-Style (Identical to pymarc)¶

# Dictionary-style access works exactly like pymarc
title = field['a']                          # Get first 'a' subfield
if 'a' in field:
    value = field['a']

# Missing subfields return None (matching pymarc behavior)
value = field['z']                          # Returns None if subfield doesn't exist
                                            # (does NOT raise KeyError)

# Get all values for a code
all_subfields = field.get_subfields('a')   # Get list of 'a' subfield values

# Iterate over all subfields
for subfield in field.subfields():
    print(f"{subfield.code}: {subfield.value}")

# Get subfields as dictionary
subfield_dict = field.subfields_as_dict()
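
The two lookups above (record.get() returning None for a missing field, and field[code] returning None for a missing subfield) combine naturally into one null-safe accessor. This is a sketch under those documented behaviors; first_subfield is a hypothetical helper name, not an mrrc function.

```python
def first_subfield(record, tag, code, default=None):
    """Return the first `code` subfield of the first `tag` field, or `default`.

    Relies on record.get(tag) returning None for a missing field and
    field[code] returning None for a missing subfield, as described above.
    """
    field = record.get(tag)
    if field is None:
        return default
    value = field[code]
    return default if value is None else value


# Example call against a parsed record:
#     title = first_subfield(record, '245', 'a', default='[untitled]')
```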

Field Operations (Identical to pymarc)¶

field.add_subfield('a', 'value')       # Identical to pymarc
field.add_subfield('a', 'val', pos=2)  # Positional insert
field.get_subfields('a')               # Get list of values - identical to pymarc
field.delete_subfield('a')             # Delete subfield by code
field.subfields_as_dict()              # Get all subfields as dict
field.subfields()                      # Get all Subfield objects
field.is_control_field()               # False for data fields (identical to pymarc)
field.value()                          # Space-joined subfield values
field.format_field()                   # Human-readable field text

Record Operations (Identical to pymarc + Extensions)¶

# Standard pymarc operations
record.add_field(field1, field2)       # Add one or more fields (e.g. add_field(f1, f2, f3))
record.remove_field(field1, field2)    # Remove specific field objects
record.remove_fields('245', '650')     # Remove all fields with matching tags
record.add_ordered_field(field)        # Insert in tag-sorted position
record.add_grouped_field(field)        # Insert after same-tag group
record.get_fields('650', '651')        # Get fields for multiple tags

# Record accessors (all are @property, matching pymarc)
record.title                           # Get title (245 $a)
record.author                          # Get author (100/110/111 $a)
record.isbn                            # Get ISBN (020 $a)
record.issn                            # Get ISSN (022 $a)
record.subjects                        # Get all subjects (6XX $a)
record.publisher                       # Get publisher (260 $b)
record.physical_description            # Get extent (300 $a)
record.series                          # Get series (490 $a)
record.pubyear                         # Get publication year (str, not int)
record.notes                           # Get all notes (5XX)
record.location                        # Get location (852 $a)
record.uniform_title                   # Get uniform title (130 $a)
record.sudoc                           # Get SuDoc classification (086 $a)
record.issn_title                      # Get ISSN title (222 $a)
record.issnl                           # Get ISSN-L (024 $a)
record.addedentries                    # Get added entries (7XX fields)

# Serialization (pymarc-compatible)
record.as_marc()                       # ISO 2709 bytes
record.as_json()                       # pymarc MARC-in-JSON string
record.as_dict()                       # pymarc-compatible dict

For field selection beyond get_fields(*tags) — matching on indicators, tag ranges, subfield presence, or a regex over subfield values — see the Query DSL guide. It's an mrrc extension with no pymarc equivalent.

Control Fields (Unified with Field)¶

# Control fields are now Field instances (matching pymarc)
cf = Field('001', data='12345')
print(cf.data)                  # '12345'
print(cf.is_control_field())    # True
print(isinstance(cf, Field))    # True

# ControlField still works as backward-compatible alias
from mrrc import ControlField
cf = ControlField('001', '12345')
print(cf.data)                  # '12345'

Leader Access - Attribute-Based and Position-Based¶

# Attribute-based access. Note: mrrc exposes the leader as a method call,
# record.leader(), where pymarc uses a record.leader attribute.
leader = record.leader()
leader.record_status = 'c'          # Set record status
leader.record_type = 'a'            # Set record type
leader.bibliographic_level = 'd'    # Set bibliographic level

# Position-based access (also available for pymarc compatibility)
leader[5] = 'c'                     # Set record status at position 5
leader[6] = 'a'                     # Set record type at position 6

# Slice access to get multiple positions
record_length = int(leader[0:5])    # Get first 5 chars (record length)
cataloging_form = leader[18]        # Get cataloging form char at position 18

# Position and property access are automatically synchronized
leader.record_status = 'd'
assert leader[5] == 'd'             # Position-based access reflects property change
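
To make the position-based view concrete, here is a small sketch that decodes the same positions from a plain 24-character leader string, without mrrc. The function name describe_leader is hypothetical, and position 7 for the bibliographic level comes from the MARC 21 leader layout rather than from this guide.

```python
def describe_leader(leader: str) -> dict:
    """Decode a few fixed positions of a 24-character MARC leader string."""
    if len(leader) != 24:
        raise ValueError(f"leader must be 24 characters, got {len(leader)}")
    return {
        "record_length": int(leader[0:5]),   # positions 0-4
        "record_status": leader[5],          # position 5
        "record_type": leader[6],            # position 6
        "bibliographic_level": leader[7],    # position 7 (MARC 21 layout)
        "cataloging_form": leader[18],       # position 18
    }


info = describe_leader("00714cam a2200205 a 4500")
print(info["record_length"], info["record_status"], info["record_type"])
```

The property names in the returned dict mirror the attribute names used above, so the mapping between the two access styles is easy to check by eye.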

Reader/Writer Interface¶

# Reading — pass a path string or pathlib.Path for best performance.
# This uses Rust-native file I/O, which releases the Python GIL during
# parsing and enables true multi-thread parallelism.
reader = mrrc.MARCReader('records.mrc')
for record in reader:              # Standard iteration
    print(record.title)

# Python file objects and in-memory bytes also work, but hold the GIL
# during reads, so they won't benefit from multi-threading.
with open('records.mrc', 'rb') as f:
    reader = mrrc.MARCReader(f)         # Works, but slower under threading
    records = list(reader)              # Consume while the file is still open
reader = mrrc.MARCReader(marc_bytes)    # Also works for in-memory data (marc_bytes: bytes)

# Writing (identical to pymarc, with context manager support)
with open('output.mrc', 'wb') as out:
    with mrrc.MARCWriter(out) as writer:
        for record in records:
            writer.write(record)        # Same method name as pymarc
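
Since path input releases the GIL during parsing, one way to use that is one reader per file on a thread pool. The sketch below shows only the threading pattern; count_records and count_all are hypothetical names, and the reader constructor is passed in as open_reader so the code does not assume anything about mrrc beyond "calling it with a path returns an iterable of records".

```python
from concurrent.futures import ThreadPoolExecutor


def count_records(path, open_reader):
    """Count the items yielded by one reader opened on `path`."""
    return sum(1 for _ in open_reader(path))


def count_all(paths, open_reader, max_workers=4):
    """Count records in several files concurrently, one reader per thread."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = list(pool.map(lambda p: count_records(p, open_reader), paths))
    return dict(zip(paths, counts))


# With mrrc installed, the reader factory would be mrrc.MARCReader:
#     import mrrc
#     totals = count_all(['a.mrc', 'b.mrc'], mrrc.MARCReader)
```

Each thread owns its own reader, so no reader object is shared across threads. Any speedup depends on the workload and should be measured rather than assumed.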

Minimal API Differences¶

mrrc is nearly 100% compatible with pymarc. Two differences come up in most ports, and the second is optional:

1. Record Constructor¶

Record() now works with no arguments (leader defaults to Leader()):

# pymarc
record = pymarc.Record()

# mrrc - both work
record = mrrc.Record()                  # Default leader
record = mrrc.Record(mrrc.Leader())     # Explicit leader

# Note: Once created, all field access works identically
print(record['245']['a'])  # Works exactly like pymarc

2. Optional: Extended Convenience Properties¶

mrrc extends pymarc with additional convenience properties:

# All pymarc properties work:
record.title               # Get title
record.author              # Get author
record.isbn                # Get ISBN

# Plus many additional properties:
record.issn                # Get ISSN
record.issn_title          # Get ISSN title
record.sudoc               # Get SuDoc classification
record.issnl               # Get ISSN-L
record.pubyear             # Get publication year (str)
record.physical_description  # Get extent/pages
record.is_book()           # Check if book
record.is_serial()         # Check if serial
record.is_music()          # Check if music

New Features Beyond pymarc¶

Serialization Methods¶

record.as_marc()           # ISO 2709 bytes
record.as_json()           # pymarc-compatible MARC-in-JSON
record.as_dict()           # pymarc-compatible dict
field.as_marc()            # Field-level binary
field.value()              # Space-joined subfield values
field.format_field()       # Human-readable text
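
As one possible use of record.as_json(), the sketch below writes records as JSON Lines (one JSON document per line). write_jsonl is a hypothetical helper; it assumes only that as_json() returns a JSON string, and it re-serializes through the json module so each record lands on a single line regardless of how the string was formatted.

```python
import json


def write_jsonl(records, out):
    """Write each record's MARC-in-JSON form to `out`, one record per line.

    Returns the number of records written.
    """
    written = 0
    for record in records:
        # Round-trip through json to guarantee a compact single-line document.
        out.write(json.dumps(json.loads(record.as_json())) + "\n")
        written += 1
    return written


# Example call:
#     with open('records.jsonl', 'w', encoding='utf-8') as out:
#         write_jsonl(mrrc.MARCReader('records.mrc'), out)
```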

Module-Level Functions¶

import mrrc

records = mrrc.parse_xml_to_array(xml_str)
records = mrrc.parse_json_to_array(json_str)
mrrc.map_records(func, reader)

Constants¶

from mrrc import LEADER_LEN, END_OF_FIELD, END_OF_RECORD, SUBFIELD_INDICATOR
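
These names correspond to the framing bytes of the ISO 2709 format. As an illustration of what they mean, the sketch below splits a raw byte string into record chunks using the leader-declared length. The constants are defined locally with the ISO 2709 values rather than imported, because this guide does not state the type of mrrc's constants; split_records is a hypothetical name.

```python
LEADER_LEN = 24            # fixed leader size in bytes
END_OF_RECORD = b"\x1d"    # record terminator byte


def split_records(data: bytes) -> list:
    """Split concatenated ISO 2709 records into one bytes chunk per record."""
    chunks = []
    pos = 0
    while pos < len(data):
        leader = data[pos:pos + LEADER_LEN]
        if len(leader) < LEADER_LEN:
            raise ValueError(f"truncated leader at offset {pos}")
        length = int(leader[0:5])  # leader positions 0-4: total record length
        chunk = data[pos:pos + length]
        if (length < LEADER_LEN or len(chunk) < length
                or not chunk.endswith(END_OF_RECORD)):
            raise ValueError(f"malformed record at offset {pos}")
        chunks.append(chunk)
        pos += length
    return chunks
```

This only frames records; it does not parse the directory or fields, which is the job of the reader.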

Exception Hierarchy¶

from mrrc import MrrcException, MarcError

Migration Checklist¶

Minimal changes needed:

[ ] Replace import pymarc with import mrrc
[ ] Update record creation: pymarc.Record() to mrrc.Record() (or mrrc.Record(mrrc.Leader()))
[ ] Update field creation to use indicators= and subfields= kwargs if desired
[ ] Everything else works the same - dictionary access, property names, iteration all identical

Optional enhancements:

[ ] Pass file paths to MARCReader('file.mrc') instead of file objects (releases the GIL, enables multi-thread parallelism)
[ ] Use additional convenience properties like record.issn, record.sudoc, etc. for specialized use cases
[ ] Update writers to use context managers: with mrrc.MARCWriter(f) as w: (better resource management)
[ ] Use record.as_marc(), record.as_json(), record.as_dict() for serialization

Error Handling¶

Permissive Mode (pymarc-compatible)¶

pymarc's permissive=True flag yields None for records that fail to parse, letting callers skip bad records and keep processing. mrrc supports the same flag with identical behavior:

# Works the same in both pymarc and mrrc
for record in mrrc.MARCReader('records.mrc', permissive=True):
    if record is None:
        continue  # skip malformed record
    print(record.title)

After each iteration step, two pymarc-compatible accessors carry diagnostic information about what was just read:

reader.current_chunk — the bytes of the record that was just read from the source. Set on every successful chunk read, whether the parse then succeeded or failed. The byte count matches the record's leader-declared length.
reader.current_exception — the typed MrrcException swallowed by the permissive read (None on a clean read).

import logging

log = logging.getLogger(__name__)

reader = mrrc.MARCReader('records.mrc', permissive=True)
for record in reader:
    if record is None:
        log.warning(
            "skipped malformed record (%d bytes): %s",
            len(reader.current_chunk) if reader.current_chunk else 0,
            reader.current_exception,
        )
        continue
    print(record.title)

For pymarc-equivalent error handling, use permissive=True. Two documented differences from pymarc's defaults:

Encoding strictness: mrrc raises EncodingError (and swallows it via current_exception under permissive=True) on invalid UTF-8 in subfield values; pymarc applies lossy substitution silently. The shape of the iteration is unchanged (the bad record yields as None either way), so callers using except Exception: keep working.
current_chunk on byte-read errors: When the underlying read of the next record's bytes fails before parsing begins (truncated stream, I/O error), current_chunk may be None even though current_exception is set. For parse failures of fully-read chunks (the common case), current_chunk carries the full record bytes as pymarc does.
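
The two accessors can be folded into a helper that separates parsed records from skipped ones, including the case where current_chunk is None. This is a sketch against the behavior described above; read_with_diagnostics is a hypothetical name and works with any reader object exposing current_chunk and current_exception.

```python
def read_with_diagnostics(reader):
    """Drain a permissive reader into (good_records, skipped).

    `skipped` holds one (chunk_length, exception) tuple per record that
    was yielded as None; chunk_length is 0 when no chunk is available.
    """
    good, skipped = [], []
    for record in reader:
        if record is None:
            chunk = reader.current_chunk
            skipped.append((len(chunk) if chunk else 0,
                            reader.current_exception))
            continue
        good.append(record)
    return good, skipped


# Example call:
#     good, skipped = read_with_diagnostics(
#         mrrc.MARCReader('records.mrc', permissive=True))
```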

to_unicode Flag¶

pymarc's to_unicode=True (the default) converts MARC-8 encoded records to UTF-8. mrrc always converts MARC-8 to UTF-8 automatically — the conversion happens in the Rust parsing layer and cannot be disabled. The to_unicode kwarg is accepted for compatibility so existing scripts work unchanged. Passing to_unicode=False emits a warning but has no effect.

Recovery Mode (mrrc-specific)¶

mrrc also offers a recovery_mode kwarg that goes beyond pymarc's permissive mode. Instead of skipping bad records entirely, recovery mode attempts to salvage valid fields from damaged records:

# Attempt to recover partial data from malformed records
reader = mrrc.MARCReader('records.mrc', recovery_mode='lenient')
for record in reader:
    if record is None:
        continue  # unsalvageable record
    print(f"Got {len(record.get_fields())} fields")

# Even more lenient — accept partial data
reader = mrrc.MARCReader('records.mrc', recovery_mode='permissive')

Recovery modes:

"permissive" (default for the Python user surface): yield records with diagnostics attached on record.errors; yield None for unsalvageable records
"lenient": same shape as permissive with a tighter recovery cap; salvages valid fields
"strict": raise on any malformation

Note: permissive=True and recovery_mode other than "strict" cannot be combined — they represent different error-handling strategies. Use permissive=True for pymarc-compatible "skip bad records" behavior, or recovery_mode for mrrc's "salvage what you can" approach. Setting permissive=True without an explicit recovery_mode implicitly pairs it with recovery_mode="strict" so the pymarc-shape combo (inner raises, outer wrapper swallows) works without surprise.

mrrc's Python default matches the pymarc / marc4j / libmarc convention: a fresh mrrc.MARCReader(file) iterates past per-record defects rather than aborting on the first one. The trade-off is real: permissive mode can hand you None for unsalvageable records, and per-record defects live on record.errors rather than raising. If you control the input and quality matters more than throughput, pass recovery_mode="strict" explicitly to make defects loud. See A gentle case for choosing strict when feasible.
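
Because defects under the default mode are attached to records instead of raised, it can help to tally them after a run. The sketch below is one way to do that; summarize_recovery is a hypothetical name, and it assumes record.errors is empty or otherwise falsy for a clean record, which this guide does not state explicitly.

```python
from collections import Counter


def summarize_recovery(reader):
    """Classify each yielded item as clean, recovered, or unsalvageable."""
    tally = Counter()
    for record in reader:
        if record is None:
            tally["unsalvageable"] += 1
        elif record.errors:  # assumption: falsy when nothing was recovered
            tally["recovered"] += 1
        else:
            tally["clean"] += 1
    return dict(tally)


# Example call:
#     print(summarize_recovery(mrrc.MARCReader('records.mrc')))
```

A non-zero "recovered" or "unsalvageable" count is a reasonable trigger for re-running the same input with recovery_mode="strict" to see the first defect in detail.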

Exception class names¶

mrrc keeps the same class names pymarc uses, so most except clauses work after a port with only the import line changing:

# pymarc
from pymarc import MARCReader, RecordDirectoryInvalid
# mrrc — same names, different package
from mrrc import MARCReader, RecordDirectoryInvalid

The full pymarc↔mrrc class-name mapping, the names mrrc deliberately omits (and why), and the per-variant attribute reference live in the Error handling reference. Two porting-specific notes worth inlining here:

Base class rename. from pymarc import PymarcException fails at import; replace with from mrrc import MrrcException, or alias on import:

from mrrc import MrrcException as PymarcException

FatalReaderError catches different things. mrrc keeps the fatal record-level classes (RecordLengthInvalid, TruncatedRecord, EndOfRecordNotFound) as siblings under MrrcException, not as children of FatalReaderError (as in pymarc). A port writing except FatalReaderError: to catch a malformed-record error won't catch what it expects. Two pymarc-compatible recipes:

# Enumerate the four classes by name (matches what pymarc's
# `except FatalReaderError:` would have caught)
try:
    record = next(reader)
except (RecordLengthInvalid, TruncatedRecord, EndOfRecordNotFound,
        FatalReaderError):
    ...

# Or catch the mrrc base (broader — catches every typed mrrc error)
try:
    record = next(reader)
except MrrcException:
    ...

See Known hierarchy divergences from pymarc in the reference for the rationale.

Capping recovered errors with max_errors. mrrc's MARCReader accepts a max_errors=N kwarg that caps the total number of record.errors entries accumulated across a lenient / permissive stream. Once the (N+1)-th recovered error lands, the next read raises mrrc.FatalReaderError (E099). pymarc has no equivalent. Pass max_errors=0 (or omit the kwarg) to disable the cap. See Capping recovered errors with max_errors.
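
A caller that sets max_errors usually wants to keep what was read before the cap tripped. The sketch below shows that shape; read_until_cap is a hypothetical name, and the exception class is passed in (with mrrc it would be mrrc.FatalReaderError, per the paragraph above) so the helper itself has no mrrc dependency.

```python
def read_until_cap(reader, cap_error):
    """Collect records until the reader is exhausted or raises `cap_error`.

    Returns (records, exc) where exc is None if the stream ended normally.
    """
    records = []
    try:
        for record in reader:
            records.append(record)
    except cap_error as exc:
        return records, exc
    return records, None


# Example call:
#     reader = mrrc.MARCReader('records.mrc', recovery_mode='lenient',
#                              max_errors=100)
#     records, exc = read_until_cap(reader, mrrc.FatalReaderError)
```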

Known Differences from pymarc¶

Record constructor: mrrc.Record() works (defaults to Leader()), or pass explicit mrrc.Record(mrrc.Leader())
UTF-8 encoding: Set leader.character_coding = 'a' for UTF-8 (mrrc uses UTF-8 by default internally)
No field removal during iteration: Use list comprehension or separate pass if modifying records during iteration
Type safety: All data is validated at Rust layer (this is a feature, prevents data corruption)
Field handles, not shared objects: fields obtained from a record are live handles. In-place edits persist exactly as in pymarc, but each lookup returns a distinct handle object (the expression record['245'] is record['245'] evaluates to False), so don't compare fields with is or id()
Removal invalidates field handles: after any remove_field/remove_fields call, outstanding handles raise mrrc.StaleFieldError on use instead of silently targeting the wrong field — re-fetch the field and retry (pymarc object references survive removals)
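
Given the last two points, a simple pattern is to decide what to remove first, remove it in a single call, and re-fetch any fields needed afterwards. The sketch below uses only the membership test and remove_fields(*tags) shown earlier in this guide; drop_tags is a hypothetical helper name.

```python
def drop_tags(record, tags):
    """Remove every field whose tag is in `tags`; return the tags removed.

    All removals happen in one remove_fields() call, so no field handle
    is held across a removal.
    """
    present = [tag for tag in tags if tag in record]
    if present:
        record.remove_fields(*present)
    return present


# Example call: strip local fields, then re-fetch anything still needed.
#     removed = drop_tags(record, ['590', '999'])
#     title_field = record.get('245')   # fresh handle after the removal
```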

Getting Help¶

Documentation: See class docstrings in Python (IDE autocomplete available)
Type hints: Full .pyi stub file provides IDE support
Examples: See test files for comprehensive examples
Issues: Report bugs at https://github.com/dchud/mrrc/issues

Contributing¶

We welcome contributions! The project is structured as:

src/: Core Rust MARC library
src-python/: Python wrapper with PyO3
tests/: Integration tests

To build locally:

uv sync
uv run maturin develop