Migration Guide: pymarc to mrrc¶
This guide helps existing pymarc users migrate to mrrc (MARC Rust Crate), a high-performance Rust-based MARC library with Python bindings.
Overview¶
mrrc provides:
- High performance Rust implementation with Python convenience
- Near-complete pymarc API compatibility - Most existing pymarc code runs with minimal changes (see Minimal API Differences)
- Type-safe design with comprehensive error handling
- Native Python integration through PyO3 bindings with familiar data structures
- All standard MARC operations including reading, writing, and format conversions
Installation¶
Quick Start¶
Before (pymarc)¶
import pymarc
# Reading records
with open('records.mrc', 'rb') as f:
    reader = pymarc.MARCReader(f)
    for record in reader:
        print(record['245']['a'])
# Writing records
writer = pymarc.MARCWriter(open('output.mrc', 'wb'))
field = pymarc.Field('245', ['1', '0'], [('a', 'Title')])
record = pymarc.Record(to_utf8=True)
record.append(field)
writer.write(record)
writer.close()
After (mrrc) - pymarc-Compatible¶
mrrc supports nearly identical pymarc syntax:
import mrrc
# Reading records — pass a file path for best performance.
# Path input uses Rust-native file I/O, which releases the GIL
# and enables true multi-thread parallelism.
reader = mrrc.MARCReader('records.mrc')
for record in reader:
    print(record['245']['a'])  # pymarc dictionary syntax works!
    print(record.title())  # Also available as convenience method
# Writing records - inline construction (similar to pymarc)
with open('output.mrc', 'wb') as f:
    with mrrc.MARCWriter(f) as writer:
        record = mrrc.Record(fields=[
            mrrc.Field('245', indicators=['1', '0'], subfields=[
                mrrc.Subfield('a', 'Title'),
            ]),
        ])
        writer.write(record)
API Comparison¶
Record Creation¶
| Operation | pymarc | mrrc | Same? |
|---|---|---|---|
| Create empty record | pymarc.Record() | mrrc.Record() | Same |
| Create with leader | pymarc.Record(leader) | mrrc.Record(leader) | Same |
| Add control field | record.add_field(Field('001', data='value')) | record.add_control_field('001', 'value') | Different |
| Add data field | record.append(field) | record.add_field(field) | Different |
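The control field row is the one most likely to need a code change. A minimal sketch of a helper that wraps it, assuming only that the record object exposes the add_control_field(tag, value) method listed in the table; is_control_tag and add_control_value are hypothetical names introduced here and are not part of mrrc or pymarc:

```python
def is_control_tag(tag):
    """Return True for MARC control field tags (00X: no indicators or subfields)."""
    return len(tag) == 3 and tag.isdigit() and tag < "010"


def add_control_value(record, tag, value):
    """Add a control field through add_control_field().

    Hypothetical helper: `record` is assumed to be an mrrc.Record, or any
    object with an add_control_field(tag, value) method.
    """
    if not is_control_tag(tag):
        raise ValueError(f"{tag!r} is not a control field tag; use add_field() instead")
    record.add_control_field(tag, value)
```

Routing all control field writes through one function like this keeps the pymarc-to-mrrc difference in a single place while the rest of the code base is migrated.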
Field Creation¶
| Operation | pymarc | mrrc | Same? |
|---|---|---|---|
| Create field | Field('245', ['1','0'], [('a', 'Title')]) | Field('245', indicators=['1','0'], subfields=[Subfield('a', 'Title')]) | Similar |
| Add subfield | field.add_subfield('a', 'value') | field.add_subfield('a', 'value') | Same |
| Get subfields | field.get_subfields('a') | field.get_subfields('a') | Same |
| Access subfield | field['a'] | field['a'] | Same |
Reading/Writing¶
| Operation | pymarc | mrrc | Same? |
|---|---|---|---|
| Create reader | MARCReader(file_obj) | MARCReader('path.mrc') (recommended) or MARCReader(file_obj) | Enhanced |
| Read record | reader.next() or next(reader) | next(reader) | Same |
| Write record | writer.write(record) | writer.write(record) | Same |
| Iterate | for record in reader: | for record in reader: | Same |
| Context manager | Manual close required | with MARCWriter(f) as w: | Enhanced |
Accessing Data¶
| Operation | pymarc | mrrc | Same? |
|---|---|---|---|
| Get title | record['245']['a'] | record['245']['a'] or record.title() | Same |
| Get field | record['650'] | record['650'] or record.fields_by_tag('650') | Same |
| Check if field exists | '245' in record | '245' in record | Same |
| Get all fields | for field in record: | for field in record: | Same |
| Control field | record['001'].value | record['001'].value or record.control_field('001') | Same |
API Compatibility¶
mrrc supports pymarc's major operations with the same or very similar syntax:
Record Field Access - Dictionary-Style (Identical to pymarc)¶
# Dictionary-style access works exactly like pymarc
field = record['245'] # Get first 245 field (or None if missing)
all_fields = record.fields_by_tag('245') # Get all 245 fields
# Missing fields return None (matching pymarc behavior)
field = record['999'] # Returns None if field doesn't exist
# (does NOT raise KeyError)
# Check if field exists (identical to pymarc)
if '245' in record:
    title_field = record['245']
# Dict-like .get() method (identical to pymarc)
field = record.get('245') # Get first field, None if missing
field = record.get('999', default_field) # With default value
# Alternative method-based access also available
field = record.get_field('245') # Get first field
Field Subfield Access - Dictionary-Style (Identical to pymarc)¶
# Dictionary-style access works exactly like pymarc
title = field['a'] # Get first 'a' subfield
if 'a' in field:
    value = field['a']
# Missing subfields return None (matching pymarc behavior)
value = field['z'] # Returns None if subfield doesn't exist
# (does NOT raise KeyError)
# Get all values for a code
all_subfields = field.get_subfields('a') # Get list of 'a' subfield values
# Iterate over all subfields
for subfield in field.subfields():
    print(f"{subfield.code}: {subfield.value}")
# Get subfields as dictionary
subfield_dict = field.subfields_as_dict()
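Because a missing field yields None rather than raising, a chained lookup such as record['999']['a'] fails with a TypeError when the field is absent. A minimal sketch of a guard, relying only on the None-returning behavior described above; first_subfield is a hypothetical helper name, not an mrrc function:

```python
def first_subfield(record, tag, code, default=None):
    """Return the first `code` subfield of the first `tag` field, or `default`."""
    field = record[tag]  # None when the field is missing (per this guide)
    if field is None:
        return default
    value = field[code]  # None when the subfield is missing
    return default if value is None else value
```

For example, first_subfield(record, '245', 'a', default='[untitled]') would return the title when present and the placeholder string otherwise.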
Field Operations (Identical to pymarc)¶
field.add_subfield('a', 'value') # Identical to pymarc
field.get_subfields('a') # Get list of values - identical to pymarc
field.delete_subfield('a') # Delete subfield by code
field.subfields_as_dict() # Get all subfields as dict
field.subfields() # Get all Subfield objects
field.is_control_field() # False for data fields (identical to pymarc)
Record Operations (Identical to pymarc + Extensions)¶
# Standard pymarc operations
record.remove_field('245') # Remove field(s) by tag
record.append(field) # Add field (same as add_field for compatibility)
record.get_fields('650', '651') # Get fields for multiple tags
# Convenience methods (identical to pymarc)
record.title() # Get title (245 $a)
record.author() # Get author (100/110/111 $a)
record.isbn() # Get ISBN (020 $a)
record.issn() # Get ISSN (022 $a)
record.subjects() # Get all subjects (6XX $a)
record.publisher() # Get publisher (260 $b)
record.physical_description() # Get extent (300 $a)
record.series() # Get series (490 $a)
Leader Access - Property-Based and Position-Based¶
# Property-based access (recommended for clarity)
leader = record.leader()
leader.record_status = 'c' # Set record status
leader.record_type = 'a' # Set record type
leader.bibliographic_level = 'd' # Set bibliographic level
# Position-based access (also available for pymarc compatibility)
leader[5] = 'c' # Set record status at position 5
leader[6] = 'a' # Set record type at position 6
# Slice access to get multiple positions
record_length = int(leader[0:5]) # Get first 5 chars (record length)
cataloging_form = leader[18] # Get cataloging form char at position 18
# Position and property access are automatically synchronized
leader.record_status = 'd'
assert leader[5] == 'd' # Position-based access reflects property change
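The positions used above follow the MARC 21 leader layout, which is a fixed 24-character string. The sketch below decodes a sample leader with plain string slicing so the offsets are easy to check. It does not use mrrc; the describe_leader name and the sample leader value are illustrative only:

```python
def describe_leader(leader):
    """Decode a few fixed positions of a 24-character MARC 21 leader string."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 characters long")
    return {
        "record_length": int(leader[0:5]),    # positions 00-04
        "record_status": leader[5],           # position 05
        "record_type": leader[6],             # position 06
        "bibliographic_level": leader[7],     # position 07
        "character_coding": leader[9],        # position 09 ('a' = UCS/Unicode)
        "base_address": int(leader[12:17]),   # positions 12-16
        "cataloging_form": leader[18],        # position 18
    }


info = describe_leader("00714cam a2200205 a 4500")
print(info["record_status"], info["record_type"], info["bibliographic_level"])
```

The dictionary keys mirror the property names used in this section, which makes it easier to see which property corresponds to which position.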
Reader/Writer Interface¶
# Reading — pass a path string or pathlib.Path for best performance.
# This uses Rust-native file I/O, which releases the Python GIL during
# parsing and enables true multi-thread parallelism.
reader = mrrc.MARCReader('records.mrc')
for record in reader:  # Standard iteration
    print(record.title())
# Python file objects and in-memory bytes also work, but hold the GIL
# during reads, so they won't benefit from multi-threading.
with open('records.mrc', 'rb') as f:
    reader = mrrc.MARCReader(f)  # Works, but slower under threading
reader = mrrc.MARCReader(marc_bytes)  # Also works for in-memory data
# Writing (identical to pymarc, with context manager support)
with open('output.mrc', 'wb') as f:
    with mrrc.MARCWriter(f) as writer:
        writer.write(record)  # Same method name as pymarc
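Since path input releases the GIL, one way to take advantage of it is to give each file to its own thread. A minimal sketch under that assumption, using only mrrc.MARCReader(path) and iteration as shown above; count_records and count_records_in_files are hypothetical helper names, and no speedup figure is implied:

```python
from concurrent.futures import ThreadPoolExecutor


def count_records(path):
    """Count the records in one file using mrrc's path-based reader."""
    import mrrc  # imported lazily; requires mrrc to be installed

    return sum(1 for _ in mrrc.MARCReader(path))


def count_records_in_files(paths, count_one=count_records, max_workers=4):
    """Run `count_one` over each path in a thread pool; return {path: count}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(paths, pool.map(count_one, paths)))
```

The count_one parameter exists so the per-file work can be swapped out (or tested) independently of the threading logic. Passing file objects instead of paths would still work, but per the note above those reads hold the GIL and the threads would largely run one at a time.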
Minimal API Differences¶
mrrc is nearly 100% compatible with pymarc. The two differences below are the ones to be aware of when migrating; the second is optional:
1. Record Constructor¶
Record() now works with no arguments (leader defaults to Leader()):
# pymarc
record = pymarc.Record()
# mrrc - both work
record = mrrc.Record() # Default leader
record = mrrc.Record(mrrc.Leader()) # Explicit leader
# Note: Once created, all field access works identically
print(record['245']['a']) # Works exactly like pymarc
2. Optional: Extended Convenience Methods¶
mrrc extends pymarc with additional convenience methods:
# All pymarc methods work:
record.title() # Get title
record.author() # Get author
record.isbn() # Get ISBN
# Plus many additional methods:
record.issn() # Get ISSN
record.issn_title() # Get ISSN title
record.sudoc() # Get SuDoc classification
record.issnl() # Get ISSN-L
record.pubyear() # Get publication year
record.physical_description() # Get extent/pages
record.is_book() # Check if book
record.is_serial() # Check if serial
record.is_music() # Check if music
Migration Checklist¶
Minimal changes needed:
- [ ] Replace import pymarc with import mrrc
- [ ] Update record creation: pymarc.Record() → mrrc.Record() (or mrrc.Record(mrrc.Leader()))
- [ ] Update field creation to use indicators= and subfields= kwargs if desired
- [ ] Most other code works the same: dictionary access, method names, and iteration are identical
Optional enhancements:
- [ ] Pass file paths to MARCReader('file.mrc') instead of file objects (releases the GIL, enables multi-thread parallelism)
- [ ] Use additional convenience methods like record.issn(), record.sudoc(), etc. for specialized use cases
- [ ] Update writers to use context managers: with mrrc.MARCWriter(f) as w: (better resource management)
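For a gradual rollout, the import swap in the first checklist item can be made conditional so the same code runs whether or not mrrc is installed yet. A minimal sketch using only the standard library; pick_marc_backend and load_marc_backend are hypothetical names, not part of either library:

```python
import importlib
import importlib.util


def pick_marc_backend(candidates=("mrrc", "pymarc")):
    """Return the name of the first importable module in `candidates`, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def load_marc_backend(candidates=("mrrc", "pymarc")):
    """Import and return the first available MARC module, preferring mrrc."""
    name = pick_marc_backend(candidates)
    if name is None:
        raise ImportError(f"none of {candidates} is installed")
    return importlib.import_module(name)
```

With this in place, marc = load_marc_backend() followed by marc.MARCReader(f) would work against either library. Code shared between the two backends should pass file objects rather than path strings, since path input is described in this guide as an mrrc feature.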
Known Differences from pymarc¶
- Record constructor: mrrc.Record() works (defaults to Leader()), or pass an explicit leader with mrrc.Record(mrrc.Leader())
- UTF-8 encoding: Set leader.character_coding = 'a' for UTF-8 (mrrc uses UTF-8 by default internally)
- No field removal during iteration: Use a list comprehension or a separate pass if modifying records during iteration
- Type safety: All data is validated at the Rust layer (this is a feature; it prevents data corruption)
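The "separate pass" approach to field removal can be sketched as follows: collect the tags to drop while iterating, then remove them afterwards. This assumes fields expose a pymarc-style tag attribute (an assumption, not confirmed in this guide) and uses record.remove_field(tag) as listed under Record Operations; drop_fields_by_tag is a hypothetical helper name:

```python
def drop_fields_by_tag(record, should_drop_tag):
    """Remove all fields whose tag satisfies should_drop_tag(tag).

    Tags are collected in a first, read-only pass and removed in a second,
    so the record is never modified while it is being iterated.
    """
    # Pass 1: read only. Assumes each field has a `.tag` string attribute.
    tags = sorted({field.tag for field in record if should_drop_tag(field.tag)})
    # Pass 2: mutate. remove_field() removes the field(s) with the given tag.
    for tag in tags:
        record.remove_field(tag)
    return tags
```

For example, drop_fields_by_tag(record, lambda tag: tag.startswith('9')) would strip local 9XX fields and return the list of tags that were removed.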
Getting Help¶
- Documentation: See class docstrings in Python (IDE autocomplete available)
- Type hints: Full .pyi stub file provides IDE support
- Examples: See test files for comprehensive examples
- Issues: Report bugs at https://github.com/dchud/mrrc/issues
Contributing¶
We welcome contributions! The project is structured as:
- src/: Core Rust MARC library
- src-python/: Python wrapper with PyO3
- tests/: Integration tests
To build locally: