pymarc API Surface and Compatibility Audit - mrrc-9ic.7¶

Date: 2025-12-28
Source: pymarc v5.3.1 (Python 3.9+)
Task: Document pymarc's public API and create compatibility matrix for mrrc wrapper

Executive Summary¶

pymarc provides a well-established MARC21 handling library with clear API patterns. The mrrc Rust wrapper should target functional compatibility with pymarc's core APIs while providing improved performance and modern Python integration.

Key Compatibility Targets: - ✓ Record - Core record type with field access and helper properties - ✓ Field - Represents MARC data and control fields - ✓ Leader - Mutable leader with property-based access - ✓ MARCReader - Iterator-based reading (binary format) - ✓ MARCWriter - Sequential writing of records - ⚠️ Format Converters - JSON, XML, MARCXML (lower priority)

1. Record Class API¶

Constructor¶

Record(data: str = '', to_unicode: bool = True, force_utf8: bool = False,
       hide_utf8_warnings: bool = False, utf8_handling: str = 'strict',
       leader: str = ' ', file_encoding: str = 'iso8859-1')

Parameters: - data - Optional: binary MARC21 record data for parsing - to_unicode - Convert to Unicode (default: True) - force_utf8 - Force UTF-8 interpretation (default: False) - hide_utf8_warnings - Suppress encoding warnings (default: False) - utf8_handling - Error handling strategy: 'strict', 'replace', 'ignore' - leader - Optional: custom leader string (default: all spaces) - file_encoding - Source encoding (default: 'iso8859-1')

mrrc Equivalent: Record::new(leader: Leader) ✓

Field Manipulation Methods¶

Add Field(s)¶

record.add_field(*fields: Field) -> None
record.add_grouped_field(*fields: Field) -> None  # Maintains loose numeric order

mrrc Equivalent: record.add_field(field: Field) ✓

Remove Field(s)¶

record.remove_field(*fields: Field) -> None       # Remove specific field instances
record.remove_fields(*tags: str) -> None          # Remove all fields with given tags

mrrc Equivalent: Not directly supported, but can be implemented ⚠️

Field Access Methods¶

Direct Field Access (Dict-like)¶

record['245']              # Returns Field or None
record['245']['a']         # Returns subfield value (string) or None
record.get_field(tag)      # Same as record[tag]
record.get_fields(*tags)   # Returns List[Field] matching any tag

Quirk: Accessing non-existent field returns None (no KeyError)
mrrc Equivalent: record.get_field(tag), record.get_fields(tag) ✓

Iteration¶

for field in record:       # Iterate through all fields
for field in record.get_fields('650'):  # Iterate through specific tag

mrrc Equivalent: Iterator trait implementation ✓

Record Properties (Convenience Methods)¶

Property	Source	Returns	mrrc Support
`record.leader`	24-byte header	Leader object	✓ Required
`record.title`	245 $a, $b	str \| None	✓ EXCELLENT
`record.author`	100/110/111	str \| None	✓ Supported
`record.isbn`	020 $a (cleaned)	str \| None	✓ Supported
`record.issn`	022 $a	str \| None	✓ Supported
`record.issn_title`	222 $a, $b	str \| None	⚠️ Optional
`record.issnl`	022 $l	str \| None	⚠️ Optional
`record.publisher`	260/264	str \| None	✓ Supported
`record.pubyear`	260/264	str \| None	✓ Supported
`record.subjects`	650+	List[Field]	✓ Supported
`record.series`	490/8xx	List[Field]	⚠️ Optional
`record.notes`	5xx	List[Field]	✓ Supported
`record.location`	852	List[Field]	⚠️ Optional
`record.physicaldescription`	300	List[Field]	⚠️ Optional
`record.uniformtitle`	130/240	str \| None	⚠️ Optional
`record.sudoc`	086	str \| None	⚠️ Optional

mrrc Status: All MUST-HAVE properties already implemented ✓

Serialization Methods¶

record.as_marc() -> bytes              # Binary MARC21 format
record.as_marc21() -> bytes            # Alias for as_marc()
record.as_json(indent: int = None) -> str  # JSON representation

mrrc Equivalent: record.encode(), record.to_json() ✓

Parsing Methods¶

record.decode_marc(marc, to_unicode=True, force_utf8=False,
                   hide_utf8_warnings=False, utf8_handling='strict',
                   encoding='iso8859-1') -> None

Behavior: Populates record from binary MARC data (destructive)
mrrc Equivalent: Record::from_bytes(data) or parse in constructor ✓

2. Field Class API¶

Constructor¶

Field(tag: str, indicators: Indicators | None = None,
      subfields: List[Subfield] | None = None, data: str | None = None)

Parameters: - tag - 3-digit field tag (e.g., '245') - indicators - Indicators(ind1, ind2) for data fields, None for control fields - subfields - List[Subfield] with code/value pairs - data - String data for control fields (mutually exclusive with subfields)

mrrc Equivalent: Field::new(tag, ind1, ind2) ✓

Subfield Methods¶

field.add_subfield(code: str, value: str, pos: int = None) -> None
field.get_subfields(*codes: str) -> List[str]  # Values only
field.get(code: str, default = None) -> str    # Single subfield (dict-like)
field[code] -> str                              # Same as .get(), but KeyError if missing

mrrc Equivalent: field.add_subfield(code, value), field.get_subfield(code) ✓

Field Properties¶

field.tag -> str                          # Field tag
field.indicators -> Indicators            # Indicators (Indicators(ind1, ind2))
field.subfields -> List[Subfield]        # All subfields
field.control_field -> bool              # True if control field

mrrc Equivalent: All accessible via public fields/methods ✓

Field Methods¶

field.format_field() -> str              # Pretty-printed field (spaces, formatting)
field.value() -> str                     # Raw subfield data (no markers)
field.as_json() -> dict                  # JSON representation
field.as_marc(encoding: str) -> bytes    # Binary MARC field format

mrrc Equivalent: field.to_string(), field.to_json() ✓

Field Access Patterns¶

Control Fields (tags 001-009):

record['001'].value()  # Returns string data
record['001'].data     # Direct data access

Data Fields (tags 010+):

record['245']['a']                    # Single subfield
record['245'].get_subfields('a', 'b') # Multiple subfields as list

Quirk: Subfield access returns None if not present (no KeyError)

3. Leader Class API¶

Constructor¶

Leader(leader: str)  # 24-character leader string

Position-Based Access¶

leader[0:5]        # Record length (computed in as_marc())
leader[5]          # Record status
leader[7]          # Bibliographic level
leader[18]         # Cataloging form

Property-Based Access¶

Property	Position	Description
`record_length`	0-4	Computed during serialization
`record_status`	5	'a', 'c', 'd', 'n', 'p'
`bibliographic_level`	7	'a', 'b', 'c', 'd', 'i', 'm', 's'
`coding_scheme`	9	' ' (MARC-8) or 'a' (UTF-8)
`indicator_count`	10	Usually '2'
`subfield_code_length`	11	Usually '2'
`base_address`	12-16	Computed during serialization
`encoding_level`	17	'1', '2', '3', 'u', 'z', ' '
`cataloging_form`	18	'a' (AACR2), 'c' (ISBD), etc.
`multipart_resource`	19	' ' or 'a', 'b', 'c'
`length_of_field_length`	20	Usually '4'
`length_of_starting_char_pos`	21	Usually '5'
`implementation_defined_length`	22	Usually '0'

mrrc Equivalent: Leader structure with position/property access ✓

Mutability¶

leader.record_status = 'a'    # Set property
leader[5] = 'a'               # Set by position

mrrc Equivalent: Leader is mutable in mrrc ✓

4. MARCReader Class API¶

Constructor¶

MARCReader(marc_target: BinaryIO | bytes,
           to_unicode: bool = True,
           force_utf8: bool = False,
           hide_utf8_warnings: bool = False,
           utf8_handling: str = 'strict',
           file_encoding: str = 'iso8859-1',
           permissive: bool = False)

Parameters: - marc_target - File handle or bytes to read from - to_unicode - Convert to Unicode (default: True) - force_utf8 - Force UTF-8 interpretation - utf8_handling - 'strict' (raise), 'replace', 'ignore' - file_encoding - Source encoding for non-UTF8 records - permissive - Continue on errors (return None) instead of raising

Iterator Interface¶

reader = MARCReader(file_handle)
for record in reader:        # Iterate through records
    print(record.title())

# Or manual iteration
record = next(reader)        # Get next record or raise StopIteration

mrrc Equivalent: Iterator trait implementation ✓

Methods and Properties¶

reader.close() -> None                  # Close file handle
reader.file_handle -> IO               # Underlying file object

# In permissive mode:
reader.current_exception -> Exception  # Last exception encountered
reader.current_chunk -> bytes           # Data chunk that caused error

mrrc Equivalent: Reader::new(), iterator, error handling ✓

Quirks¶

Permissive mode: Returns None instead of raising on malformed records
Automatic encoding detection: Analyzes leader byte 9 for charset
File handle remains open: Caller must close file

5. MARCWriter Class API¶

Constructor¶

MARCWriter(file_handle: IO)

Parameters: - file_handle - Open file handle in binary write mode ('wb')

Methods¶

writer.write(record: Record) -> None     # Write single record
writer.close(close_fh: bool = True) -> None  # Finalize writer

Parameters for close(): - close_fh - Close underlying file handle (default: True)

Usage Pattern:

writer = MARCWriter(open('output.mrc', 'wb'))
for record in records:
    writer.write(record)
writer.close()  # Important!

mrrc Equivalent: Writer::new(), write_record(), close() ✓

Quirks¶

File is not auto-closed; must call writer.close()
If close_fh=False, caller must close underlying file
No validation of records before writing

6. Format Converters (Optional)¶

JSONReader¶

JSONReader(marc_target: bytes | str, encoding: str = 'utf-8', stream: bool = False)

Supports: pymarc JSON format (see below)

JSONWriter¶

# Via Record.as_json()
json_string = record.as_json(indent=2)

XMLReader/XMLWriter¶

# Parse MARCXML
from pymarc import parse_xml_to_array
records = parse_xml_to_array('file.xml')

# Or streaming
from pymarc import map_xml
map_xml(process_function, 'file.xml')

# Write MARCXML
from pymarc import XMLWriter
writer = XMLWriter(open('output.xml', 'wb'))
writer.write(record)
writer.close()  # Important!

MARCMakerReader¶

# Read MARCMaker text format
reader = MARCMakerReader(open('file.mrk', 'r'))
for record in reader:
    print(record)

TextWriter¶

# Write readable MARCMaker text format
from pymarc import TextWriter
writer = TextWriter(open('output.txt', 'w'))
writer.write(record)

mrrc Status: All format converters implemented ✓

Compatibility Matrix¶

MUST-HAVE (Core Functionality)¶

Feature	pymarc	mrrc	Status
Record creation	`Record()`	`Record::new()`	✓ DONE
Field access	`record[tag]`	`record.get_field()`	✓ DONE
Subfield access	`field[code]`	`field.get_subfield()`	✓ DONE
Add fields	`record.add_field()`	`record.add_field()`	✓ DONE
Leader access	`record.leader`	`record.leader`	✓ DONE
Read MARC21	`MARCReader()`	`Reader::new()`	✓ DONE
Write MARC21	`MARCWriter()`	`Writer::new()`	✓ DONE
Title property	`record.title`	`record.title()`	✓ DONE
ISBN property	`record.isbn`	`record.isbns()`	✓ DONE
Subjects property	`record.subjects`	Query helpers	✓ DONE
JSON serialization	`record.as_json()`	`to_json()`	✓ DONE
XML serialization	`XMLWriter`	`to_xml()`	✓ DONE

NICE-TO-HAVE (Enhanced Features)¶

Feature	pymarc	mrrc	Status
Permissive mode	`MARCReader(permissive=True)`	Recovery mode	✓ EXCELLENT
Field removal	`record.remove_fields()`	Manual removal	⚠️ PARTIAL
Property-based leader	`leader.record_status`	Position-based	✓ GOOD
Format conversion	JSON, XML, MARCJSON	JSON, XML, MARCJSON, CSV, Dublin Core, MODS	✓ SUPERIOR
Query DSL	Field object access	Multiple query types	✓ SUPERIOR
Error hierarchy	Generic Exception	MarcError variants	✓ SUPERIOR
Type hints	Minimal	Full Python stubs (.pyi)	✓ SUPERIOR
Performance	Baseline	2-5x faster (Rust)	✓ SUPERIOR

DEPRECATED/DIFFERENT¶

Feature	pymarc	mrrc	Notes
Legacy subfields	List[str]	Subfield struct	pymarc v5+ uses Subfield
to_unicode parameter	Boolean flag	Automatic UTF-8	mrrc auto-detects
Authority records	Not explicit	AuthorityRecord type	mrrc has specialized types
Holdings records	Not explicit	HoldingsRecord type	mrrc has specialized types

Python API Patterns to Match¶

Pattern 1: Dict-Like Field Access¶

# pymarc
record['245']              # Return Field or None (no KeyError)
record['245']['a']         # Return subfield value or None

# mrrc Python wrapper should support:
record['245']              # Via __getitem__
record['245']['a']         # Via Field.__getitem__

Pattern 2: Iterator Protocol¶

# pymarc
for record in reader:
    print(record.title())

# mrrc Python wrapper should support:
for record in reader:      # Via __iter__
    print(record.title())

Pattern 3: Property Access¶

# pymarc
record.title               # Property
record.author              # Property
record.isbn                # Property

# mrrc Python wrapper should support:
record.title()             # Method or property
record.author()            # Method or property
record.isbn()              # Method or property

Pattern 4: Context Manager (Best Practice)¶

# Recommended for readers/writers
with MARCReader(open('file.mrc', 'rb')) as reader:
    for record in reader:
        print(record.title())

# pymarc doesn't enforce this, but it's a best practice

Recommendation: Implement __enter__ and __exit__ for mrrc wrappers

Critical Implementation Notes¶

1. None vs KeyError¶

pymarc behavior: Accessing non-existent field returns None, not KeyError

record['245']          # Returns Field or None
record['999']          # Returns None (no error)
record['245']['z']     # Returns None (subfield doesn't exist)

mrrc should match this behavior for compatibility

2. Encoding Handling¶

pymarc supports: - MARC-8 encoding (leader byte 9 = ' ') - UTF-8 encoding (leader byte 9 = 'a') - Configurable fallback encoding

mrrc supports: - MARC-8 with extensive escape sequence handling - UTF-8 - Auto-detection based on leader

Status: ✓ mrrc is more robust

3. Permissive Mode¶

pymarc: MARCReader(permissive=True) returns None on malformed records

mrrc: Reader with recovery mode handles truncated/corrupted data

Status: ✓ mrrc is more capable

4. Leader Mutability¶

pymarc: Leader properties are mutable

mrrc: Leader is mutable

Status: ✓ Compatible

5. Field Iteration¶

pymarc: for field in record: iterates all fields

mrrc: Supports iteration via fields(), get_fields(tag)

Status: ✓ Compatible

Recommended Python Wrapper Implementation¶

Priority 1: Must Match¶

Record creation and field access (dict-like)
MARCReader iterator interface
MARCWriter sequential writing
Record title/author/isbn properties
Subfield access patterns
Leader access (both property and position-based)
Exception handling (custom Python exceptions)

Priority 2: Nice-to-Have¶

Context manager support (with statements)
Remove field functionality
Property-based leader access (in addition to position-based)
Additional helper properties (series, notes, location)
Format conversion methods (JSON, XML)

Priority 3: Future/Optional¶

Batch operations (map_records)
Alternative readers (MARCMaker, JSON)
Alternative writers (XML, Text, JSON)
Query DSL exposure (optional, not in pymarc)

Testing Strategy¶

Unit Tests Needed¶

# test_record.py
- Create empty record
- Add/access/remove fields
- Access subfields
- Get properties (title, author, isbn)
- Serialize to binary

# test_reader.py
- Read valid MARC file
- Iterate through records
- Handle EOF
- Permissive mode (malformed data)

# test_writer.py
- Write records to binary
- Round-trip test (write → read → compare)
- Close behavior

# test_field.py
- Create data fields (with indicators)
- Create control fields (no indicators)
- Add/access subfields
- Get subfield lists

# test_leader.py
- Parse from binary
- Property access
- Position-based access
- Mutation

Integration Tests Needed¶

# test_compatibility.py
- Compare mrrc results with pymarc on same input files
- Verify title extraction matches
- Verify field counts match
- Verify binary round-trip matches

Summary for Implementation¶

Compatibility Target: Provide drop-in replacement for pymarc in common use cases

Core APIs to wrap (Priority 1): - ✓ Record - with dict-like field access - ✓ Field - with subfield access - ✓ Leader - with mutable properties - ✓ MARCReader - with iterator interface - ✓ MARCWriter - with sequential writing

Quality metrics: - Dict-like None behavior (not KeyError) ← Critical - Iterator protocol support ← Critical - Property-based convenience methods ← Important - Exception hierarchy ← Important - Context manager support ← Nice-to-have

Timeline: Focus on Priority 1 for Phase 1-2, Priority 2 for Phase 4

pymarc API Surface and Compatibility Audit - mrrc-9ic.7¶

Executive Summary¶

1. Record Class API¶

Constructor¶

Field Manipulation Methods¶

Add Field(s)¶

Remove Field(s)¶

Field Access Methods¶

Direct Field Access (Dict-like)¶

Iteration¶

Record Properties (Convenience Methods)¶

Serialization Methods¶

Parsing Methods¶

2. Field Class API¶

Constructor¶

Subfield Methods¶

Field Properties¶

Field Methods¶

Field Access Patterns¶

3. Leader Class API¶

Constructor¶

Position-Based Access¶

Property-Based Access¶

Mutability¶

4. MARCReader Class API¶

Constructor¶

Iterator Interface¶

Methods and Properties¶

Quirks¶

5. MARCWriter Class API¶

Constructor¶

Methods¶

Quirks¶

6. Format Converters (Optional)¶

JSONReader¶

JSONWriter¶

XMLReader/XMLWriter¶

MARCMakerReader¶

TextWriter¶

Compatibility Matrix¶

MUST-HAVE (Core Functionality)¶

NICE-TO-HAVE (Enhanced Features)¶

DEPRECATED/DIFFERENT¶

Python API Patterns to Match¶

Pattern 1: Dict-Like Field Access¶

Pattern 2: Iterator Protocol¶

Pattern 3: Property Access¶

Pattern 4: Context Manager (Best Practice)¶

Critical Implementation Notes¶

1. None vs KeyError¶

2. Encoding Handling¶

3. Permissive Mode¶

4. Leader Mutability¶

5. Field Iteration¶

Recommended Python Wrapper Implementation¶

Priority 1: Must Match¶

Priority 2: Nice-to-Have¶

Priority 3: Future/Optional¶

Testing Strategy¶

Unit Tests Needed¶

Integration Tests Needed¶

Summary for Implementation¶

References¶