
Error Handling

mrrc raises a typed exception hierarchy whose instances carry structured positional metadata: where in the byte stream the problem occurred, which record it came from, the 001 control number, the field/subfield being parsed, and the source filename, each populated when known. The class names and parent relationships largely match pymarc's exception layer, so most code written against pymarc's exception classes catches the same conditions in mrrc unchanged; the exceptions are listed under Known hierarchy divergences from pymarc.

Exception hierarchy

Exception
├── MrrcException                      (base)
│   ├── RecordLengthInvalid
│   ├── RecordLeaderInvalid
│   ├── BaseAddressInvalid
│   ├── BaseAddressNotFound
│   ├── RecordDirectoryInvalid
│   │   ├── InvalidIndicator    (mrrc)
│   │   ├── BadSubfieldCode     (mrrc)
│   │   └── InvalidField        (mrrc)
│   ├── EndOfRecordNotFound
│   │   └── TruncatedRecord     (mrrc)
│   ├── FieldNotFound
│   ├── FatalReaderError
│   ├── EncodingError           (mrrc)
│   ├── XmlError                (mrrc)
│   ├── JsonError               (mrrc)
│   ├── WriterError             (mrrc)
│   └── StaleFieldError         (mrrc)
└── OSError
    └── PyIOError                      (Python built-in, raised on I/O failure)

class BadSubfieldCodeWarning(UserWarning)

Classes marked (mrrc) are mrrc-specific subclasses that pymarc does not have. Each one extends the closest pymarc parent so existing pymarc-style except clauses keep catching the same conditions.
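To see how those parent links decide which except clause fires, the sketch below rebuilds part of the tree as plain Python stand-in classes. These are not imported from mrrc; they only model the documented parentage, including FatalReaderError sitting beside the malformed-record classes rather than above them.

```python
# Stand-ins that mirror part of the documented mrrc tree.
# They are NOT the real mrrc classes; they only model the parent links.
class MrrcException(Exception):
    pass

class RecordDirectoryInvalid(MrrcException):
    pass

class InvalidIndicator(RecordDirectoryInvalid):
    pass

class BadSubfieldCode(RecordDirectoryInvalid):
    pass

class InvalidField(RecordDirectoryInvalid):
    pass

class EndOfRecordNotFound(MrrcException):
    pass

class TruncatedRecord(EndOfRecordNotFound):
    pass

class FatalReaderError(MrrcException):  # a sibling, not a parent
    pass


def which_clause(exc: Exception) -> str:
    """Raise exc inside a pymarc-style try block and report which clause caught it."""
    try:
        raise exc
    except RecordDirectoryInvalid:
        return "RecordDirectoryInvalid"
    except EndOfRecordNotFound:
        return "EndOfRecordNotFound"
    except MrrcException:
        return "MrrcException"


for exc in (InvalidIndicator(), TruncatedRecord(), FatalReaderError()):
    print(type(exc).__name__, "->", which_clause(exc))
# InvalidIndicator -> RecordDirectoryInvalid
# TruncatedRecord -> EndOfRecordNotFound
# FatalReaderError -> MrrcException
```

Because the mrrc-specific classes subclass the pymarc-named ones, the first two clauses catch them without being edited.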

Choosing what to catch

| You want to… | Catch |
|---|---|
| Match pymarc's catch behavior (except FatalReaderError; see Known hierarchy divergences from pymarc) | The pymarc-named class (RecordDirectoryInvalid, EndOfRecordNotFound, etc.); mrrc-specific subclasses are caught too. |
| Distinguish indicator errors from subfield errors | InvalidIndicator and BadSubfieldCode separately. |
| Catch every mrrc error, no matter the variant | MrrcException. |
| Catch only I/O errors | OSError (or its IOError alias). |
| Handle a field handle invalidated by removals | StaleFieldError: re-fetch the field from the record and retry. Raised by live field handles (see Field handles) after any remove_field/remove_fields call; it is a usage error, not a data error, so it carries no E-code. |

Pymarc exception compatibility

This page covers exception class names, hierarchy, and catch behavior only. The new positional attributes are additive: pymarc-style code that inspects only str(err) keeps working without change.

Other compatibility surfaces — record APIs, reader/writer constructor shapes, format coverage, and performance characteristics — are out of scope for this page; consult the Python API reference and Rust API reference for those.

Exception name mapping

| pymarc class | mrrc class | Notes |
|---|---|---|
| PymarcException | MrrcException | Same role; alias if desired (see below). |
| RecordLengthInvalid | RecordLengthInvalid | Same name; gains positional attrs. |
| RecordLeaderInvalid | RecordLeaderInvalid | Same name; gains positional attrs. |
| BaseAddressInvalid | BaseAddressInvalid | Same name; gains positional attrs. |
| BaseAddressNotFound | BaseAddressNotFound | Same name; gains positional attrs. |
| RecordDirectoryInvalid | RecordDirectoryInvalid | Same name; gains positional attrs. Also catches new mrrc subclasses InvalidIndicator, BadSubfieldCode, InvalidField. |
| EndOfRecordNotFound | EndOfRecordNotFound | Same name; gains positional attrs. Also catches new subclass TruncatedRecord. |
| FieldNotFound | FieldNotFound | Same name; gains record_control_number, record_index. |
| FatalReaderError | FatalReaderError | Same name, different parentage; reserved for the recovered-error cap being exceeded (see Known hierarchy divergences from pymarc). |
| BadSubfieldCodeWarning | BadSubfieldCodeWarning | Same name (UserWarning, not exception). |
| IOError / OSError | OSError (via PyIOError) | I/O errors map to Python's built-in. |

Pymarc names mrrc deliberately omits

The following pymarc classes are intentionally absent in mrrc. Each row gives the rationale and the mrrc-equivalent behavior a port should rely on instead.

| pymarc class | Why mrrc doesn't have it | mrrc-equivalent behavior |
|---|---|---|
| NoFieldsFound | An empty Record is a valid in-memory state in mrrc; no exception is raised. | Check the length of record.get_fields(). |
| WriteNeedsRecord | MARCWriter.write_record is type-annotated; passing a non-Record is a static-type error. | Static type check (pyright / mypy). |
| NoActiveFile | MARCWriter is context-managed; operating on a closed writer raises plain RuntimeError. | Use a with block or check writer state. |
| BadLeaderValue | mrrc.Leader validates fields at construction. | Bad values raise ValueError. |
| MissingLinkedFields | 880-linkage validation isn't part of the parser. | Validate links in caller code. |
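For a port that relied on NoFieldsFound, a small guard can restore the old raise-on-empty behavior. This is a sketch under assumptions: NoFieldsFound here is a hypothetical local class (mrrc does not define it), require_fields is a hypothetical helper, and the only record method used is get_fields() from the table above. FakeRecord stands in for mrrc.Record so the snippet runs on its own.

```python
class NoFieldsFound(Exception):
    """Hypothetical local replacement for pymarc's NoFieldsFound (not in mrrc)."""


def require_fields(record):
    """Return record unchanged, or raise NoFieldsFound if it has no fields."""
    if len(record.get_fields()) == 0:
        raise NoFieldsFound("record has no fields")
    return record


class FakeRecord:
    """Stand-in exposing only get_fields(), for illustration."""

    def __init__(self, fields):
        self._fields = list(fields)

    def get_fields(self):
        return list(self._fields)


record = require_fields(FakeRecord(["001", "245"]))  # passes through
```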

Optional symbol-level aliases

For projects swapping pymarc imports for mrrc that want existing exception names (including PymarcException) to keep resolving, import them from mrrc:

from mrrc import MrrcException as PymarcException
from mrrc import (
    RecordLengthInvalid,
    RecordLeaderInvalid,
    BaseAddressInvalid,
    BaseAddressNotFound,
    RecordDirectoryInvalid,
    EndOfRecordNotFound,
    FieldNotFound,
    FatalReaderError,
    BadSubfieldCodeWarning,
)

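With these aliases the catch hierarchy behaves as in pymarc, apart from the FatalReaderError parentage described under Known hierarchy divergences from pymarc. Code outside the exception layer (record manipulation, reader/writer APIs, format I/O) may still need changes; consult the Python API reference.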

What you gain on the exception layer

Three patterns, in order of effort:

Same except, more context. Existing pymarc-style code keeps working. The same except clause now also gets structured attributes:

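import logging
import mrrc

log = logging.getLogger(__name__)

with open("harvest.mrc", "rb") as fh:
    try:
        # recovery_mode="strict" makes defects raise; the Python default
        # ("permissive") attaches them to record.errors instead.
        for record in mrrc.MARCReader(fh, recovery_mode="strict"):
            ...
    except mrrc.RecordDirectoryInvalid as e:
        log.warning(
            "directory error in record %d (001=%s, field %s) at byte 0x%X",
            e.record_index, e.record_control_number, e.field_tag, e.byte_offset,
        )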

Opt-in granularity. mrrc-aware code can catch the new subclasses directly to make decisions on the specific error kind:

try:
    ...
except mrrc.InvalidIndicator as e:
    log.warning(
        "Bad indicator at field %s ind%d in record %d",
        e.field_tag, e.indicator_position, e.record_index,
    )
except mrrc.BadSubfieldCode as e:
    log.warning("Bad subfield code 0x%02X at field %s", e.subfield_code, e.field_tag)

Diagnostic dump. The detailed() method produces a multi-line diagnostic suitable for logs:

try:
    ...
except mrrc.MrrcException as e:
    log.error(e.detailed())

Sample output:

InvalidIndicator at record 847, field 245
  source:          harvest.mrc
  001:             ocm01234567
  indicator 0:     found b':', expected digit or space
  byte offset:     0x1C31 (7217) in stream
  record-relative: byte 42

Subclass behavior reference

| If you except this class… | …you also catch these subclasses |
|---|---|
| RecordDirectoryInvalid | InvalidIndicator, BadSubfieldCode, InvalidField |
| EndOfRecordNotFound | TruncatedRecord |
| MrrcException | Every mrrc exception, including all mrrc-specific subclasses |
| OSError | PyIOError (I/O failures) |

MARCReader.current_exception / current_chunk

mrrc's MARCReader exposes pymarc-compatible current_exception and current_chunk attributes. After each __next__ step:

  • reader.current_chunk holds the raw bytes of the record just read from the source (declared length per the leader). Set on every successful chunk read regardless of whether the parse step then succeeded or failed.
  • reader.current_exception holds the typed MrrcException swallowed by permissive=True, or None on a clean read.
reader = mrrc.MARCReader("harvest.mrc", permissive=True)
for record in reader:
    if record is None:
        log.warning(
            "skipped malformed record (%d bytes): %s",
            len(reader.current_chunk) if reader.current_chunk else 0,
            reader.current_exception,
        )
        continue
    process(record)

Two documented divergences from pymarc:

  • Encoding strictness. Under validation_level="strict_marc", mrrc raises EncodingError on invalid UTF-8 in subfield values (swallowed into current_exception under permissive=True), where pymarc applies lossy substitution silently; under the default "structural" level mrrc also decodes lossily (U+FFFD substitution). The iteration shape is identical (the bad record yields as None either way), so callers using except Exception: keep working.
  • current_chunk on byte-read errors. When the underlying read of the next record's bytes fails before parsing begins (truncated stream, I/O error), current_chunk may be None even though current_exception is set. For parse failures of fully-read chunks (the common case), current_chunk carries the full record bytes as pymarc does.
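A common use of these two attributes is quarantining the raw bytes of failed records for later inspection. The sketch assumes only the behavior described above: a failed step yields None and sets current_chunk and current_exception, and current_chunk may be None. split_failures and FakeReader are hypothetical; the fake reader stands in for mrrc.MARCReader so the example runs without mrrc.

```python
import io


def split_failures(reader, sink):
    """Collect parsed records and write the raw bytes of failed ones to sink.

    Returns (records, problems), where problems lists (position, exception)
    pairs for the steps that yielded None.
    """
    records, problems = [], []
    for position, record in enumerate(reader, start=1):
        if record is not None:
            records.append(record)
            continue
        chunk = reader.current_chunk  # None when the byte read itself failed
        if chunk:
            sink.write(chunk)
        problems.append((position, reader.current_exception))
    return records, problems


class FakeReader:
    """Stand-in exposing the pymarc-style attributes described above."""

    def __init__(self, steps):
        self._steps = steps  # list of (record_or_None, chunk, exception)
        self.current_chunk = None
        self.current_exception = None

    def __iter__(self):
        for record, chunk, exc in self._steps:
            self.current_chunk, self.current_exception = chunk, exc
            yield record


sink = io.BytesIO()
records, problems = split_failures(
    FakeReader([
        ("record-1", b"raw bytes of record 1", None),
        (None, b"raw bytes of record 2", ValueError("bad indicator")),
        (None, None, OSError("truncated stream")),
    ]),
    sink,
)
```

With a real reader, sink would be an open binary file such as open("quarantine.mrc", "ab").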

Known hierarchy divergences from pymarc

mrrc's exception class names mostly match pymarc's, but the tree differs in two places. Existing except clauses that name a specific class (except RecordDirectoryInvalid:, except EndOfRecordNotFound:, etc.) work in mrrc unchanged; the divergences matter only for code that catches FatalReaderError as a parent class or refers to the base class by its pymarc name.

FatalReaderError parentage. In pymarc, FatalReaderError is the parent of RecordLengthInvalid, TruncatedRecord, and EndOfRecordNotFound, so a pymarc loop can use except FatalReaderError: to catch all three as well as FatalReaderError itself. In mrrc, FatalReaderError is a sibling of those classes, reserved for the specific "recovered-error cap exceeded" case under recovery_mode="lenient" or "permissive" with an error cap set (max_errors / with_max_errors). In mrrc, except FatalReaderError: therefore catches only the cap-exhausted case, not malformed records. To match pymarc's catch surface, either enumerate the four classes:

except (RecordLengthInvalid, TruncatedRecord, EndOfRecordNotFound,
        FatalReaderError):
    ...

Or catch the mrrc base class, which is broader (every typed mrrc error):

except MrrcException:
    ...

PymarcException → MrrcException. The base class name differs, so a mechanical swap of from pymarc import PymarcException to from mrrc import PymarcException fails at import. Use from mrrc import MrrcException instead, or alias it on import (see Optional symbol-level aliases above).

Per-variant field reference

Each exception class accepts the following keyword arguments at construction time (all optional). Attributes of the same name are populated by the parser when the information is available; absent values stay None.

| Field | Type | Meaning |
|---|---|---|
| record_index | int \| None | 1-based position of the record in the input stream. |
| record_control_number | str \| None | Value of the 001 control field for the record being parsed. None for errors raised before 001 is decoded (invalid leader, invalid directory, pre-001 truncation). |
| field_tag | str \| None | Tag of the field being parsed (e.g., "245"). |
| indicator_position | int \| None | Indicator position (0 or 1), populated for InvalidIndicator. |
| subfield_code | int \| None | Offending subfield code byte, populated for BadSubfieldCode. |
| found | bytes \| None | The bad bytes that triggered the error, capped at 32 bytes. |
| expected | str \| None | Human-readable description of what was expected. |
| byte_offset | int \| None | Absolute byte offset within the input stream. |
| record_byte_offset | int \| None | Byte offset within the current record. |
| source | str \| None | Filename or stream identifier, populated when the reader was constructed via with_source() or from_path(). |
| bytes_near | bytes \| None | Up to 32 bytes around the error offset, for hex-dump rendering. None when the parser did not have access to a buffer at error time. |
| bytes_near_offset | int \| None | Absolute stream offset of the first byte of bytes_near. |

Subclass-specific extras:

  • InvalidField, EncodingError, XmlError, JsonError, WriterError add a message: str | None field carrying a human-readable description of the problem.
  • TruncatedRecord adds expected_length and actual_length (both int | None) describing how far short the record was of its declared length.

Always-present vs may-be-present per variant

The parser populates record_index and byte_offset on every parse-path error; record_control_number whenever 001 is already decoded; source whenever the reader was constructed via with_source() or from_path(). Other fields are populated when applicable to the variant (e.g., indicator_position only on InvalidIndicator).

FieldNotFound is an accessor error rather than a parse error; it carries field_tag, record_control_number, and record_index but not byte offsets.
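Since any of these attributes may be None, a %d or %X placeholder like the ones in the earlier logging examples cannot format a missing value. A defensive formatter can build the message from whatever is present. describe_error is a hypothetical helper that reads only the attribute names from the table above through getattr, so it also accepts exceptions that lack them entirely; StandInError is a stand-in for an mrrc exception.

```python
# Attribute name and how to render it when present, in display order.
_FIELDS = (
    ("record_index", "record {}"),
    ("record_control_number", "001={}"),
    ("field_tag", "field {}"),
    ("byte_offset", "byte 0x{:X}"),
    ("source", "in {}"),
)


def describe_error(err: BaseException) -> str:
    """One-line summary built only from the positional attributes that are set."""
    parts = [type(err).__name__]
    for attr, template in _FIELDS:
        value = getattr(err, attr, None)
        if value is not None:
            parts.append(template.format(value))
    return ", ".join(parts) + f": {err}"


class StandInError(Exception):
    """Stand-in for an mrrc exception carrying a few documented attributes."""


err = StandInError("bad indicator")
err.record_index, err.field_tag, err.byte_offset = 847, "245", 7217
print(describe_error(err))
# StandInError, record 847, field 245, byte 0x1C31: bad indicator
```

A FieldNotFound-style error without offsets simply yields a shorter line instead of a formatting failure.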

Position semantics by format

byte_offset and record_byte_offset mean different things depending on the input format:

  • ISO 2709 (binary MARC). byte_offset is the absolute byte position in the input stream; record_byte_offset is relative to the start of the current record. This is the primary case.
  • MARCXML. The underlying quick_xml parser does not expose a byte position from its deserializer error type, so byte_offset is None. Position information is available via the wrapped cause: walk err.__cause__ for the original quick_xml error.
  • MARCJSON. The wrapped serde_json::Error exposes line and column; byte_offset is None because translating (line, column) to a byte offset requires the original input bytes. Walk err.__cause__ to read cause.line and cause.column.

When a format's underlying parser does not expose usable position information, the field stays None rather than being fabricated.
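Following those rules, a caller can pick the most precise position available: byte_offset when set, otherwise line and column from the wrapped cause chain. locate is a hypothetical helper; the line and column attribute names come from the MARCJSON note above, and since this page does not say what the MARCXML cause exposes in Python, such errors simply fall through to None here. JsonCause and WrappedJsonError are stand-ins.

```python
def locate(err: BaseException):
    """Return ("byte", offset), ("line", (line, column)), or None."""
    offset = getattr(err, "byte_offset", None)
    if offset is not None:
        return ("byte", offset)
    cause, seen = err.__cause__, set()
    while cause is not None and id(cause) not in seen:
        seen.add(id(cause))
        line = getattr(cause, "line", None)
        column = getattr(cause, "column", None)
        if line is not None and column is not None:
            return ("line", (line, column))
        cause = cause.__cause__
    return None


class JsonCause(Exception):
    """Stand-in for a wrapped JSON parser error exposing line/column."""

    def __init__(self, line, column):
        super().__init__(f"line {line}, column {column}")
        self.line, self.column = line, column


class WrappedJsonError(Exception):
    """Stand-in for an mrrc JSON error; byte_offset stays None for MARCJSON."""

    byte_offset = None


try:
    try:
        raise JsonCause(3, 17)
    except JsonCause as cause:
        raise WrappedJsonError("invalid MARCJSON") from cause
except WrappedJsonError as exc:
    json_error = exc

print(locate(json_error))  # ('line', (3, 17))
```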

Source filename plumbing

The source attribute on errors is populated when the reader was told its input identity. There are two ways to set it:

# 1. Builder method: any reader, any input source.
reader = mrrc.MARCReader(file_obj).with_source("harvest.mrc")

# 2. Convenience constructor: opens a file and sets source from the path.
reader = mrrc.MARCReader.from_path("harvest.mrc")

When neither is used (e.g., reading from BytesIO), source stays None on emitted errors.

The same with_source / from_path pattern is available on AuthorityMARCReader and HoldingsMARCReader.

Validation level vs recovery mode

Two orthogonal axes govern parsing behavior:

  • validation_level: what counts as an error.
  • recovery_mode: what to do when one fires.

The rule fits in one sentence: structural is lossy and strict_marc is strict, and every reader behaves the same way at each level.

Concretely:

| Check | validation_level="structural" (default) | validation_level="strict_marc" |
|---|---|---|
| ISO 2709 structural errors (E001–E007, E101, E106) | fire | fire |
| Indicator byte validation (E201, byte-level) | skipped | fires |
| Per-tag MARC 21 indicator semantics (E201, e.g. 245 ind1 ∈ {0,1}) | skipped | fires |
| Subfield-code byte validation (E202) | skipped | fires |
| MARC 21 leader semantics (E002, e.g. record_status ∈ {a,c,d,n,p}) | skipped | fires |
| UTF-8 strictness (E301) | lossy decode (U+FFFD substitution) in all three readers (bibliographic, authority, holdings) | strict decode; raises in all three readers |

Both axes are set as reader constructor keyword arguments:
reader = mrrc.MARCReader(
    file,
    validation_level="structural",   # or "strict_marc"
    recovery_mode="strict",          # or "lenient", "permissive"
)

The two axes compose. (strict_marc, lenient) means "I want byte-level checks, and I want to keep iterating past a bad record": strict_marc makes E201/E202/E301 fire, and lenient absorbs them into record.errors, subject to the per-stream recovery cap.

Recovery modes and errors

The RecoveryMode setting (Strict / Lenient / Permissive) controls whether a malformed record raises immediately, is salvaged with partial data, or is skipped. The structured positional metadata is populated identically in all three modes — the modes only differ in whether the error is propagated, suppressed, or used to inform a salvage attempt.

Defaults: Python permissive, Rust Strict

The Python user surface (mrrc.MARCReader, mrrc.AuthorityMARCReader, mrrc.HoldingsMARCReader) defaults to recovery_mode="permissive" — the same default shape as pymarc / marc4j / libmarc. A fresh MARCReader(file) iterates past per-record defects rather than aborting on the first one, so users coming from those libraries get the expected behavior without setting any kwarg.

The Rust core (mrrc::MarcReader) keeps the stricter RecoveryMode::Strict default. Rust callers expect explicit error handling via Result<T, E> and ? propagation; flipping the default there would convert a loud Err into a quiet record.errors field that the caller has to remember to inspect.

A gentle case for choosing strict when feasible

Permissive mode is the more forgiving default, but it has a real cost worth understanding before you ship it past a prototype:

  • Unsalvageable records yield as None. When the parser can't make even partial sense of a record's bytes, the Python wrapper hands you None rather than skipping silently. A loop written as for record in reader: process(record) will pass None into process unless you guard with if record is not None: or iterate via iter_with_errors(). Worth being deliberate about.
  • Per-record diagnostics live on record.errors. A clean iteration in permissive mode can still be hiding malformed records — the errors are attached to the yielded record rather than raised. If nothing checks record.errors, defects are observable but invisible.
  • record.errors accumulates up to max_errors. Without an explicit max_errors=N kwarg, a pathological stream can fill memory with diagnostic objects before anyone notices. The Rust core caps at DEFAULT_MAX_ERRORS (10,000) per parse, but the Python wrapper-level cap defaults to disabled (see Capping recovered errors with max_errors).

If you control the input and quality matters more than throughput, recovery_mode="strict" makes defects loud: a single bad record raises a typed exception with full positional context. Pair it with permissive=True for the pymarc-shape pattern of "yield None for bad records, stash the exception on current_exception" without losing the precise diagnostics.

# Most forgiving (default): keep going, attach defects to record.errors
reader = mrrc.MARCReader(file)

# Pymarc-shape: yield None for failed parses, stash exception
reader = mrrc.MARCReader(file, permissive=True)

# Loudest: typed exception raised on first defect
reader = mrrc.MARCReader(file, recovery_mode="strict")

Inspecting per-record errors

In lenient and permissive recovery modes, errors that would have been raised under strict are instead attached to the yielded record as record.errors. The list carries one typed exception per recovered defect, with the same positional context (record_index, byte_offset, field_tag, etc.) as if the error had been raised directly.

reader = mrrc.MARCReader(file, recovery_mode="lenient")
for record in reader:
    if record.errors:
        for err in record.errors:
            log.warning(f"[{err.code}] {err}")
    process(record)

In strict mode record.errors is always [] — the parser raises on the first error before the record is yielded. In lenient and permissive it carries diagnostics for every defect the parser recovered from (subject to max_errors cap).
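For a stream-level summary rather than per-record log lines, recovered errors can be tallied by code as records pass through. tally_errors is a hypothetical helper relying only on record.errors and err.code as used above; the SimpleNamespace objects stand in for mrrc records and errors.

```python
from collections import Counter
from types import SimpleNamespace


def tally_errors(records, counts):
    """Pass records through unchanged while counting recovered errors by code."""
    for record in records:
        if record is not None:  # unsalvageable records may arrive as None
            counts.update(err.code for err in record.errors)
        yield record


fake_stream = [
    SimpleNamespace(errors=[SimpleNamespace(code="E201"), SimpleNamespace(code="E202")]),
    SimpleNamespace(errors=[]),
    None,
    SimpleNamespace(errors=[SimpleNamespace(code="E201")]),
]
counts = Counter()
passed = list(tally_errors(fake_stream, counts))
print(dict(counts))  # {'E201': 2, 'E202': 1}
```

In a real loop, wrap the reader (for record in tally_errors(reader, counts): process(record)) and log the totals once the stream is exhausted.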

iter_with_errors()

MARCReader.iter_with_errors() is an alternate iterator yielding (record, errors) tuples instead of bare records. Equivalent to iterating + reading record.errors, but more discoverable for the "give-me-everything-defective" use case:

for record, errors in reader.iter_with_errors():
    if errors:
        log.warning(f"{len(errors)} issues parsing record")
    if record is not None:
        process(record)

Under permissive=True, records that the parser cannot salvage at all yield as (None, [exception]), so even unsalvageable records are observable. Plain iteration yields a bare None for them instead; the exception is then available only on reader.current_exception, and is lost once the next step overwrites it.

reader = mrrc.MARCReader(file, permissive=True)
for record, errors in reader.iter_with_errors():
    if record is None:
        log.error(f"unsalvageable: {errors[0]}")
    else:
        process(record)

AuthorityMARCReader and HoldingsMARCReader expose record.errors the same way (the load-bearing surface). They don't carry the iter_with_errors convenience method — that's a pymarc-shape ergonomic specific to MARCReader. Iterate normally and check record.errors:

for record in mrrc.AuthorityMARCReader(file, recovery_mode="lenient"):
    if record.errors:
        log.warning(...)
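If you want the (record, errors) shape for authority and holdings streams as well, a small generator over plain iteration can provide it. pairs_with_errors is hypothetical, not an mrrc API. It assumes record.errors as described above; whether these readers expose current_exception for records that yield as None is not documented here, so the helper reads it through getattr and falls back to an empty list.

```python
def pairs_with_errors(reader):
    """Yield (record, errors) tuples, mirroring MARCReader.iter_with_errors()."""
    for record in reader:
        if record is not None:
            yield record, list(record.errors)
        else:
            # current_exception on these readers is an assumption; getattr
            # keeps the helper working whether or not the attribute exists.
            exc = getattr(reader, "current_exception", None)
            yield None, [exc] if exc is not None else []
```

Usage would look like for record, errors in pairs_with_errors(mrrc.AuthorityMARCReader(file, recovery_mode="lenient")):, with the same body as the iter_with_errors() loop.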

Capping recovered errors with max_errors

A pathological stream in lenient / permissive mode can accumulate diagnostics without bound — every malformed record adds one or more MrrcException instances to record.errors. Pass max_errors=N to MARCReader to cap the total recovered count across the stream; once the (N+1)-th recovered error lands, the next iteration raises FatalReaderError (E099) instead of yielding another record.

reader = mrrc.MARCReader(file, recovery_mode="lenient", max_errors=100)
try:
    for record in reader:
        process(record)
except mrrc.FatalReaderError as e:
    log.error(f"stopped after {e.errors_seen} errors (cap={e.cap})")

  • max_errors=None (the default) disables the wrapper-level cap.
  • max_errors=0 also disables the cap (matches the Rust API's no-cap sentinel).
  • max_errors=N for any N > 0 trips on the (N+1)-th recovered error.

max_errors has no observable effect in strict mode: the first error raises before any recovered errors can count against the cap. AuthorityMARCReader and HoldingsMARCReader don't accept the kwarg; they inherit the Rust core's per-reader DEFAULT_MAX_ERRORS (10_000) directly.
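The cap rules are easy to get off by one, so here is a small pure-Python model of them. ErrorCap and CapExceeded are hypothetical and do not touch mrrc; they encode only the documented semantics (None and 0 disable the cap, a positive N trips on the (N+1)-th recovered error). One simplification: the model raises as soon as the count crosses the cap, whereas the reader raises FatalReaderError on the next iteration.

```python
class CapExceeded(Exception):
    """Stand-in for mrrc.FatalReaderError (E099) in this model."""


class ErrorCap:
    """Model of the documented max_errors rules (not mrrc code)."""

    def __init__(self, max_errors=None):
        # None and 0 both disable the cap.
        self.cap = max_errors if max_errors else None
        self.errors_seen = 0

    def add(self, count=1):
        """Count newly recovered errors; raise once the total exceeds the cap."""
        self.errors_seen += count
        if self.cap is not None and self.errors_seen > self.cap:
            raise CapExceeded(
                f"stopped after {self.errors_seen} errors (cap={self.cap})"
            )


cap = ErrorCap(max_errors=2)
cap.add()
cap.add()  # 2 recovered errors: still within the cap
try:
    cap.add()  # the 3rd, i.e. (N+1)-th, error trips it
except CapExceeded as exc:
    print(exc)  # stopped after 3 errors (cap=2)
```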

Structured serialization (to_dict / to_json)

Every MrrcException exposes to_dict() and to_json() for emitting the error into structured logging platforms (ELK, Datadog, Splunk, JSON-line pipelines) without writing an adapter. The Rust side offers a matching MarcError::to_json_value() / to_json() that produces the same schema.

try:
    ...
except mrrc.MrrcException as e:
    log.error(json.dumps({**e.to_dict(), "app": "ingest"}))

Sample output:

>>> err.to_dict()
{
  "schema_version": 1,
  "class": "InvalidIndicator",
  "code": "E201",
  "slug": "invalid_indicator",
  "severity": "error",
  "help_url": "https://dchud.github.io/mrrc/reference/error-codes/#E201",
  "record_index": 847,
  "record_control_number": "ocm01234567",
  "field_tag": "245",
  "indicator_position": 0,
  "found": None,
  "found_hex": "3a",
  "expected": "digit or space",
  "byte_offset": 7217,
  "record_byte_offset": 42,
  "source": "harvest.mrc",
  "bytes_near": None,
  "bytes_near_hex": "323032336e79752020202020202020203a300030203020656e6720641e323435",
  "bytes_near_offset": 7201,
  "_cause": None
}

Notes on the shape

  • Bytes fields carry their data under a _hex suffix key (found_hex, bytes_near_hex); the bare key (found, bytes_near) stays null so the dict is JSON-serializable without a custom encoder. The _hex keys appear only when bytes were captured.
  • _cause is always a string or null, never nested. For the full exception chain pass include_traceback=True or walk __cause__.
  • The emitted bytes are bounded at capture time (found ≤ 32 bytes, bytes_near ≤ 32 bytes from the 16+16 hex-dump window), so payloads don't grow unboundedly.
  • schema_version: 1 is included so callers can branch on it later if the shape ever changes. Pre-1.0, the shape may still evolve.
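On the consuming side, a pipeline that wants the raw bytes back can decode the _hex keys with bytes.fromhex, branching on schema_version as suggested above. restore_bytes is a hypothetical helper that works on the dict shape shown in the sample, for example after json.loads of one to_json() line.

```python
import json


def restore_bytes(payload: dict) -> dict:
    """Return a copy of a to_dict()/to_json() payload with bytes fields decoded."""
    if payload.get("schema_version") != 1:
        raise ValueError(f"unsupported schema_version: {payload.get('schema_version')!r}")
    restored = dict(payload)
    for key in ("found", "bytes_near"):
        hex_value = payload.get(f"{key}_hex")
        if hex_value is not None:  # _hex keys appear only when bytes were captured
            restored[key] = bytes.fromhex(hex_value)
    return restored


line = '{"schema_version": 1, "code": "E201", "found": null, "found_hex": "3a"}'
event = restore_bytes(json.loads(line))
print(event["found"])  # b':'
```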

include_traceback

to_dict(include_traceback=True) adds a traceback key with formatted traceback lines (only present when the exception was actually raised). to_json(include_traceback=True) forwards the flag to to_dict.

Hex dump in detailed()

When the parser captures a byte window around the error offset, the exception's detailed() output appends a 32-byte hex + ASCII dump with a caret pointing at the offending byte:

InvalidIndicator at record 847, field 245
  source:          harvest.mrc
  001:             ocm01234567
  indicator 0:     found b':', expected digit or space
  byte offset:     0x1C31 (7217) in stream
  record-relative: byte 42

bytes near offset 0x1C31:
    0x1C21:  32 30 32 33 6e 79 75 20  20 20 20 20 20 20 20 20 |2023nyu         |
    0x1C31:  3a 30 00 30 20 30 20 65  6e 67 20 64 1e 32 34 35 |:0.0 0 eng d.245|
             ^^ offending byte

The window is up to 16 bytes before the error offset and 16 bytes starting at it, clamped at buffer boundaries. Non-printable bytes render as . in the ASCII sidecar. Each row holds 16 bytes, with an extra space after the eighth byte for readability; the format is byte-for-byte identical in Rust (MarcError::detailed()) and Python (MrrcException.detailed()).
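To make the layout concrete, here is a pure-Python sketch that renders the same format from a captured window. hex_dump is a hypothetical re-implementation written from the description and sample above, not mrrc's code; starting each row at the window's first byte (as the sample does) is an assumption about how clamped windows are laid out.

```python
def hex_dump(window: bytes, window_offset: int, error_offset: int) -> str:
    """Render window as 16-byte rows with an ASCII sidecar and a caret line."""
    lines = [f"bytes near offset 0x{error_offset:X}:"]
    for start in range(0, len(window), 16):
        row = window[start:start + 16]
        cells = [f"{b:02x}" for b in row]
        hex_part = " ".join(cells[:8])
        if len(cells) > 8:
            hex_part += "  " + " ".join(cells[8:])  # extra space after byte 8
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in row)
        address = window_offset + start
        prefix = f"    0x{address:04X}:  "
        lines.append(f"{prefix}{hex_part:<48} |{text}|")
        index = error_offset - address
        if 0 <= index < len(row):
            # Each cell is 3 columns wide, plus one column past the 8th byte.
            column = len(prefix) + index * 3 + (1 if index >= 8 else 0)
            lines.append(" " * column + "^^ offending byte")
    return "\n".join(lines)


window = bytes.fromhex(
    "323032336e7975202020202020202020"
    "3a300030203020656e6720641e323435"
)
print(hex_dump(window, 0x1C21, 0x1C31))
```

With a real exception, the inputs would be e.bytes_near, e.bytes_near_offset, and e.byte_offset.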

The bytes_near attribute is None when the parser had no buffer to capture from at the point the error was raised, for example for the wrapping variants IoError, XmlError, and JsonError; detailed() then omits the hex dump.

Pickle round-trip

Exception instances round-trip through pickle with all positional attributes preserved (subclass extras like expected_length/message included). For security, __setstate__ whitelists incoming attribute names against the per-class allowed set; a maliciously-crafted pickle that tries to set arbitrary attributes (including method names) will raise TypeError rather than silently shadowing methods on the instance.

This is a defense-in-depth measure only. As with any pickle-based deserialization, do not unpickle data from untrusted sources — the unpickling step itself is the relevant attack surface.
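The whitelist idea is easy to sketch in plain Python. This illustrates the pattern only and is not mrrc's implementation; PositionalError and its allowed attribute names are hypothetical.

```python
import pickle


class PositionalError(Exception):
    """Toy exception whose positional attributes survive pickling."""

    _ALLOWED = frozenset({"record_index", "field_tag", "byte_offset"})

    def __init__(self, message="", **attrs):
        super().__init__(message)
        unknown = set(attrs) - self._ALLOWED
        if unknown:
            raise TypeError(f"unexpected attributes: {sorted(unknown)}")
        for name in self._ALLOWED:
            setattr(self, name, attrs.get(name))

    def __setstate__(self, state):
        # Reject names outside the whitelist instead of blindly updating
        # __dict__, so a crafted pickle cannot set arbitrary attributes.
        unknown = set(state) - self._ALLOWED
        if unknown:
            raise TypeError(f"refusing to restore attributes: {sorted(unknown)}")
        for name, value in state.items():
            setattr(self, name, value)


original = PositionalError("bad indicator", record_index=847, field_tag="245")
restored = pickle.loads(pickle.dumps(original))
print(restored.record_index, restored.field_tag)  # 847 245
```

Exceptions pickle as (class, args, __dict__), so unpickling rebuilds the instance from its message and then hands the saved attribute dict to __setstate__, which is the single place the whitelist has to guard.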