Error Handling¶
mrrc raises a typed exception hierarchy with structured positional metadata
on every error: where in the byte stream the problem occurred, which record
it came from, the 001 control number, the field/subfield being parsed, and
the source filename when known. The class names match pymarc's exception
layer, and the parent relationships match except for the two cases listed
under Known hierarchy divergences from pymarc, so code written against a
specific pymarc exception class catches the same conditions in mrrc unchanged.
Exception hierarchy¶
Exception
├── MrrcException (base)
│   ├── RecordLengthInvalid
│   ├── RecordLeaderInvalid
│   ├── BaseAddressInvalid
│   ├── BaseAddressNotFound
│   ├── RecordDirectoryInvalid
│   │   ├── InvalidIndicator (mrrc)
│   │   ├── BadSubfieldCode (mrrc)
│   │   └── InvalidField (mrrc)
│   ├── EndOfRecordNotFound
│   │   └── TruncatedRecord (mrrc)
│   ├── FieldNotFound
│   ├── FatalReaderError
│   ├── EncodingError (mrrc)
│   ├── XmlError (mrrc)
│   ├── JsonError (mrrc)
│   ├── WriterError (mrrc)
│   └── StaleFieldError (mrrc)
└── OSError
    └── PyIOError (Python built-in, raised on I/O failure)
class BadSubfieldCodeWarning(UserWarning)
Classes marked (mrrc) are mrrc-specific subclasses that pymarc does not
have. Each one extends the closest pymarc parent so existing
pymarc-style except clauses keep catching the same conditions.
Choosing what to catch¶
| You want to… | Catch |
|---|---|
| Match pymarc's catch behavior exactly | The pymarc-named class (RecordDirectoryInvalid, EndOfRecordNotFound, etc.) — mrrc-specific subclasses are caught too. |
| Distinguish indicator errors from subfield errors | InvalidIndicator and BadSubfieldCode separately. |
| Catch every mrrc error, no matter the variant | MrrcException. |
| Catch only I/O errors | OSError (or its IOError alias). |
| Handle a field handle invalidated by removals | StaleFieldError — re-fetch the field from the record and retry. Raised by live field handles (see Field handles) after any remove_field/remove_fields call; it is a usage error, not a data error, so it carries no E-code. |
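The choices in the table come down to ordinary Python subclass matching, so clause order decides which handler runs. The sketch below is a minimal illustration under stated assumptions: the classes are local stand-ins that mirror the documented hierarchy (real code would import them from mrrc), and `classify` is a hypothetical helper that exists only for this example.

```python
# Stand-in classes mirroring the documented hierarchy. These are NOT
# imported from mrrc; real code would use the mrrc names directly.
class MrrcException(Exception): ...
class RecordDirectoryInvalid(MrrcException): ...
class InvalidIndicator(RecordDirectoryInvalid): ...
class BadSubfieldCode(RecordDirectoryInvalid): ...
class EncodingError(MrrcException): ...


def classify(exc: Exception) -> str:
    """Report which except clause handles exc (hypothetical helper)."""
    try:
        raise exc
    except InvalidIndicator:        # most specific clause first
        return "indicator"
    except RecordDirectoryInvalid:  # pymarc-named parent, also catches subclasses
        return "directory"
    except MrrcException:           # any other typed error in the hierarchy
        return "mrrc"
    except OSError:                 # I/O failures sit outside MrrcException
        return "io"


print(classify(InvalidIndicator()))
print(classify(BadSubfieldCode()))
print(classify(EncodingError()))
print(classify(FileNotFoundError()))
```

Placing the parent clause first would shadow the `InvalidIndicator` clause, which is why the more specific class is listed before its parent.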
Pymarc exception compatibility¶
This page covers exception class names, hierarchy, and catch behavior
only. The new positional attributes are additive: pymarc-style code that
inspects only str(err) keeps working without change.
Other compatibility surfaces — record APIs, reader/writer constructor shapes, format coverage, and performance characteristics — are out of scope for this page; consult the Python API reference and Rust API reference for those.
Exception name mapping¶
| pymarc class | mrrc class | Notes |
|---|---|---|
| PymarcException | MrrcException | Same role; alias if desired (see below). |
| RecordLengthInvalid | RecordLengthInvalid | Same name; gains positional attrs. |
| RecordLeaderInvalid | RecordLeaderInvalid | Same name; gains positional attrs. |
| BaseAddressInvalid | BaseAddressInvalid | Same name; gains positional attrs. |
| BaseAddressNotFound | BaseAddressNotFound | Same name; gains positional attrs. |
| RecordDirectoryInvalid | RecordDirectoryInvalid | Same name; gains positional attrs. Also catches new mrrc subclasses InvalidIndicator, BadSubfieldCode, InvalidField. |
| EndOfRecordNotFound | EndOfRecordNotFound | Same name; gains positional attrs. Also catches new subclass TruncatedRecord. |
| FieldNotFound | FieldNotFound | Same name; gains record_control_number, record_index. |
| FatalReaderError | FatalReaderError | Same name; reserved for catastrophic states. |
| BadSubfieldCodeWarning | BadSubfieldCodeWarning | Same name (UserWarning, not exception). |
| IOError / OSError | OSError (via PyIOError) | I/O errors map to Python's built-in. |
Pymarc names mrrc deliberately omits¶
The following pymarc classes are intentionally absent in mrrc. Each row gives the rationale and the mrrc-equivalent behavior a port should rely on instead.
| pymarc class | why mrrc doesn't have it | mrrc-equivalent behavior |
|---|---|---|
| NoFieldsFound | An empty Record is a valid in-memory state in mrrc; no exception is raised. | Check record.get_fields() length. |
| WriteNeedsRecord | MARCWriter.write_record is type-annotated; passing a non-Record is a static-type error. | Static type check (pyright / mypy). |
| NoActiveFile | MARCWriter is context-managed; operating on a closed writer raises plain RuntimeError. | Use a with block or check writer state. |
| BadLeaderValue | mrrc.Leader validates fields at construction. | Bad values raise ValueError. |
| MissingLinkedFields | 880-linkage validation isn't part of the parser. | Validate links in caller code. |
Optional symbol-level aliases¶
For projects swapping pymarc imports to mrrc and wanting
from pymarc import RecordLeaderInvalid-style imports to keep working:
from mrrc import MrrcException as PymarcException
from mrrc import (
RecordLengthInvalid,
RecordLeaderInvalid,
BaseAddressInvalid,
BaseAddressNotFound,
RecordDirectoryInvalid,
EndOfRecordNotFound,
FieldNotFound,
FatalReaderError,
BadSubfieldCodeWarning,
)
The catch hierarchy behaves the same as in pymarc. Code outside the exception layer (record manipulation, reader/writer APIs, format I/O) may still need changes; consult the Python API reference.
What you gain on the exception layer¶
Three patterns, in order of effort:
Same except, more context. Existing pymarc-style code keeps working.
The same except clause now also gets structured attributes:
try:
    for record in mrrc.MARCReader(open("harvest.mrc", "rb")):
        ...
except mrrc.RecordDirectoryInvalid as e:
    log.warning(
        "directory error in record %d (001=%s, field %s) at byte 0x%X",
        e.record_index, e.record_control_number, e.field_tag, e.byte_offset,
    )
Opt-in granularity. mrrc-aware code can catch the new subclasses directly to make decisions on the specific error kind:
try:
    ...
except mrrc.InvalidIndicator as e:
    log.warning(
        "Bad indicator at field %s ind%d in record %d",
        e.field_tag, e.indicator_position, e.record_index,
    )
except mrrc.BadSubfieldCode as e:
    log.warning("Bad subfield code 0x%02X at field %s", e.subfield_code, e.field_tag)
Diagnostic dump. The detailed() method produces a multi-line
diagnostic suitable for logs:
InvalidIndicator at record 847, field 245
source: harvest.mrc
001: ocm01234567
indicator 0: found b':', expected digit or space
byte offset: 0x1C31 (7217) in stream
record-relative: byte 42
Subclass behavior reference¶
| If you except this class… | …you also catch these mrrc-specific subclasses |
|---|---|
| RecordDirectoryInvalid | InvalidIndicator, BadSubfieldCode, InvalidField |
| EndOfRecordNotFound | TruncatedRecord |
| MrrcException | All mrrc-specific exceptions |
| OSError | PyIOError (I/O failures) |
MARCReader.current_exception / current_chunk¶
mrrc's MARCReader exposes pymarc-compatible current_exception and
current_chunk attributes. After each __next__ step:
- reader.current_chunk holds the raw bytes of the record just read from the source (declared length per the leader). Set on every successful chunk read regardless of whether the parse step then succeeded or failed.
- reader.current_exception holds the typed MrrcException swallowed by permissive=True, or None on a clean read.
reader = mrrc.MARCReader("harvest.mrc", permissive=True)
for record in reader:
    if record is None:
        log.warning(
            "skipped malformed record (%d bytes): %s",
            len(reader.current_chunk) if reader.current_chunk else 0,
            reader.current_exception,
        )
        continue
    process(record)
Two documented divergences from pymarc:
- Encoding strictness. mrrc raises EncodingError on invalid UTF-8 in subfield values (swallowed via current_exception under permissive=True); pymarc applies lossy substitution silently. The iteration shape is identical (the bad record yields as None either way), so callers using except Exception: keep working.
- current_chunk on byte-read errors. When the underlying read of the next record's bytes fails before parsing begins (truncated stream, I/O error), current_chunk may be None even though current_exception is set. For parse failures of fully-read chunks (the common case), current_chunk carries the full record bytes as pymarc does.
Known hierarchy divergences from pymarc¶
mrrc's exception class names match pymarc's, but two relationships in
the class tree differ. Existing except clauses written against a
specific class name (except RecordDirectoryInvalid:,
except EndOfRecordNotFound:, etc.) work in mrrc unchanged. The
divergences only matter for code that catches a parent class.
FatalReaderError parentage. In pymarc, FatalReaderError is the
parent of RecordLengthInvalid, TruncatedRecord, and
EndOfRecordNotFound; a pymarc loop can except FatalReaderError: to
catch the parent and all three subclasses. In mrrc, FatalReaderError is
a sibling of those classes, reserved for the specific "recovered-error
cap exceeded" case under recovery_mode="lenient"/"permissive" with
with_max_errors. except FatalReaderError: in mrrc therefore catches only
the cap-exhausted case, not the malformed-record cases. To match pymarc's
catch surface, either enumerate the four classes in one clause,
except (FatalReaderError, RecordLengthInvalid, TruncatedRecord, EndOfRecordNotFound):,
or catch the mrrc base with except MrrcException:, which is broader
(every typed mrrc error).
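As a minimal sketch of the enumeration option, the four classes can be kept in one tuple and reused in every `except` clause. The classes below are local stand-ins that follow mrrc's documented tree (they are not imported from mrrc), and `PYMARC_FATAL` and `is_pymarc_fatal` are hypothetical names introduced for this example.

```python
# Stand-ins following mrrc's documented tree: FatalReaderError is a
# sibling of the malformed-record classes, not their parent.
class MrrcException(Exception): ...
class RecordLengthInvalid(MrrcException): ...
class EndOfRecordNotFound(MrrcException): ...
class TruncatedRecord(EndOfRecordNotFound): ...
class FatalReaderError(MrrcException): ...
class FieldNotFound(MrrcException): ...

# Hypothetical constant: the classes a pymarc `except FatalReaderError:`
# clause would have covered.
PYMARC_FATAL = (
    FatalReaderError,
    RecordLengthInvalid,
    TruncatedRecord,
    EndOfRecordNotFound,
)


def is_pymarc_fatal(exc: BaseException) -> bool:
    """True when exc falls inside pymarc's FatalReaderError catch surface."""
    return isinstance(exc, PYMARC_FATAL)


try:
    raise TruncatedRecord("record shorter than its declared length")
except PYMARC_FATAL as exc:      # a tuple works directly in an except clause
    print("fatal:", type(exc).__name__)

print(is_pymarc_fatal(FieldNotFound()))   # accessor error, outside the tuple
```

The tuple keeps the catch surface narrower than `MrrcException`, so accessor errors such as `FieldNotFound` still propagate.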
PymarcException → MrrcException. The base class name differs.
from pymarc import PymarcException fails at import; replace with
from mrrc import MrrcException (or alias on import; see Optional
symbol-level aliases above).
Per-variant field reference¶
Each exception class accepts the following keyword arguments at construction
time (all optional). Attributes of the same name are populated by the parser
when the information is available; absent values stay None.
| Field | Type | Meaning |
|---|---|---|
| record_index | int \| None | 1-based position of the record in the input stream. |
| record_control_number | str \| None | Value of the 001 control field for the record being parsed. None for errors raised before 001 is decoded (invalid leader, invalid directory, pre-001 truncation). |
| field_tag | str \| None | Tag of the field being parsed (e.g., "245"). |
| indicator_position | int \| None | Indicator position (0 or 1), populated for InvalidIndicator. |
| subfield_code | int \| None | Offending subfield code byte, populated for BadSubfieldCode. |
| found | bytes \| None | The bad bytes that triggered the error, capped at 32 bytes. |
| expected | str \| None | Human-readable description of what was expected. |
| byte_offset | int \| None | Absolute byte offset within the input stream. |
| record_byte_offset | int \| None | Byte offset within the current record. |
| source | str \| None | Filename or stream identifier, populated when the reader was given one via with_source() or from_path(). |
| bytes_near | bytes \| None | Up to 32 bytes around the error offset, for hex-dump rendering. None when the parser did not have access to a buffer at error time. |
| bytes_near_offset | int \| None | Absolute stream offset of the first byte of bytes_near. |
Subclass-specific extras:
- InvalidField, EncodingError, XmlError, JsonError, WriterError add a message: str | None field carrying a human-readable description of the problem.
- TruncatedRecord adds expected_length and actual_length (both int | None) describing how far short the record was of its declared length.
Always-present vs may-be-present per variant¶
The parser populates record_index and byte_offset on every parse-path
error; record_control_number whenever 001 is already decoded;
source whenever the reader was constructed via with_source() or
from_path(). Other fields are populated when applicable to the variant
(e.g., indicator_position only on InvalidIndicator).
FieldNotFound is an accessor error rather than a parse error; it carries
field_tag, record_control_number, and record_index but not byte
offsets.
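Because most attributes may be None, logging code benefits from skipping whatever is absent. The following is a hedged sketch rather than part of mrrc: `describe` is a hypothetical helper and `FakeParseError` is a stand-in exception that carries a few of the documented attribute names.

```python
def describe(err: Exception) -> str:
    """Build a log line from whichever positional attributes are set."""
    parts = [type(err).__name__]
    for name in ("source", "record_index", "record_control_number", "field_tag"):
        value = getattr(err, name, None)   # missing and None are treated alike
        if value is not None:
            parts.append(f"{name}={value}")
    offset = getattr(err, "byte_offset", None)
    if offset is not None:
        parts.append(f"byte_offset=0x{offset:X}")
    return " ".join(parts)


class FakeParseError(Exception):
    """Stand-in exception; real code would receive an mrrc exception."""

    def __init__(self, message, **attrs):
        super().__init__(message)
        for name, value in attrs.items():
            setattr(self, name, value)


err = FakeParseError(
    "bad leader",
    record_index=847,
    byte_offset=7217,
    record_control_number=None,   # 001 not decoded yet
)
print(describe(err))  # FakeParseError record_index=847 byte_offset=0x1C31
```

Using `getattr` with a default also keeps the helper safe for accessor errors such as FieldNotFound, which carry no byte offsets.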
Position semantics by format¶
byte_offset and record_byte_offset mean different things depending on the
input format:
- ISO 2709 (binary MARC). byte_offset is the absolute byte position in the input stream; record_byte_offset is relative to the start of the current record. This is the primary case.
- MARCXML. The underlying quick_xml parser does not expose a byte position from its deserializer error type, so byte_offset is None. Position information is available via the wrapped cause: walk err.__cause__ for the original quick_xml error.
- MARCJSON. The wrapped serde_json::Error exposes line and column; byte_offset is None because translating (line, column) to a byte offset requires the original input bytes. Walk err.__cause__ to read cause.line and cause.column.
When a format's underlying parser does not expose usable position
information, the field stays None rather than being fabricated.
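A position lookup that works across the three formats can be sketched as: prefer `byte_offset` when it is set, otherwise walk the `__cause__` chain for `line` and `column`. This is an illustration under assumptions: `locate` is a hypothetical helper, and `FakeCause` and `FakeJsonError` are stand-ins for the wrapped cause and the mrrc error.

```python
def locate(err: BaseException):
    """Return ("byte", offset), ("line_col", (line, column)), or None."""
    offset = getattr(err, "byte_offset", None)
    if offset is not None:
        return ("byte", offset)
    cause = err.__cause__
    while cause is not None:                 # walk the wrapped causes
        line = getattr(cause, "line", None)
        column = getattr(cause, "column", None)
        if line is not None and column is not None:
            return ("line_col", (line, column))
        cause = cause.__cause__
    return None                              # nothing usable was exposed


class FakeCause(Exception):
    """Stand-in for a wrapped parser error that exposes line and column."""

    def __init__(self, line, column):
        super().__init__(f"line {line} column {column}")
        self.line, self.column = line, column


class FakeJsonError(Exception):
    """Stand-in for a MARCJSON error whose byte_offset stays None."""

    byte_offset = None


try:
    try:
        raise FakeCause(12, 5)
    except FakeCause as cause:
        raise FakeJsonError("bad MARCJSON") from cause
except FakeJsonError as err:
    position = locate(err)

print(position)  # ('line_col', (12, 5))
```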
Source filename plumbing¶
The source attribute on errors is populated when the reader was told its
input identity. There are two ways to set it:
# 1. Builder method: any reader, any input source.
reader = mrrc.MARCReader(file_obj).with_source("harvest.mrc")
# 2. Convenience constructor: opens a file and sets source from the path.
reader = mrrc.MARCReader.from_path("harvest.mrc")
When neither is used (e.g., reading from BytesIO), source stays None
on emitted errors.
The same with_source / from_path pattern is available on
AuthorityMARCReader and HoldingsMARCReader.
Validation level vs recovery mode¶
Two orthogonal axes govern parsing behavior:
- validation_level: what counts as an error.
- recovery_mode: what to do when one fires.

The rule fits in one sentence: structural is lossy across every reader
and strict_marc is strict across every reader, so all readers behave the
same way at a given level.
Concretely:
| | validation_level="structural" (default) | validation_level="strict_marc" |
|---|---|---|
| ISO 2709 structural errors (E001–E007, E101, E106) | fire | fire |
| Indicator byte validation (E201, byte-level) | skipped | fires |
| Per-tag MARC 21 indicator semantics (E201, e.g. 245 ind1 ∈ {0,1}) | skipped | fires |
| Subfield-code byte validation (E202) | skipped | fires |
| MARC 21 leader semantics (E002, e.g. record_status ∈ {a,c,d,n,p}) | skipped | fires |
| UTF-8 strictness (E301) | lossy decode (U+FFFD substitution) across bibliographic + authority + holdings | strict decode raises across all three readers |
reader = mrrc.MARCReader(
    file,
    validation_level="structural",  # or "strict_marc"
    recovery_mode="strict",         # or "lenient", "permissive"
)
The two axes compose. (strict_marc, lenient) means I want byte-level
checks AND I want to keep iterating past one bad record — strict_marc
makes E201/E202/E301 fire, lenient absorbs them via the per-stream
recovery cap.
Recovery modes and errors¶
The RecoveryMode setting (Strict / Lenient / Permissive) controls
whether a malformed record raises immediately, is salvaged with partial
data, or is skipped. The structured positional metadata is populated
identically in all three modes — the modes only differ in whether the
error is propagated, suppressed, or used to inform a salvage attempt.
Defaults: Python permissive, Rust Strict¶
The Python user surface (mrrc.MARCReader, mrrc.AuthorityMARCReader,
mrrc.HoldingsMARCReader) defaults to recovery_mode="permissive" —
the same default shape as pymarc / marc4j / libmarc. A fresh
MARCReader(file) iterates past per-record defects rather than aborting
on the first one, so users coming from those libraries get the
expected behavior without setting any kwarg.
The Rust core (mrrc::MarcReader) keeps the stricter RecoveryMode::Strict
default. Rust callers expect explicit error handling via Result<T, E>
and ? propagation; flipping the default there would convert a loud
Err into a quiet record.errors field that the caller has to
remember to inspect.
A gentle case for choosing strict when feasible¶
Permissive mode is the more forgiving default, but it has a real cost worth understanding before you ship it past a prototype:
- Unsalvageable records yield as None. When the parser can't make even partial sense of a record's bytes, the Python wrapper hands you None rather than skipping silently. A loop written as for record in reader: process(record) will pass None into process unless you guard with if record is not None: or iterate via iter_with_errors(). Worth being deliberate about.
- Per-record diagnostics live on record.errors. A clean iteration in permissive mode can still be hiding malformed records, because the errors are attached to the yielded record rather than raised. If nothing checks record.errors, defects are recorded but never seen.
- record.errors accumulates up to max_errors. Without an explicit max_errors=N kwarg, a pathological stream can fill memory with diagnostic objects before anyone notices. The Rust core caps at DEFAULT_MAX_ERRORS (10 000) per parse, but the Python wrapper-level cap defaults to disabled (see Capping recovered errors with max_errors).
If you control the input and quality matters more than throughput,
recovery_mode="strict" makes defects loud: a single bad record
raises a typed exception with full positional context. Pair it with
permissive=True for the pymarc-shape pattern of "yield None for
bad records, stash the exception on current_exception" without
losing the precise diagnostics.
# Most forgiving (default): keep going, attach defects to record.errors
reader = mrrc.MARCReader(file)
# Pymarc-shape: yield None for failed parses, stash exception
reader = mrrc.MARCReader(file, permissive=True)
# Loudest: typed exception raised on first defect
reader = mrrc.MARCReader(file, recovery_mode="strict")
Inspecting per-record errors¶
In lenient and permissive recovery modes, errors that would have
been raised under strict are instead attached to the yielded
record as record.errors. The list carries one typed exception per
recovered defect, with the same positional context (record_index,
byte_offset, field_tag, etc.) as if the error had been raised directly.
reader = mrrc.MARCReader(file, recovery_mode="lenient")
for record in reader:
    if record.errors:
        for err in record.errors:
            log.warning(f"[{err.code}] {err}")
    process(record)
In strict mode record.errors is always [] — the parser raises on
the first error before the record is yielded. In lenient and
permissive it carries diagnostics for every defect the parser
recovered from (subject to max_errors cap).
iter_with_errors()¶
MARCReader.iter_with_errors() is an alternate iterator yielding
(record, errors) tuples instead of bare records. Equivalent to
iterating + reading record.errors, but more discoverable for the
"give-me-everything-defective" use case:
for record, errors in reader.iter_with_errors():
    if errors:
        log.warning(f"{len(errors)} issues parsing record")
    if record:
        process(record)
Under permissive=True, records that the parser cannot salvage at all
yield as (None, [exception]) so even unsalvageable records are
observable. Without iter_with_errors, those records are silently
returned as None and the diagnostic is lost.
reader = mrrc.MARCReader(file, permissive=True)
for record, errors in reader.iter_with_errors():
    if record is None:
        log.error(f"unsalvageable: {errors[0]}")
    else:
        process(record)
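The (record, errors) tuples also lend themselves to a per-run summary. The sketch below assumes only the tuple shape described above: `summarize` is a hypothetical helper, and the sample pairs use plain strings and built-in exceptions as stand-ins for mrrc records and errors.

```python
from collections import Counter


def summarize(pairs):
    """Tally record outcomes and error classes from (record, errors) tuples."""
    outcome = Counter()
    by_class = Counter()
    for record, errors in pairs:
        if record is None:
            outcome["unsalvageable"] += 1   # nothing could be recovered
        elif errors:
            outcome["recovered"] += 1       # yielded, but with defects
        else:
            outcome["clean"] += 1
        by_class.update(type(e).__name__ for e in errors)
    return outcome, by_class


# Stand-in data; real code would pass reader.iter_with_errors().
sample = [
    ("record-1", []),
    ("record-2", [ValueError("bad indicator")]),
    (None, [KeyError("truncated")]),
]
outcome, by_class = summarize(sample)
print(dict(outcome))
print(dict(by_class))
```

A summary like this makes defects visible at the end of a harvest even when nothing inspects individual records.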
AuthorityMARCReader and HoldingsMARCReader expose record.errors
the same way (the load-bearing surface). They don't carry the
iter_with_errors convenience method — that's a pymarc-shape ergonomic
specific to MARCReader. Iterate normally and check record.errors:
for record in mrrc.AuthorityMARCReader(file, recovery_mode="lenient"):
    if record.errors:
        log.warning(...)
Capping recovered errors with max_errors¶
A pathological stream in lenient / permissive mode can accumulate diagnostics without bound — every malformed record adds one or more MrrcException instances to record.errors. Pass max_errors=N to MARCReader to cap the total recovered count across the stream; once the (N+1)-th recovered error lands, the next iteration raises FatalReaderError (E099) instead of yielding another record.
reader = mrrc.MARCReader(file, recovery_mode="lenient", max_errors=100)
try:
    for record in reader:
        process(record)
except mrrc.FatalReaderError as e:
    log.error(f"stopped after {e.errors_seen} errors (cap={e.cap})")
- max_errors=None (the default) disables the wrapper-level cap.
- max_errors=0 also disables the cap (matches the Rust API's no-cap sentinel).
- max_errors=N for any N > 0 trips on the (N+1)-th recovered error.
Observationally inert in strict mode: the first error raises before any recovery accumulates against the cap. AuthorityMARCReader and HoldingsMARCReader don't carry the kwarg — they inherit the Rust core's per-reader DEFAULT_MAX_ERRORS (10_000) directly.
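The cap semantics described above (None and 0 disable the cap, N trips on the (N+1)-th recovered error) can be modelled in a few lines. This is a stand-in written for illustration, not mrrc's implementation; `ErrorCap` and `record_error` are hypothetical names.

```python
class ErrorCap:
    """Stand-in model of the documented max_errors semantics."""

    def __init__(self, max_errors=None):
        # None and 0 both mean "no cap", as described above.
        self.cap = max_errors or None
        self.errors_seen = 0

    def record_error(self) -> bool:
        """Count one recovered error; return True once the cap is exceeded."""
        self.errors_seen += 1
        return self.cap is not None and self.errors_seen > self.cap


capped = ErrorCap(max_errors=2)
print([capped.record_error() for _ in range(3)])    # [False, False, True]

uncapped = ErrorCap(max_errors=0)
print(any(uncapped.record_error() for _ in range(1000)))  # False
```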
Structured serialization (to_dict / to_json)¶
Every MrrcException exposes to_dict() and to_json() for emitting the
error into structured logging platforms (ELK, Datadog, Splunk,
JSON-line pipelines) without writing an adapter. The Rust side offers a
matching MarcError::to_json_value() / to_json() that produces the same
schema.
Sample output:
>>> err.to_dict()
{
    "schema_version": 1,
    "class": "InvalidIndicator",
    "code": "E201",
    "slug": "invalid_indicator",
    "severity": "error",
    "help_url": "https://dchud.github.io/mrrc/reference/error-codes/#E201",
    "record_index": 847,
    "record_control_number": "ocm01234567",
    "field_tag": "245",
    "indicator_position": 0,
    "found": None,
    "found_hex": "3a",
    "expected": "digit or space",
    "byte_offset": 7217,
    "record_byte_offset": 42,
    "source": "harvest.mrc",
    "bytes_near": None,
    "bytes_near_hex": "323032336e79752020202020202020203a300030203020656e6720641e323435",
    "bytes_near_offset": 7201,
    "_cause": None
}
Notes on the shape¶
- Bytes fields carry their data under a _hex suffix key (found_hex, bytes_near_hex); the bare key (found, bytes_near) stays null so the dict is JSON-serializable without a custom encoder. The _hex keys appear only when bytes were captured.
- _cause is always a string or null, never nested. For the full exception chain pass include_traceback=True or walk __cause__.
- The emitted bytes are bounded at capture time (found ≤ 32 bytes, bytes_near ≤ 32 bytes from the 16+16 hex-dump window), so payloads don't grow unboundedly.
- schema_version: 1 is included so callers can branch on it later if the shape ever changes. Pre-1.0, the shape may still evolve.
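On the consuming side, the `_hex` convention means a log reader can rebuild the original bytes with `bytes.fromhex`. The sketch below assumes only the key shape described in the notes: `payload` is a trimmed, hand-written copy of the sample output and `restore_bytes` is a hypothetical helper.

```python
import json

# Trimmed, hand-written payload in the documented shape (not produced by mrrc).
payload = {
    "schema_version": 1,
    "class": "InvalidIndicator",
    "code": "E201",
    "found": None,
    "found_hex": "3a",
    "byte_offset": 7217,
    "bytes_near": None,
}


def restore_bytes(d: dict) -> dict:
    """Return a copy where each `<name>_hex` value fills the bare `<name>` key."""
    out = dict(d)
    for key, value in d.items():
        if key.endswith("_hex") and isinstance(value, str):
            out[key[: -len("_hex")]] = bytes.fromhex(value)
    return out


line = json.dumps(payload, sort_keys=True)   # one JSON line for a log pipeline
restored = restore_bytes(json.loads(line))
print(restored["found"])        # b':'
print(restored["bytes_near"])   # None, because no bytes_near_hex key was present
```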
include_traceback¶
to_dict(include_traceback=True) adds a traceback key with formatted
traceback lines (only present when the exception was actually raised).
to_json(include_traceback=True) forwards the flag to to_dict.
Hex dump in detailed()¶
When the parser captures a byte window around the error offset, the
exception's detailed() output appends a 32-byte hex + ASCII dump with a
caret pointing at the offending byte:
InvalidIndicator at record 847, field 245
source: harvest.mrc
001: ocm01234567
indicator 0: found b':', expected digit or space
byte offset: 0x1C31 (7217) in stream
record-relative: byte 42
bytes near offset 0x1C31:
0x1C21: 32 30 32 33 6e 79 75 20 20 20 20 20 20 20 20 20 |2023nyu         |
0x1C31: 3a 30 00 30 20 30 20 65 6e 67 20 64 1e 32 34 35 |:0.0 0 eng d.245|
        ^^ offending byte
The window is up to 16 bytes before + 16 bytes after the error offset,
clamped at buffer boundaries. Non-printable bytes render as . in the
ASCII sidecar. The window layout is fixed at 16 bytes per row with an
8-byte gap for readability; the format is byte-for-byte identical in
Rust (MarcError::detailed()) and Python (MrrcException.detailed()).
The bytes_near attribute on the exception is None when the parser
did not have access to a buffer at the point the error was raised
(e.g., for wrapping variants like IoError / XmlError / JsonError,
or for error paths that do not have buffer access at error time).
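For callers that only have `bytes_near` and `bytes_near_offset` (for example from a stored log entry), a comparable dump can be rendered by hand. This is a simplified sketch and not mrrc's `detailed()` implementation: `hex_rows` is a hypothetical helper, and it omits the caret and any intra-row gap.

```python
def hex_rows(data: bytes, start: int, width: int = 16):
    """Yield 'offset: hex bytes |ascii|' rows, one per `width` bytes."""
    pad = width * 3 - 1                      # width of a full hex column
    for i in range(0, len(data), width):
        chunk = data[i : i + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes render as "." in the ASCII sidecar.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        yield f"0x{start + i:04X}: {hex_part:<{pad}} |{ascii_part}|"


# The 32-byte window from the sample dump, starting at stream offset 0x1C21.
window = b"2023nyu" + b" " * 9 + b":0\x000 0 eng d\x1e245"
rows = list(hex_rows(window, start=0x1C21))
for row in rows:
    print(row)
```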
Pickle round-trip¶
Exception instances round-trip through pickle with all positional
attributes preserved (subclass extras like expected_length/message
included). For security, __setstate__ whitelists incoming attribute names
against the per-class allowed set; a maliciously-crafted pickle that tries
to set arbitrary attributes (including method names) will raise TypeError
rather than silently shadowing methods on the instance.
This is a defense-in-depth measure only. As with any pickle-based deserialization, do not unpickle data from untrusted sources — the unpickling step itself is the relevant attack surface.
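The whitelist idea can be sketched with a stand-in exception class. Everything here is illustrative: `SketchError` and its `_ALLOWED` set are hypothetical, and mrrc's actual per-class sets and checks may differ.

```python
import pickle


class SketchError(Exception):
    """Stand-in exception; the allowed attribute set is illustrative only."""

    _ALLOWED = frozenset({"record_index", "byte_offset", "field_tag"})

    def __init__(self, message="", **attrs):
        super().__init__(message)
        for name in self._ALLOWED:
            setattr(self, name, attrs.get(name))

    def __setstate__(self, state):
        # Reject any attribute name outside the whitelist instead of
        # letting it shadow a method or add arbitrary state.
        unknown = set(state or ()) - self._ALLOWED
        if unknown:
            raise TypeError(f"unexpected pickled attributes: {sorted(unknown)}")
        self.__dict__.update(state or {})


if __name__ == "__main__":
    original = SketchError("bad indicator", record_index=847, byte_offset=7217)
    clone = pickle.loads(pickle.dumps(original))
    print(clone.record_index, clone.byte_offset, clone.args)
```

The check limits what a tampered state can set on the instance; it does not make unpickling untrusted data safe.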