Error codes
Every error raised by mrrc carries a stable identifier (Exxx) and a
human-friendly slug. Match on the code rather than the exception class
name so that handlers stay stable when the underlying error enum is
restructured. Each error's help URL links to its entry on this page.
```python
import mrrc

try:
    list(mrrc.MARCReader.from_path("harvest.mrc"))
except mrrc.MrrcException as e:
    print(e.code, e.slug, e.help_url())
    # e.g. E201 invalid_indicator https://dchud.github.io/mrrc/reference/error-codes/#E201
```
Configuring the help URL base
By default err.help_url() returns a URL anchored to this page hosted on
GitHub Pages (https://dchud.github.io/mrrc/reference/error-codes/).
Enterprise deployments that mirror the docs internally can redirect the
help URL by setting the MRRC_DOCS_BASE_URL environment variable to
their docs root. Both the Rust core and the Python bindings honor it:
```sh
export MRRC_DOCS_BASE_URL="https://docs.example.com/mrrc"
# err.help_url() → "https://docs.example.com/mrrc/reference/error-codes/#E201"
```
The variable holds the docs site root; the /reference/error-codes/#Exxx
path is appended automatically. Trailing slashes are stripped.
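As an illustration of that joining rule, here is a plain-Python sketch that builds a help URL from the environment. It is not mrrc's implementation: build_help_url and DEFAULT_BASE are hypothetical names, the default root is taken from the GitHub Pages URL above, and treating an empty variable like an unset one is an assumption.

```python
import os

# Hypothetical: the default docs root, taken from the GitHub Pages URL above.
DEFAULT_BASE = "https://dchud.github.io/mrrc"


def build_help_url(code, env=None):
    """Sketch of the documented rule: docs root + fixed path + #code."""
    env = os.environ if env is None else env
    # Assumption: an empty value is treated the same as an unset one.
    base = env.get("MRRC_DOCS_BASE_URL") or DEFAULT_BASE
    # Trailing slashes on the root are stripped before the path is appended.
    base = base.rstrip("/")
    return f"{base}/reference/error-codes/#{code}"


print(build_help_url("E201", {"MRRC_DOCS_BASE_URL": "https://docs.example.com/mrrc/"}))
# https://docs.example.com/mrrc/reference/error-codes/#E201
```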
Stability
Two rules, non-negotiable:
- Codes never get re-purposed. A retired check leaves its docs entry in place pointing to a replacement.
- Codes never get renumbered. URLs that users paste into chat have to keep resolving.
See CONTRIBUTING.md for the full policy.
Code ranges
| Range | Phase |
|---|---|
| E0xx | Stream / leader |
| E1xx | Directory / field header |
| E2xx | Subfield / indicator |
| E3xx | Encoding |
| E4xx | Serialization / writer |
| Wxxx | Warnings (pymarc parity) |
Each range reserves ~80 slots for future growth.
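Since the range encodes the phase, a handler can route on the leading characters of e.code. A minimal sketch built only from the range table; phase_for_code is a hypothetical helper, not part of mrrc.

```python
# Code prefixes from the range table; "E0" covers E0xx, and so on.
ERROR_PHASES = {
    "E0": "Stream / leader",
    "E1": "Directory / field header",
    "E2": "Subfield / indicator",
    "E3": "Encoding",
    "E4": "Serialization / writer",
}


def phase_for_code(code):
    """Map a code such as "E201" or "W001" to its phase name."""
    if code.startswith("W"):
        return "Warnings (pymarc parity)"
    if code[:2] not in ERROR_PHASES:
        raise ValueError(f"code outside the documented ranges: {code!r}")
    return ERROR_PHASES[code[:2]]


for code in ("E005", "E106", "E201", "E404", "W001"):
    print(code, "->", phase_for_code(code))
```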
Stream / leader (E0xx)
E001 — record_length_invalid
The leader's record-length field (bytes 0–4) is invalid: not five ASCII digits, or claims a length below the 24-byte minimum.
Context: Parse-side.
Applies to: Bibliographic, Authority, Holdings readers.
Populates: record_index, byte_offset. May also populate: source.
Common causes. Truncated download; reader fed text instead of binary; attempt to parse a non-MARC file by accident.
How to recover. Verify that the input is binary MARC (the file usually has a
.mrc extension and starts with five ASCII digits). No recovery mode
salvages this: the next 24 bytes can't be trusted as a leader.
Python class: mrrc.RecordLengthInvalid.
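The two E001 conditions amount to a short check on the first five leader bytes. This sketch returns the code instead of raising and only mirrors the conditions described in this entry, not mrrc's reader; record_length_problem is a hypothetical name.

```python
from typing import Optional


def record_length_problem(leader: bytes) -> Optional[str]:
    """Return "E001" if leader bytes 0-4 fail the checks above, else None."""
    field = leader[0:5]
    # bytes.isdigit() is True only for non-empty runs of ASCII digits 0-9.
    if len(field) != 5 or not field.isdigit():
        return "E001"  # not five ASCII digits
    if int(field) < 24:
        return "E001"  # shorter than the 24-byte leader itself
    return None


print(record_length_problem(b"00714cam a2200205 a 4500"))  # None
print(record_length_problem(b"<?xml version='1.0'?>"))     # E001
```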
E002 — leader_invalid
The 24-byte leader is malformed in a way other than the record-length or base-address fields. Two failure shapes share this code:
1. Structural (always fires): byte-level malformation, e.g., the indicator-count byte (position 10) is not an ASCII digit, or the reserved bytes 20–23 are not 4500. Detected during Leader::from_bytes regardless of validation_level.
2. MARC 21 semantics (fires only at validation_level="strict_marc"): the leader parses cleanly but a position carries a value not in the MARC 21 allowed set, e.g., record_status (position 5) outside {a, c, d, n, p}, record_type (position 6) outside the documented set, indicator_count not equal to 2, encoding_level outside the allowed set, etc.
The MARC 21 allowed-value sets at the per-position level differ by
reader type. The bibliographic, authority, and holdings readers each
apply their own format's spec at strict_marc:
- Bibliographic (MarcReader): the union of positions defined in the MARC 21 Bibliographic Format leader (positions 5–11 and 17–19).
- Authority (AuthorityMarcReader): the MARC 21 Authority Format allowed sets. Positions 7 (bibliographic_level), 8 (control_record_type), and 19 (multipart_level) are undefined for authority records and accept any byte (including the MARC 21 fill character |). Position 5 accepts the wider set {a, c, d, n, o, s, x}, position 6 must be z, position 17 must be n or o, and position 18 carries the punctuation policy allowed set {space, c, i, u} instead of the bibliographic "cataloging form" set.
- Holdings (HoldingsMarcReader): the MARC 21 Holdings Format allowed sets. Positions 7, 8, and 19 are undefined, as for authority. Position 5 accepts {c, d, n}, position 6 accepts {u, v, x, y}, position 17 (encoding level) accepts {1, 2, 3, 4, 5, m, u, z}, and position 18 (item information in record) accepts {space, i, n}.
A leader that is valid per one record type's format may be invalid
per another's (e.g., encoding_level='1' is valid for bibliographic
but invalid for authority).
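A sketch of per-reader allowed sets for two leader positions, using only values listed in this entry. ALLOWED and leader_value_problem are hypothetical, and mrrc checks more positions than this. The contrast uses holdings rather than bibliographic because the holdings encoding-level set is spelled out in this entry.

```python
from typing import Optional

# Allowed values for leader positions 5 and 17, copied from the lists above.
# Bibliographic position 17 is omitted because its set isn't listed here.
ALLOWED = {
    "bibliographic": {5: set("acdnp")},
    "authority": {5: set("acdnosx"), 17: set("no")},
    "holdings": {5: set("cdn"), 17: set("12345muz")},
}


def leader_value_problem(leader: str, reader: str, position: int) -> Optional[str]:
    """Return "E002" if leader[position] is outside the reader's set, else None."""
    allowed = ALLOWED[reader].get(position)
    if allowed is None:
        return None  # position not covered by this sketch
    return None if leader[position] in allowed else "E002"


# A 24-byte leader with record_status "n" (5) and encoding_level "1" (17).
leader = "00500" + "nz" + "  " + "a" + "22" + "00157" + "1" + "n" + " " + "4500"
print(leader_value_problem(leader, "holdings", 17))   # None: "1" is allowed
print(leader_value_problem(leader, "authority", 17))  # E002: only "n" or "o"
```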
Context: Parse-side.
Applies to: Bibliographic, Authority, Holdings readers.
Populates: record_index, byte_offset, record_byte_offset (= 0).
May also populate: source, found, expected.
Common causes. Records hand-crafted in a text editor and saved in the wrong encoding; output from non-conformant exporters; pre-2000s records using deprecated leader values.
How to recover. Structural shape (1) fires before any field parsing
and is not affected by recovery_mode. The MARC 21 semantic shape (2)
respects recovery_mode: in lenient/permissive the violation is
attached to the yielded record's record.errors list and parsing
continues; in strict it raises immediately. To suppress (2) entirely,
use the default validation_level="structural".
Python class: mrrc.RecordLeaderInvalid.
E003 — base_address_invalid
The leader's base-address-of-data field (bytes 12–16) is not five ASCII digits or claims a value below 25.
Context: Parse-side.
Applies to: Bibliographic, Authority, Holdings readers.
Populates: record_index, byte_offset. May also populate: source,
record_control_number, found, expected.
Common causes. Records written by older systems that miscalculate the directory length; corrupted bytes 12–16 from in-flight data damage.
How to recover. Not recoverable; the directory boundary can't be inferred without the base address.
Python class: mrrc.BaseAddressInvalid.
E004 — base_address_not_found
The leader claims a base address of data that exceeds the available bytes in the input stream.
Context: Parse-side.
Applies to: Bibliographic, Authority, Holdings readers.
Populates: record_index, byte_offset. May also populate: source,
record_control_number.
Common causes. Truncated input; a damaged record header whose length and base-address fields are inconsistent.
How to recover. See E005 for the related truncation case.
Python class: mrrc.BaseAddressNotFound.
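E003 and E004 both concern leader bytes 12–16, so one sketch covers them. It assumes record holds whatever bytes are available for the current record, starting at its leader; base_address_problem is a hypothetical name. Reading the minimum of 25 as the 24-byte leader plus one directory terminator byte is an interpretation, not something stated on this page.

```python
from typing import Optional


def base_address_problem(record: bytes) -> Optional[str]:
    """Return "E003", "E004" or None for leader bytes 12-16 of `record`."""
    field = record[12:17]
    if len(field) != 5 or not field.isdigit():
        return "E003"  # not five ASCII digits
    base_address = int(field)
    if base_address < 25:
        return "E003"  # too small for a 24-byte leader plus a directory terminator
    if base_address > len(record):
        return "E004"  # data would start past the bytes actually available
    return None


leader = b"00714cam a2200205 a 4500"
print(base_address_problem(leader + b"x" * 690))  # None: base address 205 is in range
print(base_address_problem(leader))               # E004: only 24 bytes available
```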
E005 — truncated_record
The reader hit EOF before reading the number of bytes the leader claims the record should contain.
Context: Parse-side.
Applies to: Bibliographic, Authority, Holdings readers.
Populates: record_index, byte_offset, record_byte_offset,
expected_length, actual_length. May also populate: source,
record_control_number.
Common causes. Network read truncated by connection drop; partially written file from a crashed exporter; deliberate fuzzing.
How to recover. recovery_mode="lenient" salvages whatever fields
parsed cleanly before the truncation point. recovery_mode="strict"
raises this error immediately.
Python class: mrrc.TruncatedRecord (subclass of
mrrc.EndOfRecordNotFound).
E006 — end_of_record_not_found
The end-of-record byte (0x1D) was not found at the position the leader
implied.
Context: Parse-side.
Applies to: Bibliographic, Authority, Holdings readers.
Populates: record_index, byte_offset, record_byte_offset. May
also populate: source, record_control_number.
Common causes. Concatenated records where one was truncated mid-stream; corrupted bytes near the end of a record; encoder bugs.
How to recover. recovery_mode="lenient" accepts the partial record
and continues reading from the next leader.
Python class: mrrc.EndOfRecordNotFound. Also catches
E005 (TruncatedRecord is a subclass).
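E005 and E006 differ in whether the stream ran out before the claimed length, or the claimed length was reached but did not end on the terminator. A minimal sketch of reading one record from a binary stream, assuming the length field itself is valid (E001 is not checked); read_raw_record is hypothetical and performs no recovery.

```python
import io
from typing import Optional, Tuple

END_OF_RECORD = 0x1D  # ISO 2709 record terminator


def read_raw_record(stream) -> Tuple[Optional[bytes], Optional[str]]:
    """Read one record; return (bytes, code) where code is E005, E006 or None.

    Assumes the five-byte length field is valid (E001 is not checked here).
    Returns (None, None) at a clean end of stream.
    """
    head = stream.read(5)
    if not head:
        return None, None  # clean EOF between records
    if len(head) < 5:
        return head, "E005"  # stream ended inside the length field
    expected_length = int(head)
    # max() guards against a negative read size, which would read to EOF.
    record = head + stream.read(max(expected_length - 5, 0))
    if len(record) < expected_length:
        return record, "E005"  # EOF before the claimed length was reached
    if record[-1] != END_OF_RECORD:
        return record, "E006"  # last byte is not the record terminator
    return record, None


complete = b"00026" + b"x" * 20 + b"\x1d"
stream = io.BytesIO(complete + b"00026" + b"x" * 10)
print(read_raw_record(stream)[1])  # None: the first record is intact
print(read_raw_record(stream)[1])  # E005: the second record is cut short
```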
E007 — io_error
An I/O error occurred reading from the underlying source.
Context: Parse-side (or anywhere I/O can fail).
Applies to: All readers.
Populates: cause (the underlying std::io::Error). May also
populate: record_index, byte_offset, source.
Common causes. File permissions; broken pipe; network read failure; disk error.
How to recover. Inspect e.__cause__ for the underlying I/O kind.
Non-recoverable in general; the caller decides whether to retry.
Python class: raised as Python's built-in OSError (via IOError)
rather than a typed mrrc class — matches pymarc behavior. Catch OSError
to handle alongside other I/O errors.
E099 — fatal_reader_error
A fatal condition halted the reader. Currently raised when the
per-stream recovered-error cap is exceeded in RecoveryMode::Lenient
or Permissive — see MarcReader::with_max_errors.
The code is reserved for future fatal-reader scenarios as well.
Context: Parse-side, lenient/permissive only. Strict mode aborts
on the first error, so this code never fires there.
Applies to: MarcReader, AuthorityMarcReader (the holdings
reader exposes the builder for parity but has no active recovery
sites today).
Populates: cap (the configured limit) and errors_seen (count
at the moment of the trip). May also populate: record_index,
source.
Common causes. Feeding a pathological stream (mostly-malformed records) through a lenient/permissive reader. Without the cap the accumulated per-record diagnostics could exhaust memory.
How to recover. If the cap is a false positive for the input,
raise it (reader.with_max_errors(n) with a larger n, or 0 to
disable). If it reflects the actual state of the input, investigate
the source — large counts of recovered errors usually indicate
upstream corruption.
After this error is raised the reader is exhausted; subsequent
read_record() calls return Ok(None).
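The cap behavior described in this entry (count recovered errors, trip once the count exceeds the cap, treat 0 as disabled, return nothing afterwards) can be modeled with a small wrapper. Everything here is hypothetical: CappedReader and FatalReaderError stand in for mrrc's reader and E099, and counting errors per record in exactly this way is an assumption.

```python
from typing import Iterable, Optional


class FatalReaderError(Exception):
    """Stand-in for E099, carrying cap and errors_seen."""

    def __init__(self, cap: int, errors_seen: int):
        super().__init__(f"E099: {errors_seen} recovered errors exceed the cap of {cap}")
        self.cap = cap
        self.errors_seen = errors_seen


class CappedReader:
    """Hypothetical wrapper over (record, recovered_errors) pairs.

    The pairs stand in for what a lenient reader yields; max_errors=0
    disables the cap.
    """

    def __init__(self, pairs: Iterable, max_errors: int):
        self._pairs = iter(pairs)
        self.max_errors = max_errors
        self.errors_seen = 0
        self._exhausted = False

    def read_record(self) -> Optional[object]:
        if self._exhausted:
            return None  # after E099 (or end of input) every call returns None
        try:
            record, recovered = next(self._pairs)
        except StopIteration:
            self._exhausted = True
            return None
        self.errors_seen += len(recovered)
        if self.max_errors and self.errors_seen > self.max_errors:
            self._exhausted = True
            raise FatalReaderError(self.max_errors, self.errors_seen)
        return record


reader = CappedReader([("r1", ["E201"]), ("r2", ["E201", "E202"]), ("r3", [])], max_errors=2)
print(reader.read_record())  # r1 (1 recovered error so far)
try:
    reader.read_record()  # r2 brings the count to 3, which exceeds 2
except FatalReaderError as err:
    print(err.cap, err.errors_seen)  # 2 3
print(reader.read_record())  # None: the reader is exhausted
```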
Directory / field header (E1xx)
E101 — directory_invalid
A directory entry (12 bytes: 3-byte tag + 4-byte length + 5-byte start position) is structurally invalid: bad tag bytes, non-numeric length or start, or claimed field bytes extending past the data area.
Context: Parse-side.
Applies to: Bibliographic, Authority, Holdings readers.
Populates: record_index, byte_offset, record_byte_offset. May
also populate: field_tag (when the bad entry's tag was decodable),
record_control_number, source, found, expected.
Common causes. Encoder bugs; corrupted bytes; legacy records with non-standard tag formats.
How to recover. recovery_mode="lenient" skips the bad entry and
continues parsing the rest of the directory.
Python class: mrrc.RecordDirectoryInvalid. Also catches
E106, E201, E202 (subclasses).
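A sketch of the E101 checks on a single directory entry, following the 3 + 4 + 5 byte layout described in this entry. DirEntry and parse_dir_entry are hypothetical, and accepting any three ASCII letters or digits as a tag is an assumption about what counts as bad tag bytes.

```python
from typing import NamedTuple, Optional, Tuple


class DirEntry(NamedTuple):
    tag: str
    length: int  # field length in bytes, including its terminator
    start: int   # offset of the field within the data area


def parse_dir_entry(entry: bytes, data_area_size: int) -> Tuple[Optional[DirEntry], Optional[str]]:
    """Return (DirEntry, None) for a valid 12-byte entry, else (None, "E101")."""
    if len(entry) != 12:
        return None, "E101"
    tag, length, start = entry[0:3], entry[3:7], entry[7:12]
    # Assumption: a valid tag is three ASCII letters or digits.
    if not (tag.isascii() and tag.isalnum()):
        return None, "E101"
    if not (length.isdigit() and start.isdigit()):
        return None, "E101"
    if int(start) + int(length) > data_area_size:
        return None, "E101"  # the field would extend past the data area
    return DirEntry(tag.decode("ascii"), int(length), int(start)), None


print(parse_dir_entry(b"245001800000", 100))  # tag 245, 18 bytes at offset 0
print(parse_dir_entry(b"245001800095", 100))  # (None, 'E101'): 95 + 18 > 100
```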
E105 — field_not_found
A requested field was not present in the parsed record. This is an
accessor error, not a parse error — it surfaces when code calls e.g.
record.get_field("245") and the record doesn't contain that tag.
Context: Accessor (post-parse).
Applies to: All record types.
Populates: field_tag. May also populate: record_control_number,
record_index. Never populates: byte_offset (not a parse error).
Common causes. Calling get_field on records that don't have the
tag; a programming error or a wrong assumption about the input's shape.
How to recover. Use try/except, or check field in record before calling the accessor.
Python class: mrrc.FieldNotFound.
E106 — invalid_field
A data field is structurally invalid in a way not covered by the more specific E201 / E202 subclasses (e.g., field bytes too short for indicators, field declared length exceeds available bytes).
Context: Parse-side.
Applies to: Bibliographic, Authority, Holdings readers.
Populates: record_index, byte_offset, record_byte_offset,
field_tag. May also populate: record_control_number, source. The
message attribute carries a human-readable description of the problem.
Common causes. Encoder dropped subfields; declared field length inconsistent with actual data.
How to recover. recovery_mode="lenient" skips the bad field and
continues with the rest.
Python class: mrrc.InvalidField (subclass of
mrrc.RecordDirectoryInvalid).
Subfield / indicator (E2xx)
E201 — invalid_indicator
A variable-data field's indicator byte is not a valid value for the given
tag. Two failure shapes are reported under this code at
validation_level="strict_marc":
1. Byte-level: the indicator byte is not an ASCII digit (0-9) or space. The expected attribute is "ASCII digit (0-9) or space".
2. Per-tag MARC 21 semantics: the byte passes (1) but violates the per-tag rule for the field. For example, the first indicator of 245 (Title statement) is restricted to 0 or 1 per MARC 21; a 9 is byte-valid but tag-invalid. The expected attribute describes the tag-specific rule (e.g., "'0' or '1'", "digit 0-9").
The two shapes share an error code because they share a position
(field_tag, indicator_position) and a remedy (fix the indicator); they
differ only in the expected string.
Context: Parse-side.
Applies to: Bibliographic, Authority, Holdings readers — fired
uniformly when validation_level="strict_marc". At the default
validation_level="structural" indicator bytes are accepted as-is and
this code does not fire.
Populates: record_index, byte_offset, record_byte_offset,
field_tag, indicator_position, found, expected. May also populate:
record_control_number, source.
Common causes. Source systems emitting local-use indicators; records round-tripped through non-conformant ILSes; sloppy cataloging from pre-2000s records.
How to recover. Use validation_level="structural" (default) to
accept the bytes silently, or recovery_mode="lenient" to keep iterating
past the offending record under strict_marc.
Python class: mrrc.InvalidIndicator (subclass of
mrrc.RecordDirectoryInvalid).
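The two E201 shapes can be layered as a byte-level check followed by a per-tag lookup. This sketch fills in only the 245 first-indicator rule quoted in this entry and returns the expected text for a failure, mirroring the expected attribute. TAG_RULES and indicator_problem are hypothetical, and numbering indicator positions 1 and 2 is an assumption.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-tag table: (tag, indicator position) -> allowed characters.
# Only the 245 first-indicator rule is filled in; positions are numbered
# 1 and 2 here, which is an assumption about indicator_position.
TAG_RULES: Dict[Tuple[str, int], str] = {("245", 1): "01"}


def indicator_problem(tag: str, position: int, value: str) -> Optional[str]:
    """Return the `expected` text for an E201 failure, or None if valid."""
    # Shape 1: byte-level check that applies to every tag.
    if len(value) != 1 or value not in " 0123456789":
        return "ASCII digit (0-9) or space"
    # Shape 2: the per-tag MARC 21 rule, when the table has one.
    allowed = TAG_RULES.get((tag, position))
    if allowed is not None and value not in allowed:
        return " or ".join(repr(c) for c in allowed)
    return None


print(indicator_problem("245", 1, "9"))  # '0' or '1'
print(indicator_problem("245", 1, "#"))  # ASCII digit (0-9) or space
```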
E202 — bad_subfield_code
A subfield code byte (immediately following a 0x1F delimiter) is not a
printable ASCII character.
Context: Parse-side.
Applies to: Bibliographic, Authority, Holdings readers — fired
uniformly when validation_level="strict_marc". At the default
validation_level="structural" subfield-code bytes are accepted as-is
and this code does not fire.
Populates: record_index, byte_offset, record_byte_offset,
field_tag, subfield_code (the offending byte). May also populate:
record_control_number, source.
Common causes. Bytes corrupted near a subfield boundary; encoder emitting non-ASCII codes for local-use subfields.
How to recover. Use validation_level="structural" (default) to
accept the bytes silently, or recovery_mode="lenient" to keep iterating
past the offending record under strict_marc.
Python class: mrrc.BadSubfieldCode (subclass of
mrrc.RecordDirectoryInvalid).
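A sketch of finding the offending subfield-code bytes in one data field's content. bad_subfield_codes is hypothetical, and treating 0x20–0x7E as printable ASCII is an assumption about where mrrc draws the line.

```python
from typing import List

SUBFIELD_DELIMITER = b"\x1f"


def bad_subfield_codes(field_data: bytes) -> List[int]:
    """Return code bytes that follow a 0x1F delimiter but aren't printable ASCII.

    field_data is a data field's content after the indicators, without the
    field terminator. Empty chunks (a delimiter with no code byte) are skipped.
    """
    bad = []
    # The first chunk precedes any delimiter, so it holds no subfield code.
    for chunk in field_data.split(SUBFIELD_DELIMITER)[1:]:
        if chunk and not (0x20 <= chunk[0] <= 0x7E):
            bad.append(chunk[0])
    return bad


print(bad_subfield_codes(b"\x1faMain title /\x1fcby an author."))  # []
print(bad_subfield_codes(b"\x1faMain title\x1f\x80corrupted"))     # [128]
```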
Encoding (E3xx)
E301 — utf8_invalid
A subfield value or control field contains bytes that are not valid UTF-8.
Context: Parse-side (or wherever a string conversion runs).
Applies to: Bibliographic, authority, and holdings readers — fired
uniformly when validation_level="strict_marc" and a value contains
bytes that aren't valid UTF-8. At the default validation_level="structural"
all three readers fall back to lossy decoding (U+FFFD substitution)
and don't surface this code.
Populates: record_index. May also populate: field_tag,
byte_offset, source, record_control_number. The message attribute
carries the underlying std::str::Utf8Error description.
Common causes. Records encoded in MARC-8 without the correct character-coding byte in the leader (position 9); legacy records with embedded byte sequences that are valid in MARC-8 but not in UTF-8.
How to recover. Convert input to UTF-8 before parsing, or set
validation_level="structural" if you can tolerate U+FFFD substitutions
and don't need byte-perfect fidelity.
Python class: mrrc.EncodingError.
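Python's own UTF-8 decoder shows the difference between the two behaviors on a byte that is valid in a single-byte encoding but not in UTF-8 (0xE9, which is é in Latin-1). decode_value is a hypothetical helper; Python's replacement rules serve as an analogue of the lossy fallback, not mrrc's code path.

```python
def decode_value(raw: bytes, strict: bool) -> str:
    """Decode a value the way each validation level is described above.

    strict=True stands in for strict_marc (invalid UTF-8 is an error);
    strict=False stands in for the structural default (U+FFFD substitution).
    """
    if strict:
        return raw.decode("utf-8")  # raises UnicodeDecodeError on bad bytes
    return raw.decode("utf-8", errors="replace")


raw = b"Caf\xe9"  # 0xE9 is "é" in Latin-1 but an incomplete sequence in UTF-8
print(decode_value(raw, strict=False) == "Caf\ufffd")  # True
try:
    decode_value(raw, strict=True)
except UnicodeDecodeError as err:
    print("strict decode failed at byte", err.start)  # 3
```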
Serialization / writer (E4xx)
E401 — marcxml_invalid
A MARCXML document failed to parse.
Context: Parse-side (XML parser layer).
Applies to: mrrc.marcxml_to_record / marcxml_to_records.
Populates: cause (the underlying quick_xml error). May also
populate: record_index, byte_offset (when the parser exposes a
position), source. The message attribute carries the parser's
diagnostic.
Common causes. Malformed XML (unclosed tags, invalid characters); namespace-prefix mismatch; non-MARCXML XML where MARCXML was expected.
How to recover. Inspect e.__cause__ for the parser's specific
error. The bytes can't be re-parsed without correction.
Python class: mrrc.XmlError.
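To locate a well-formedness problem before handing a document to mrrc, the standard-library XML parser can serve as an independent pre-check. This sketch is not mrrc's parser (which uses quick_xml), so reported positions may differ, and it does not validate against the MARCXML schema; xml_error_position is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def xml_error_position(text: str) -> Optional[Tuple[int, int]]:
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position  # a (line, column) tuple
    return None


good = '<record xmlns="http://www.loc.gov/MARC21/slim"><leader>00000nam a2200000 a 4500</leader></record>'
bad = "<record>\n<leader>00000nam a2200000 a 4500</record>"
print(xml_error_position(good))  # None
print(xml_error_position(bad))   # a position on line 2, where </record> closes an open <leader>
```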
E402 — marcjson_invalid
A MARCJSON document failed to parse.
Context: Parse-side (JSON parser layer).
Applies to: mrrc.marcjson_to_record, mrrc.json_to_record.
Populates: cause (the underlying serde_json::Error with line()
and column() available). May also populate: record_index,
byte_offset, source.
Common causes. Truncated JSON; mixed text encodings; non-MARCJSON JSON where MARCJSON was expected.
How to recover. Inspect e.__cause__.line and .column for the
position; re-encode the input or fix upstream.
Python class: mrrc.JsonError.
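The standard-library json module can likewise report the line and column of a syntax error before the document reaches mrrc. json_error_position is a hypothetical helper, and serde_json may report a slightly different position for the same input.

```python
import json
from typing import Optional, Tuple


def json_error_position(text: str) -> Optional[Tuple[int, int]]:
    """Return (line, column) of the first JSON syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno
    return None


truncated = '{\n  "leader": "00000nam a2200000 a 4500",\n  "fields": ['
print(json_error_position(truncated))  # a position on line 3, where the input ends
```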
E404 — record_too_large_for_iso2709
The writer attempted to serialize a record whose total length or base-address-of-data exceeds the ISO 2709 5-digit limit (99999 bytes for length, same for base address).
Context: Writer-side.
Applies to: MARCWriter, AuthorityMarcWriter, HoldingsMarcWriter.
Populates: record_index, record_control_number. The message
attribute names which limit was exceeded with the actual byte count.
Never populates: byte_offset (this fires before any bytes are written).
Common causes. Records with very large fields (full-text content in 505 or 520); aggregations of records with many repeated fields.
How to recover. Split the record into smaller units; use a different serialization format (MARCXML or MARCJSON) that doesn't have the 5-digit length limit.
Python class: mrrc.WriterError.
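To check ahead of time whether a record will fit, both leader values can be computed from the field sizes. A sketch assuming the usual MARC 21 layout (24-byte leader, 12-byte directory entries, one directory terminator, one record terminator); iso2709_lengths and fits_iso2709 are hypothetical, and only the two five-digit limits from this entry are checked.

```python
from typing import List, Tuple

ISO2709_LIMIT = 99999  # largest value a five-digit leader field can hold


def iso2709_lengths(field_lengths: List[int]) -> Tuple[int, int]:
    """Return (record_length, base_address) for the given field sizes.

    Each entry is one field's length in bytes including its field terminator.
    """
    base_address = 24 + 12 * len(field_lengths) + 1  # leader + directory + terminator
    record_length = base_address + sum(field_lengths) + 1  # + record terminator
    return record_length, base_address


def fits_iso2709(field_lengths: List[int]) -> bool:
    """Check only the two five-digit limits described in this entry."""
    record_length, base_address = iso2709_lengths(field_lengths)
    return record_length <= ISO2709_LIMIT and base_address <= ISO2709_LIMIT


print(iso2709_lengths([]))        # (26, 25)
print(fits_iso2709([9000] * 10))  # True: 90146 bytes
print(fits_iso2709([9000] * 12))  # False: 108170 bytes
```

With no fields at all, the base address comes out at 25, which lines up with the E003 minimum.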
Warnings (Wxxx)
W001 — bad_subfield_code_warning
A subfield code is unusual but the field is otherwise valid (pymarc
compatibility: pymarc emits this as a UserWarning).
Context: Warning during parsing; does not abort the parse.
Python class: mrrc.BadSubfieldCodeWarning (a UserWarning, not an
exception).
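Because W001 is a Python warning rather than an exception, the standard warnings module controls it. A runnable sketch using a local stand-in class; in real code the category would be mrrc.BadSubfieldCodeWarning, and parse_with_odd_subfield is a hypothetical placeholder for a parse that emits the warning.

```python
import warnings


class BadSubfieldCodeWarning(UserWarning):
    """Local stand-in; real code would use mrrc.BadSubfieldCodeWarning."""


def parse_with_odd_subfield():
    """Hypothetical placeholder for a parse that emits W001 and carries on."""
    warnings.warn("W001: unusual subfield code", BadSubfieldCodeWarning)
    return "record"


# Option 1: collect the warnings instead of letting them print.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", BadSubfieldCodeWarning)
    result = parse_with_odd_subfield()
print(result, [str(w.message) for w in caught])

# Option 2: escalate them to exceptions for a strict pipeline.
with warnings.catch_warnings():
    warnings.simplefilter("error", BadSubfieldCodeWarning)
    try:
        parse_with_odd_subfield()
    except BadSubfieldCodeWarning as exc:
        print("escalated:", exc)
```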