Error codes

Every error raised by mrrc carries a stable identifier (Exxx) and a human-friendly slug. Match on the code rather than the exception class name to keep handlers stable across enum restructures, and follow the help URL to land here.

import mrrc

try:
    list(mrrc.MARCReader.from_path("harvest.mrc"))
except mrrc.MrrcException as e:
    print(e.code, e.slug, e.help_url())
    # E201 invalid_indicator https://dchud.github.io/mrrc/reference/error-codes/#E201

// Rust: the fallback arm binds the error so it can be returned.
match err {
    e if e.code() == "E201" => handle_indicator_error(e),
    e => return Err(e),
}

Configuring the help URL base

By default err.help_url() returns a URL anchored to this page hosted on GitHub Pages (https://dchud.github.io/mrrc/reference/error-codes/). Enterprise deployments that mirror the docs internally can redirect the help URL by setting the MRRC_DOCS_BASE_URL environment variable to their docs root. Both the Rust core and the Python bindings honor it:

export MRRC_DOCS_BASE_URL="https://docs.example.com/mrrc"
# err.help_url() → "https://docs.example.com/mrrc/reference/error-codes/#E201"

The variable holds the docs site root; the /reference/error-codes/#Exxx path is appended automatically. Trailing slashes are stripped.
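
The composition rule above can be sketched in a few lines. This is an illustrative re-implementation, not mrrc's own code: build_help_url is a hypothetical name, the default root is inferred from the GitHub Pages URL quoted above, and treating an empty variable as unset is an assumption.

```python
import os

# Default docs root, inferred from the GitHub Pages URL quoted above.
DEFAULT_BASE = "https://dchud.github.io/mrrc"
ERROR_PAGE_PATH = "/reference/error-codes/"


def build_help_url(code, environ=None):
    """Hypothetical helper mirroring the documented rule, for illustration only."""
    environ = os.environ if environ is None else environ
    # Assumption: an empty MRRC_DOCS_BASE_URL is treated the same as unset.
    base = environ.get("MRRC_DOCS_BASE_URL") or DEFAULT_BASE
    # Trailing slashes are stripped before the fixed path and anchor are appended.
    return f"{base.rstrip('/')}{ERROR_PAGE_PATH}#{code}"
```

Passing a dict instead of reading os.environ directly keeps the rule easy to check in isolation, for example with a mirror root that ends in a slash.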

Stability

Two rules, non-negotiable:

  1. Codes never get re-purposed. A retired check leaves its docs entry in place pointing to a replacement.
  2. Codes never get renumbered. URLs that users paste into chat have to keep resolving.

See CONTRIBUTING.md for the full policy.

Code ranges

Range   Phase
E0xx    Stream / leader
E1xx    Directory / field header
E2xx    Subfield / indicator
E3xx    Encoding
E4xx    Serialization / writer
Wxxx    Warnings (pymarc parity)

Each range reserves ~80 slots for future growth.
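
Because the ranges are stable, a handler can bucket errors by phase from the code string alone. The sketch below is a caller-side convenience under that assumption; phase_for is a hypothetical helper, not part of mrrc.

```python
# Range prefixes and phase names copied from the table above.
PHASES = {
    "E0": "Stream / leader",
    "E1": "Directory / field header",
    "E2": "Subfield / indicator",
    "E3": "Encoding",
    "E4": "Serialization / writer",
}


def phase_for(code):
    """Map a code such as 'E201' or 'W001' to its phase name (hypothetical helper)."""
    if code.startswith("W"):
        return "Warnings (pymarc parity)"
    # Codes outside the documented ranges fall through to "unknown".
    return PHASES.get(code[:2], "unknown")
```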


Stream / leader (E0xx)

E001 — record_length_invalid

The leader's record-length field (bytes 0–4) is invalid: not five ASCII digits, or claims a length below the 24-byte minimum.

Context: Parse-side. Applies to: Bibliographic, Authority, Holdings readers. Populates: record_index, byte_offset. May also populate: source.

Common causes. Truncated download; reader fed text instead of binary; attempt to parse a non-MARC file by accident.

How to recover. Verify the input is binary MARC (file usually has a .mrc extension and starts with five ASCII digits). No recovery mode salvages this — the next 24 bytes can't be trusted as a leader.
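
A pre-flight check along these lines can catch the wrong-file case before a reader is even constructed. It only restates the two conditions named above (five ASCII digits, at least 24); record_length_problem is a hypothetical helper and the returned strings are illustrative, not mrrc's messages.

```python
def record_length_problem(head: bytes):
    """Return a short description of the problem, or None if bytes 0-4 look sane."""
    length_field = head[:5]
    # Exactly five bytes, each an ASCII digit (0x30-0x39).
    if len(length_field) < 5 or not all(0x30 <= b <= 0x39 for b in length_field):
        return "not five ASCII digits"
    if int(length_field) < 24:
        return "below the 24-byte minimum"
    return None
```

Feeding it the first few bytes of a MARCXML file, for instance, reports the digit failure immediately.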

Python class: mrrc.RecordLengthInvalid.

E002 — leader_invalid

The 24-byte leader is malformed in a way other than the record-length or base-address fields. Two failure shapes share this code:

  1. Structural (always fires): byte-level malformation — e.g., the indicator-count byte (position 10) is not an ASCII digit, or the reserved bytes 20–23 are not 4500. Detected during Leader::from_bytes regardless of validation_level.
  2. MARC 21 semantics (fires only at validation_level="strict_marc"): the leader parses cleanly but a position carries a value not in the MARC 21 allowed set — e.g., record_status (position 5) outside {a, c, d, n, p}, record_type (position 6) outside the documented set, indicator_count not equal to 2, encoding_level outside the allowed set, etc.

The MARC 21 allowed-value sets at the per-position level differ by reader type. The bibliographic, authority, and holdings readers each apply their own format's spec at strict_marc:

  • Bibliographic (MarcReader): the union of positions defined in the MARC 21 Bibliographic Format leader — positions 5–11 and 17–19.
  • Authority (AuthorityMarcReader): the MARC 21 Authority Format allowed sets. Positions 7 (bibliographic_level), 8 (control_record_type), and 19 (multipart_level) are undefined for authority records and accept any byte (including the MARC 21 fill character |). Position 5 accepts the wider set {a, c, d, n, o, s, x}, position 6 must be z, position 17 must be n or o, and position 18 carries the punctuation policy allowed set {space, c, i, u} instead of the bibliographic "cataloging form" set.
  • Holdings (HoldingsMarcReader): the MARC 21 Holdings Format allowed sets. Positions 7, 8, and 19 are undefined as for authority. Position 5 accepts {c, d, n}, position 6 accepts {u, v, x, y}, position 17 (encoding level) accepts {1, 2, 3, 4, 5, m, u, z}, and position 18 (item information in record) accepts {space, i, n}.

A leader that is valid per one record type's format may be invalid per another's (e.g., encoding_level='1' is valid for bibliographic but invalid for authority).
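
To make the cross-format difference concrete, the sketch below encodes the authority and holdings sets for positions 5, 6, 17, and 18 exactly as listed above and checks one leader against both. The bibliographic sets are left out because this page does not enumerate them in full. leader_violations is a hypothetical helper, not the library's validator.

```python
ANY = None  # position undefined for this format: any byte is accepted

# Allowed values per leader position, transcribed from the lists above.
ALLOWED = {
    "authority": {5: "acdnosx", 6: "z", 7: ANY, 8: ANY, 17: "no", 18: " ciu", 19: ANY},
    "holdings": {5: "cdn", 6: "uvxy", 7: ANY, 8: ANY, 17: "12345muz", 18: " in", 19: ANY},
}


def leader_violations(leader: bytes, fmt: str):
    """Return (position, found, allowed) for each position outside its set."""
    out = []
    for pos, allowed in sorted(ALLOWED[fmt].items()):
        found = chr(leader[pos])
        if allowed is not ANY and found not in allowed:
            out.append((pos, found, allowed))
    return out
```

A leader with record type z and encoding level n passes the authority sets but is flagged at positions 6 and 17 when checked against the holdings sets.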

Context: Parse-side. Applies to: Bibliographic, Authority, Holdings readers. Populates: record_index, byte_offset, record_byte_offset (= 0). May also populate: source, found, expected.

Common causes. Records hand-crafted in a text editor and saved in the wrong encoding; output from non-conformant exporters; pre-2000s records using deprecated leader values.

How to recover. Structural shape (1) fires before any field parsing and is not affected by recovery_mode. The MARC 21 semantic shape (2) respects recovery_mode: in lenient/permissive the violation is attached to the yielded record's record.errors list and parsing continues; in strict it raises immediately. To suppress (2) entirely, use the default validation_level="structural".

Python class: mrrc.RecordLeaderInvalid.

E003 — base_address_invalid

The leader's base-address-of-data field (bytes 12–16) is not five ASCII digits or claims a value below 25.

Context: Parse-side. Applies to: Bibliographic, Authority, Holdings readers. Populates: record_index, byte_offset. May also populate: source, record_control_number, found, expected.

Common causes. Records written by older systems that miscalculate the directory length; corrupted bytes 12–16 from in-flight data damage.

How to recover. Not recoverable; the directory boundary can't be inferred without the base address.

Python class: mrrc.BaseAddressInvalid.

E004 — base_address_not_found

The leader claims a base address of data that exceeds the available bytes in the input stream.

Context: Parse-side. Applies to: Bibliographic, Authority, Holdings readers. Populates: record_index, byte_offset. May also populate: source, record_control_number.

Common causes. Truncated input; record header damaged so the length/base-address pair are inconsistent.

How to recover. See E005 for the related truncation case.

Python class: mrrc.BaseAddressNotFound.
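
The split between E003 and E004 can be pictured as two successive checks on bytes 12–16. The sketch assumes "available bytes" means the number of bytes on hand for the record; how mrrc measures that internally is not stated here, and base_address_problem is a hypothetical helper.

```python
def base_address_problem(leader: bytes, available: int):
    """Return 'E003', 'E004', or None for the base-address field of a leader."""
    field = leader[12:17]
    # E003: the field itself is malformed or implausibly small.
    if len(field) < 5 or not all(0x30 <= b <= 0x39 for b in field):
        return "E003"
    base = int(field)
    if base < 25:
        return "E003"
    # E004: well-formed, but points past the bytes we actually have (assumed meaning).
    if base > available:
        return "E004"
    return None
```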

E005 — truncated_record

The reader hit EOF before reading the number of bytes the leader claims the record should contain.

Context: Parse-side. Applies to: Bibliographic, Authority, Holdings readers. Populates: record_index, byte_offset, record_byte_offset, expected_length, actual_length. May also populate: source, record_control_number.

Common causes. Network read truncated by connection drop; partially-written file from a crashed exporter; deliberate fuzzing.

How to recover. recovery_mode="lenient" salvages whatever fields parsed cleanly before the truncation point. recovery_mode="strict" raises this error immediately.

Python class: mrrc.TruncatedRecord (subclass of mrrc.EndOfRecordNotFound).
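
The expected_length / actual_length pair comes from comparing the leader's claim with what the stream delivers. A minimal stand-alone sketch of that comparison, assuming the E001 checks have already passed; read_record_bytes is a hypothetical helper, not the reader's implementation.

```python
import io


def read_record_bytes(stream):
    """Return (data, shortfall); shortfall is (expected_length, actual_length) or None."""
    head = stream.read(5)
    if not head:
        return None, None  # clean end of stream, no record started
    if len(head) < 5:
        raise ValueError("stream ended inside the length field")
    expected = int(head)  # assumes five ASCII digits, value >= 24 (E001 passed)
    body = stream.read(expected - 5)
    data = head + body
    if len(data) < expected:
        # EOF arrived before the claimed number of bytes: the E005 condition.
        return data, (expected, len(data))
    return data, None
```

For example, a stream that claims 30 bytes but holds only 15 yields the pair (30, 15).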

E006 — end_of_record_not_found

The end-of-record byte (0x1D) was not found at the position the leader implied.

Context: Parse-side. Applies to: Bibliographic, Authority, Holdings readers. Populates: record_index, byte_offset, record_byte_offset. May also populate: source, record_control_number.

Common causes. Concatenated records where one was truncated mid-stream; corrupted bytes near the end of a record; encoder bugs.

How to recover. recovery_mode="lenient" accepts the partial record and continues reading from the next leader.

Python class: mrrc.EndOfRecordNotFound. Also catches E005 (TruncatedRecord is a subclass).
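
Because the parent class catches both codes, a handler that needs to tell them apart should branch on the code. The classes below are local stand-ins that only mirror the subclass relationship described above, so the snippet runs without mrrc installed; classify is a hypothetical helper.

```python
# Stand-ins mirroring the documented hierarchy; not the real mrrc classes.
class MrrcException(Exception):
    code = None


class EndOfRecordNotFound(MrrcException):
    code = "E006"


class TruncatedRecord(EndOfRecordNotFound):
    code = "E005"


def classify(exc):
    """Catch on the parent class, then branch on the stable code."""
    try:
        raise exc
    except EndOfRecordNotFound as e:
        return "truncated" if e.code == "E005" else "missing terminator"
    except MrrcException:
        return "other"
```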

E007 — io_error

An I/O error occurred reading from the underlying source.

Context: Parse-side (or anywhere I/O can fail). Applies to: All readers. Populates: cause (the underlying std::io::Error). May also populate: record_index, byte_offset, source.

Common causes. File permissions; broken pipe; network read failure; disk error.

How to recover. Inspect e.__cause__ for the underlying I/O kind. Non-recoverable in general; the caller decides whether to retry.

Python class: raised as Python's built-in OSError (via IOError) rather than a typed mrrc class — matches pymarc behavior. Catch OSError to handle alongside other I/O errors.

E099 — fatal_reader_error

A fatal condition halted the reader. Currently raised when the per-stream recovered-error cap is exceeded in RecoveryMode::Lenient or Permissive — see MarcReader::with_max_errors. The code is reserved for future fatal-reader scenarios as well.

Context: Parse-side, lenient/permissive only. Strict mode aborts on the first error, so this code never fires there. Applies to: MarcReader, AuthorityMarcReader (the holdings reader exposes the builder for parity but has no active recovery sites today). Populates: cap (the configured limit) and errors_seen (count at the moment of the trip). May also populate: record_index, source.

Common causes. Feeding a pathological stream (mostly-malformed records) through a lenient/permissive reader. Without the cap the accumulated per-record diagnostics could exhaust memory.

How to recover. If the cap is a false positive for the input, raise it (reader.with_max_errors(n) with a larger n, or 0 to disable). If it reflects the actual state of the input, investigate the source — large counts of recovered errors usually indicate upstream corruption.

After this error is raised the reader is exhausted; subsequent read_record() calls return Ok(None).
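
The cap behaves roughly like the loop below. This is a simplified model, not the reader's code: it assumes the trip happens when the count exceeds the cap, that records with recovered errors are still yielded, and it represents the error as a plain dict. drain is a hypothetical name.

```python
def drain(outcomes, max_errors):
    """outcomes: iterable of True (clean record) or False (record with a recovered error).

    Returns (records_yielded, fatal) where fatal is None or a dict describing E099.
    """
    errors_seen = 0
    yielded = 0
    for ok in outcomes:
        if not ok:
            errors_seen += 1
            # A cap of 0 disables the check, as described above.
            if max_errors and errors_seen > max_errors:
                return yielded, {"code": "E099", "cap": max_errors, "errors_seen": errors_seen}
        yielded += 1
    return yielded, None
```

With a cap of 2, the third recovered error stops the run; with 0 the same input is read to the end.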


Directory / field header (E1xx)

E101 — directory_invalid

A directory entry (12 bytes: 3-byte tag + 4-byte length + 5-byte start position) is structurally invalid: bad tag bytes, non-numeric length or start, or claimed field bytes extending past the data area.
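
The 3 + 4 + 5 layout can be sketched as a small parser. What counts as "bad tag bytes" is not spelled out here, so the ASCII-alphanumeric test below is an assumption, and the sketch raises a plain ValueError rather than any mrrc class. parse_directory_entry is a hypothetical helper.

```python
def parse_directory_entry(entry: bytes, data_area_len: int):
    """Split a 12-byte directory entry into (tag, length, start) or raise ValueError."""
    if len(entry) != 12:
        raise ValueError("directory entry must be 12 bytes")
    tag, length, start = entry[:3], entry[3:7], entry[7:12]
    # Assumption: a valid tag is three ASCII letters or digits.
    if not tag.isalnum():
        raise ValueError("bad tag bytes")
    if not (length.isdigit() and start.isdigit()):
        raise ValueError("non-numeric length or start")
    length_n, start_n = int(length), int(start)
    # The field must lie entirely inside the data area.
    if start_n + length_n > data_area_len:
        raise ValueError("field extends past the data area")
    return tag.decode("ascii"), length_n, start_n
```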

Context: Parse-side. Applies to: Bibliographic, Authority, Holdings readers. Populates: record_index, byte_offset, record_byte_offset. May also populate: field_tag (when the bad entry's tag was decodable), record_control_number, source, found, expected.

Common causes. Encoder bugs; corrupted bytes; legacy records with non-standard tag formats.

How to recover. recovery_mode="lenient" skips the bad entry and continues parsing the rest of the directory.

Python class: mrrc.RecordDirectoryInvalid. Also catches E106, E201, E202 (subclasses).

E105 — field_not_found

A requested field was not present in the parsed record. This is an accessor error, not a parse error — it surfaces when code calls e.g. record.get_field("245") and the record doesn't contain that tag.

Context: Accessor (post-parse). Applies to: All record types. Populates: field_tag. May also populate: record_control_number, record_index. Never populates: byte_offset (not a parse error).

Common causes. Calling get_field on records that don't have the tag; programming error or assumption about input shape.

How to recover. Use try/except or check field in record first.

Python class: mrrc.FieldNotFound.

E106 — invalid_field

A data field is structurally invalid in a way not covered by the more specific E201 / E202 subclasses (e.g., field bytes too short for indicators, field declared length exceeds available bytes).

Context: Parse-side. Applies to: Bibliographic, Authority, Holdings readers. Populates: record_index, byte_offset, record_byte_offset, field_tag. May also populate: record_control_number, source. The message attribute carries a human-readable description of the problem.

Common causes. Encoder dropped subfields; declared field length inconsistent with actual data.

How to recover. recovery_mode="lenient" skips the bad field and continues with the rest.

Python class: mrrc.InvalidField (subclass of mrrc.RecordDirectoryInvalid).


Subfield / indicator (E2xx)

E201 — invalid_indicator

A variable-data field's indicator byte is not a valid value for the given tag. Two failure shapes are reported under this code at validation_level="strict_marc":

  1. Byte-level: the indicator byte is not an ASCII digit (0-9) or space. expected is "ASCII digit (0-9) or space".
  2. Per-tag MARC 21 semantics: the byte passes (1) but violates the per-tag rule for the field. For example, the first indicator of 245 (Title statement) is restricted to 0 or 1 per MARC 21; a 9 is byte-valid but tag-invalid. expected describes the tag-specific rule (e.g., "'0' or '1'", "digit 0-9").

The two shapes share an error code because they share a position (field_tag, indicator_position) and a remedy (fix the indicator); they differ only in the expected string.
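
The two shapes can be modelled as a byte-level check followed by a per-tag lookup. Only the 245 first-indicator rule quoted above is encoded; the real per-tag table is much larger. check_indicator is a hypothetical helper, and numbering indicator positions from 1 is an assumption.

```python
BYTE_LEVEL = "ASCII digit (0-9) or space"

# (tag, indicator position) -> (allowed characters, expected string).
# Only the rule quoted in the text above; positions counted from 1 (assumption).
PER_TAG = {("245", 1): ("01", "'0' or '1'")}


def check_indicator(tag, position, found):
    """Return the expected string for a violation, or None if the byte is fine."""
    # Shape 1: byte-level check.
    if len(found) != 1 or found not in "0123456789 ":
        return BYTE_LEVEL
    # Shape 2: per-tag rule, consulted only when the byte-level check passes.
    rule = PER_TAG.get((tag, position))
    if rule and found not in rule[0]:
        return rule[1]
    return None
```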

Context: Parse-side. Applies to: Bibliographic, Authority, Holdings readers — fired uniformly when validation_level="strict_marc". At the default validation_level="structural" indicator bytes are accepted as-is and this code does not fire. Populates: record_index, byte_offset, record_byte_offset, field_tag, indicator_position, found, expected. May also populate: record_control_number, source.

Common causes. Source systems emitting local-use indicators; records round-tripped through non-conformant ILSes; sloppy cataloging from pre-2000s records.

How to recover. Use validation_level="structural" (default) to accept the bytes silently, or recovery_mode="lenient" to keep iterating past the offending record under strict_marc.

Python class: mrrc.InvalidIndicator (subclass of mrrc.RecordDirectoryInvalid).

E202 — bad_subfield_code

A subfield code byte (immediately following a 0x1F delimiter) is not a printable ASCII character.
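
A scan for this condition looks at the byte after each 0x1F delimiter. The sketch takes "printable ASCII" to mean 0x21 through 0x7E, which is an assumption (the library may treat space differently), and it ignores a delimiter that is the last byte of the field. bad_subfield_codes is a hypothetical helper.

```python
SUBFIELD_DELIMITER = 0x1F


def bad_subfield_codes(field_data: bytes):
    """Return (offset, byte) for each subfield code that is not printable ASCII."""
    out = []
    for i, b in enumerate(field_data[:-1]):
        if b == SUBFIELD_DELIMITER:
            code = field_data[i + 1]
            # Assumption: printable means 0x21-0x7E, i.e. space is excluded.
            if not (0x21 <= code <= 0x7E):
                out.append((i + 1, code))
    return out
```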

Context: Parse-side. Applies to: Bibliographic, Authority, Holdings readers — fired uniformly when validation_level="strict_marc". At the default validation_level="structural" subfield-code bytes are accepted as-is and this code does not fire. Populates: record_index, byte_offset, record_byte_offset, field_tag, subfield_code (the offending byte). May also populate: record_control_number, source.

Common causes. Bytes corrupted near a subfield boundary; encoder emitting non-ASCII codes for local-use subfields.

How to recover. Use validation_level="structural" (default) to accept the bytes silently, or recovery_mode="lenient" to keep iterating past the offending record under strict_marc.

Python class: mrrc.BadSubfieldCode (subclass of mrrc.RecordDirectoryInvalid).


Encoding (E3xx)

E301 — utf8_invalid

A subfield value or control field contains bytes that are not valid UTF-8.

Context: Parse-side (or wherever a string conversion runs). Applies to: Bibliographic, authority, and holdings readers — fired uniformly when validation_level="strict_marc" and a value contains bytes that aren't valid UTF-8. At the default validation_level="structural" all three readers fall back to lossy decoding (U+FFFD substitution) and don't surface this code. Populates: record_index. May also populate: field_tag, byte_offset, source, record_control_number. The message attribute carries the underlying std::str::Utf8Error description.

Common causes. Records cataloged in MARC-8 encoding without the correct character-coding leader byte; legacy records with embedded byte sequences that are valid in MARC-8 but not in UTF-8.

How to recover. Convert input to UTF-8 before parsing, or set validation_level="structural" if you can tolerate U+FFFD substitutions and don't need byte-perfect fidelity.
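
The trade-off between the two validation levels has a direct analogue in Python's own decoder, shown below with a lone 0xE9 byte (Latin-1 é, not valid UTF-8). This only illustrates strict versus lossy decoding in general; it does not call mrrc, and decode_value is a hypothetical helper.

```python
def decode_value(raw: bytes, strict: bool) -> str:
    """Decode as UTF-8, either failing on bad bytes or substituting U+FFFD."""
    if strict:
        # Comparable to strict_marc: invalid bytes surface as an error.
        return raw.decode("utf-8")
    # Comparable to structural: invalid bytes are replaced, so fidelity is lost.
    return raw.decode("utf-8", errors="replace")


lossy = decode_value(b"Caf\xe9", strict=False)  # 'Caf' followed by U+FFFD
```

Once the substitution has happened the original byte cannot be recovered from the decoded string, which is why converting the input beforehand is the safer route when fidelity matters.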

Python class: mrrc.EncodingError.


Serialization / writer (E4xx)

E401 — marcxml_invalid

A MARCXML document failed to parse.

Context: Parse-side (XML parser layer). Applies to: mrrc.marcxml_to_record / marcxml_to_records. Populates: cause (the underlying quick_xml error). May also populate: record_index, byte_offset (when the parser exposes a position), source. The message attribute carries the parser's diagnostic.

Common causes. Malformed XML (unclosed tags, invalid characters); namespace-prefix mismatch; non-MARCXML XML where MARCXML was expected.

How to recover. Inspect e.__cause__ for the parser's specific error. The bytes can't be re-parsed without correction.

Python class: mrrc.XmlError.

E402 — marcjson_invalid

A MARCJSON document failed to parse.

Context: Parse-side (JSON parser layer). Applies to: mrrc.marcjson_to_record, mrrc.json_to_record. Populates: cause (the underlying serde_json::Error with line() and column() available). May also populate: record_index, byte_offset, source.

Common causes. Truncated JSON; mixed text encodings; non-MARCJSON JSON where MARCJSON was expected.

How to recover. Inspect e.__cause__.line and .column for the position; re-encode the input or fix upstream.

Python class: mrrc.JsonError.

E404 — record_too_large_for_iso2709

The writer attempted to serialize a record whose total length or base-address-of-data exceeds the ISO 2709 5-digit limit (99999 bytes for length, same for base address).

Context: Writer-side. Applies to: MARCWriter, AuthorityMarcWriter, HoldingsMarcWriter. Populates: record_index, record_control_number. The message attribute names which limit was exceeded with the actual byte count. Never populates: byte_offset (this fires before any bytes are written).

Common causes. Records with very large fields (full-text content in 505 or 520); aggregations of records with many repeated fields.

How to recover. Split the record into smaller units; use a different serialization format (MARCXML or MARCJSON) that doesn't have the 5-digit length limit.
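
Whether a record will fit can be estimated before writing from the standard ISO 2709 layout: a 24-byte leader, 12 directory bytes per field plus one terminator, then the field data and a record terminator. The helpers below are hypothetical and work from field byte lengths only; they are a sketch of the arithmetic, not the writer's check.

```python
ISO2709_MAX = 99999  # largest value a five-digit field can hold


def iso2709_sizes(field_lengths):
    """field_lengths: byte length of each field's data, field terminator included.

    Returns (record_length, base_address).
    """
    base_address = 24 + 12 * len(field_lengths) + 1  # leader + directory + terminator
    record_length = base_address + sum(field_lengths) + 1  # + record terminator
    return record_length, base_address


def fits_iso2709(field_lengths):
    record_length, base_address = iso2709_sizes(field_lengths)
    return record_length <= ISO2709_MAX and base_address <= ISO2709_MAX
```

A record that fails this estimate is a candidate for splitting or for MARCXML / MARCJSON output, as suggested above.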

Python class: mrrc.WriterError.


Warnings (Wxxx)

W001 — bad_subfield_code_warning

A subfield code is unusual but the field is otherwise valid (pymarc compatibility — pymarc raises this as a UserWarning).

Context: Warning during parsing; does not abort the parse. Python class: mrrc.BadSubfieldCodeWarning (a UserWarning, not an exception).