Formal methods: property tests and fuzzing¶
This page explains the lightweight formal-methods toolkit mrrc uses to catch the kinds of bugs unit tests are bad at finding. It is aimed at readers who have written assert-style tests before but are new to property-based testing.
Scope: lightweight, not full verification
"Formal methods" covers a wide spectrum, from informal specifications through property-based testing all the way to mechanically-checked proofs in tools like TLA+, Coq, or Lean. mrrc operates at the lightweight end of that spectrum. We use property-based testing (proptest) and coverage-guided fuzzing (cargo-fuzz) to falsify a finite set of invariants on a large but bounded sample of inputs. We do not prove the codec correct, run a model checker, or maintain a formal specification. The techniques on this page catch real bugs cheaply, but they are sampling, not proof. The verification pyramid at the end of the page sketches the broader spectrum and points at extensions mrrc could grow into.
The runnable suite lives in tests/properties.rs; the fuzz
infrastructure is documented separately in Fuzzing.
Property-based testing in five minutes¶
A unit test asserts a single concrete fact:
#[test]
fn parses_simple_record() {
let bytes = include_bytes!("simple.mrc");
let record = MarcReader::new(Cursor::new(&bytes[..])).read_record().unwrap().unwrap();
assert_eq!(record.get_control_field("001"), Some("12345"));
}
This catches that one bug. It says nothing about the next record, or the same record with a different leader byte, or a record with subfield values containing tab characters.
A property asserts a fact that should hold for every input drawn from some space:
proptest! {
#[test]
fn binary_roundtrip(record in arb_record()) {
let bytes = serialize(&record);
let parsed = MarcReader::new(Cursor::new(&bytes[..]))
.read_record().unwrap().unwrap();
prop_assert_eq!(record, parsed);
}
}
The framework — proptest in mrrc's case — generates
many concrete record values from arb_record() (a strategy) and
runs the assertion against each one. If any input violates the
property, you get a bug.
The interesting parts:
- Random generation, but structured.
arb_record()doesn't produce arbitrary bytes; it produces structurally validRecordvalues via composable strategies (a leader, then control fields, then data fields, with valid tags and indicator characters). The test then checks invariants you can't get from raw fuzzing. - Shrinking. When proptest finds a failing input it doesn't just hand you the random 200-field record that crashed — it automatically simplifies it, halving subfield counts, dropping fields, replacing strings with shorter ones, until it has the smallest input that still triggers the bug. You typically end up with a 1-field, 1-subfield, 1-character reproducer.
- Counter-example seeds. The shrunk failing input gets saved as a
hex-encoded seed in
tests/proptest-regressions/properties.txtand re-runs on every test invocation forever after. A regression that reintroduces the bug fails immediately on the saved seed, before any new random cases run. See Regression seeds below.
The mental model that helps the most: properties are specifications written in code. "Round-tripping a record through the writer and reader produces an equal record" is a specification of mrrc's binary codec. Proptest tries to falsify it.
Properties mrrc enforces¶
tests/properties.rs runs eight properties on every cargo test
invocation. The full source is the canonical reference; the table
below is a one-line summary of each.
| Property | What it asserts |
|---|---|
binary_roundtrip |
A record serialized to ISO 2709 and parsed back compares equal to the original — leader fields, control fields, data fields, indicators, subfield codes, and subfield values all preserved. |
serialization_never_panics |
MarcWriter::write_record returns Ok and emits non-empty output for every generated record. |
leader_length_matches_emitted_bytes |
The record_length field the writer puts in the leader equals the total byte count of the serialized record. |
directory_entries_tile_data_area |
Directory entries start immediately after one another with no gaps or overlaps; field lengths sum (plus one for the record terminator) to the data-area size; every entry ends in FIELD_TERMINATOR; the record ends in RECORD_TERMINATOR. |
indicator_bytes_in_valid_set |
Every indicator byte in every emitted data field is either a digit (0–9) or an ASCII space. |
subfield_codes_are_lower_alnum |
Every byte immediately following a SUBFIELD_DELIMITER (0x1F) is a lowercase ASCII letter or digit. |
marcxml_roundtrip |
A record serialized to MARCXML and parsed back compares equal — including subfield values containing XML metacharacters (< > & " ') and arbitrary whitespace. |
marcjson_roundtrip |
A record serialized to MARCJSON and parsed back compares equal — including subfield values containing JSON-special characters (\t \n \r \\ "). |
The four structural-invariant properties (leader_length_matches_emitted_bytes,
directory_entries_tile_data_area, indicator_bytes_in_valid_set,
subfield_codes_are_lower_alnum) inspect the emitted ISO 2709 bytes
directly rather than checking round-trip equality. They guard against
a writer bug that would silently produce malformed-but-self-consistent
output: a writer that drops a field could pass binary_roundtrip if
the corresponding parser also dropped it, but the directory-tiling
property would fail because the emitted bytes wouldn't match the
declared structure.
The two non-binary round-trip properties (marcxml_roundtrip,
marcjson_roundtrip) deliberately exercise format-specific escaping,
which is where text-format codecs typically have bugs — XML entity
references and CDATA sections, JSON backslash escapes, namespace
handling, whitespace treatment.
Configuration and runtime¶
ProptestConfig::cases = 64 keeps the full suite under ten seconds
locally (about one second on an Apple-silicon laptop). Override for a
deeper one-off run:
The CI matrix runs the suite at the default 64 cases on every PR.
Saved regression seeds run unconditionally regardless of cases.
Regression seeds¶
When proptest finds a failing input it writes the shrunk seed to
tests/proptest-regressions/properties.txt. The format is
hex-encoded RNG state plus a comment explaining what the seed
shrinks to:
# cc <hex-encoded shrunk input> # <one-line explanation>
cc c55bf86fe34e98890f3851b08dc2838087458f63372964533b0190c7d491c89e # whitespace-only control value — MARCXML reader must preserve whitespace rather than error with "missing field `$value`"
The policy is straightforward:
- Commit accepted seeds. They become permanent guards. A future refactor that reintroduces the bug fails on the seed before any random case is generated.
- Annotate each seed. A bare hex string ages into mystery; one short comment about what shape the input takes saves a future reader (or future you) from re-shrinking when triaging.
- Don't commit
.pendingfiles. Proptest writes those during shrinking; they are gitignored.
Acceptance happens in the same PR as the fix: the regression test must fail on the seed before the fix, then pass after.
How fuzzing complements property tests¶
Coverage-guided fuzzing — see Fuzzing for the full infrastructure — points at the same problem from the opposite end:
| Dimension | Property tests | Coverage-guided fuzzing |
|---|---|---|
| Input generation | Structured strategies (arb_record() returns valid Record) |
Mutated bytes guided by code coverage |
| What it asserts | Invariants you write (round-trip, tiling, character sets) | The harness must not panic / OOM / hang |
| Where bugs surface | Logic bugs in the codec layer | Reader-side input-validation bugs in raw byte streams |
| Where it runs | Every PR via cargo test |
Nightly via .github/workflows/fuzz.yml |
| Triage artifact | Seed in tests/proptest-regressions/properties.txt |
Reproducer in tests/data/fuzz-regressions/<target>/ |
In other words: property tests prove things about records you can build; fuzzing finds bugs in the code paths that turn arbitrary bytes into records. Both feed regression tests that run on every PR. Neither replaces the other.
Where this sits in the broader picture¶
The companion mrrc-testbed repo's formal-methods-verification-strategy.md positions the techniques mrrc uses as levels 3 and 4 of a 5-level verification pyramid:
- Type system (Rust's borrow checker — the always-on baseline)
- Unit tests (concrete examples, in
src/**/testsandtests/) - Property tests (
tests/properties.rs— this page) - Coverage-guided fuzzing (Fuzzing)
- Bounded model checking (Kani over small input spaces — future)
Levels 1 and 2 catch common mistakes; levels 3 and 4 catch the long tail; level 5 (not yet adopted in mrrc) would prove specific properties exhaustively for inputs up to a fixed size.
Further reading¶
The most approachable starting points if you want to learn more:
- proptest book — the framework mrrc uses, with a worked introduction to strategies and shrinking.
- Hypothesis docs — Python's property-testing library is the most thoroughly documented in the wider ecosystem; the conceptual material transfers cleanly to proptest.
- QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs (Claessen & Hughes, 2000) — the original paper that introduced property-based testing. Short (~12 pages), readable, and explains the core ideas in their cleanest form.
- In Praise of Property-Based Testing (Hughes, 2020) — a more recent retrospective on what property-based testing caught in industrial use, including the famous Volvo ECU bug shrunk to a 4-line reproducer.