CORRECTNESS
Built to be trusted, not just tested.
A database earns trust by construction. EzraDB's consistency-critical core is formally model-checked, exercised by deterministic simulation, and fuzzed for query correctness — with every failure reproducible from a seed. Here is exactly how, and exactly where the line is.
Formally model-checked
TLA+ and Stateright models of Raft replication, leader-lease, MVCC/SSI serializability, parallel-commit, online-DDL.
Deterministic simulation
FoundationDB-style DST covering all five Raft safety properties, with seed-reproducible replay.
Metamorphic fuzzing
SQLancer-style TLP and NoREC query-correctness fuzzing.
Mutation-gated
CI-gated mutation testing on correctness-critical crates.
FORMAL METHODS
Five core protocols, model-checked in TLA+ and Stateright.
The protocols where a subtle race means data loss are specified as executable models and checked before a line of implementation lands. Model-checking explores the state space for the invariant violations that unit tests never think to write.
Raft replication
Log agreement and leader election across replicas.
Leader-lease safety
No two leaseholders serve a range at the same time.
MVCC / SSI serializability
Snapshot isolation and serializable schedules hold.
Parallel-commit
Atomic multi-range commit — no torn or partial transactions.
Online-DDL safety
Schema changes stay safe under concurrent traffic.
Spec PR before code
Protocol changes are spec-first: the model and its invariants ship in a spec PR and pass the checker before implementation is written. The spec is the source of truth, not an afterthought.
DETERMINISTIC SIMULATION
FoundationDB-style simulation, replayable from a seed.
The cluster runs against a simulated world — clock, disk, and network under our control — so we can inject partitions, reorderings, and crashes on a single thread. Every run is deterministic: a failing seed replays the exact schedule, bit for bit, until the bug is fixed.
All five Raft safety properties, continuously checked
At most one leader per term.
A leader never overwrites or deletes its own log.
Identical entries imply identical prefixes.
Committed entries survive in every future leader.
No two nodes apply different commands at an index.
seed → identical schedule → identical failure. Reproduce, then fix.
QUERY-CORRECTNESS FUZZING
SQLancer-style metamorphic fuzzing, run nightly.
Metamorphic testing finds wrong answers with no oracle: transform a query in a result-preserving way and any divergence is a real bug. Paired with multi-layer boundary fuzzing, it hunts the logic errors that crash-only fuzzers miss.
TLP
Ternary Logic Partitioning splits a predicate into TRUE / FALSE / NULL branches whose union must equal the unpartitioned result.
NoREC
Non-optimizing Reference Engine Construction compares an optimized plan against a naive rewrite — the row counts must agree.
Boundary fuzzing
Multi-layer fuzzing at the wire, parser, and type boundaries, plus PostgreSQL-isolationtester-style permutation tests for SI / SSI conformance.
CI GATES
Tests that prove the tests, and engines that must agree.
Mutation testing asks a harder question than coverage: if we deliberately break the code, does a test notice? And because the same SQL runs on two very different storage engines, we gate on them producing identical answers.
Mutation testing
CI-gated mutation testing on the correctness-critical crates — the optimizer, runtime filters, and hash join. Injected faults that survive the suite fail the gate.
Row-vs-column parity gate
Every TPC-H query runs on both the row and column engines and must return byte-identical results. One engine, one set of answers.
WHERE THE LINE IS
What these methods do — and what they don't.
- This is bounded model-checking over small state spaces — strong evidence of correctness, not a machine-checked proof. We say model-checked, never “proven.”
- A Jepsen harness is in progress; consistency today is covered by deterministic simulation and formal model-checking, not a Jepsen result.
Pre-1.0, and honest about it. Formally model-checked core, reproducible benchmarks, a public roadmap.