CORRECTNESS

Built to be trusted, not just tested.

A database earns trust by construction. EzraDB's consistency-critical core is formally model-checked, exercised by deterministic simulation, and fuzzed for query correctness — with every failure reproducible from a seed. Here is exactly how, and exactly where the line is.

Formally model-checked coreDeterministic simulationSeed-reproducible replay

Formally model-checked

TLA+ and Stateright models of Raft replication, leader-lease, MVCC/SSI serializability, parallel-commit, online-DDL.

Deterministic simulation

FoundationDB-style DST covering all five Raft safety properties, with seed-reproducible replay.

Metamorphic fuzzing

SQLancer-style TLP and NoREC query-correctness fuzzing.

Mutation-gated

CI-gated mutation testing on correctness-critical crates.

FORMAL METHODS

Five core protocols, model-checked in TLA+ and Stateright.

The protocols where a subtle race means data loss are specified as executable models and checked before a line of implementation lands. Model-checking explores the state space for the invariant violations that unit tests never think to write.

Raft replication

Log agreement and leader election across replicas.

Leader-lease safety

No two leaseholders serve a range at the same time.

MVCC / SSI serializability

Snapshot isolation and serializable schedules hold.

Parallel-commit

Atomic multi-range commit — no torn or partial transactions.

Online-DDL safety

Schema changes stay safe under concurrent traffic.

Spec PR before code

Protocol changes are spec-first: the model and its invariants ship in a spec PR and pass the checker before implementation is written. The spec is the source of truth, not an afterthought.

DETERMINISTIC SIMULATION

FoundationDB-style simulation, replayable from a seed.

The cluster runs against a simulated world — clock, disk, and network under our control — so we can inject partitions, reorderings, and crashes on a single thread. Every run is deterministic: a failing seed replays the exact schedule, bit for bit, until the bug is fixed.

All five Raft safety properties, continuously checked

Election Safety

At most one leader per term.

Leader Append-Only

A leader never overwrites or deletes its own log.

Log Matching

Identical entries imply identical prefixes.

Leader Completeness

Committed entries survive in every future leader.

State Machine Safety

No two nodes apply different commands at an index.

seed → identical schedule → identical failure. Reproduce, then fix.

QUERY-CORRECTNESS FUZZING

SQLancer-style metamorphic fuzzing, run nightly.

Metamorphic testing finds wrong answers with no oracle: transform a query in a result-preserving way and any divergence is a real bug. Paired with multi-layer boundary fuzzing, it hunts the logic errors that crash-only fuzzers miss.

TLP

Ternary Logic Partitioning splits a predicate into TRUE / FALSE / NULL branches whose union must equal the unpartitioned result.

NoREC

Non-optimizing Reference Engine Construction compares an optimized plan against a naive rewrite — the row counts must agree.

Boundary fuzzing

Multi-layer fuzzing at the wire, parser, and type boundaries, plus PostgreSQL-isolationtester-style permutation tests for SI / SSI conformance.

CI GATES

Tests that prove the tests, and engines that must agree.

Mutation testing asks a harder question than coverage: if we deliberately break the code, does a test notice? And because the same SQL runs on two very different storage engines, we gate on them producing identical answers.

Mutation testing

CI-gated mutation testing on the correctness-critical crates — the optimizer, runtime filters, and hash join. Injected faults that survive the suite fail the gate.

Row-vs-column parity gate

Every TPC-H query runs on both the row and column engines and must return byte-identical results. One engine, one set of answers.

22 / 22TPC-H queries, byte-identical across engines

0Tolerated divergences

WHERE THE LINE IS

What these methods do — and what they don't.

This is bounded model-checking over small state spaces — strong evidence of correctness, not a machine-checked proof. We say model-checked, never “proven.”
A Jepsen harness is in progress; consistency today is covered by deterministic simulation and formal model-checking, not a Jepsen result.

Pre-1.0, and honest about it. Formally model-checked core, reproducible benchmarks, a public roadmap.