Tests Fail on Realistic Bugs
Tests fail on realistic bugs
A test that passes when the code is broken is worse than no test, because it teaches you to trust the suite. Before shipping a new or modified test, run it through five questions. Every answer has to be "yes" or the test isn't earning its place.
The five-question gate
- Would this fail on a realistic bug? A boundary flip, an off-by-one, a swapped branch — would the test catch any of them? Tests that only catch "the function was deleted entirely" miss the bugs that actually happen.
- Are expected values requirement-derived, not implementation-derived? If the test was written by running the function and pasting the output as the expected value, the test will pass forever — including after the implementation drifts in ways that break the requirement.
- Does this cover more than the happy path where relevant? Boundaries, branches, and failure paths are where bugs live. A happy-path-only test is incomplete for anything with branches.
- If function internals are refactored but behavior preserved, does this still pass? A test coupled to private implementation details fails on harmless refactors and protects nothing.
- If the function body is broken or removed, does this fail for the right reason? The mutation-testing question. If the test passes when the body is
unimplemented!(), the test is asserting nothing useful.
A test that survives all five questions is exercising public behavior, derived from the spec, covers the important cases, and has actual teeth.
The banned patterns these questions catch
- Tautological pass-through. "Mock returns X; function returns mock's value; assert function returns X." Tests the mock, not the function. Question 5 catches this: replace the function body with
return mock_valueand the test still passes. - Vacuous assertions.
is_ok,is_some,!is_empty, "renders without crashing." Question 1 catches this: a boundary flip would not flipis_ok, so the test wouldn't fail on a real bug. - Snapshot-only critical behavior. A snapshot test with no semantic invariants pinned. Question 2 catches this: the expected value is whatever the implementation produced, not what it should produce.
- Implementation-coupled tests. Asserting on private helper internals. Question 4 catches this: rename the helper, test breaks, behavior didn't change.
- Redundant equivalence-class tests. Five tests covering the same branch with cosmetic differences. Not a single-test failure mode but a suite-level smell — the redundant tests aren't adding coverage, they're adding maintenance cost.
Why this is process, not preference
Test quality decays without a checkpoint. Every new test gets added under the pressure of "ship the feature"; the easiest test to write is the one that mirrors the implementation you just wrote. The five-question gate is a five-minute checkpoint that catches the easy ones before they land.
Doing it consistently is what separates a suite that catches bugs from a suite that just runs.
Where this generalizes
Any language, any framework. The questions are about what a test asserts and whether the assertion has teeth, not about which framework you use:
- The 5-question gate applies whether you're writing pytest, RSpec, Jest, JUnit, or Rust's built-in test harness
- The banned patterns show up in every ecosystem under slightly different names (the React equivalent of "vacuous assertion" is
toBeTruthy; the Python equivalent of "snapshot-only" isassert result == EXPECTEDwhereEXPECTEDis whatever the function returned last time)
The question to ask before merging the test isn't "do I have tests?" It's "do my tests fail when the code is broken?"
See also
- Mxr — concrete instance:
docs/idiomatic-rust-tests.mdcodifies the gate as the project standard - How I Write Software Docs — adjacent discipline: docs that fail the "delete the code blocks" test are operationally useless the same way tests that fail the 5-question gate are