Tests Fail on Realistic Bugs

Tests fail on realistic bugs

A test that passes when the code is broken is worse than no test, because it teaches you to trust the suite. Before shipping a new or modified test, run it through five questions. Every answer has to be "yes" or the test isn't earning its place.

The five-question gate

Would this fail on a realistic bug? A boundary flip, an off-by-one, a swapped branch — would the test catch any of them? Tests that only catch "the function was deleted entirely" miss the bugs that actually happen.
Are expected values requirement-derived, not implementation-derived? If the test was written by running the function and pasting the output as the expected value, the test will pass forever — including after the implementation drifts in ways that break the requirement.
Does this cover more than the happy path where relevant? Boundaries, branches, and failure paths are where bugs live. A happy-path-only test is incomplete for anything with branches.
If function internals are refactored but behavior preserved, does this still pass? A test coupled to private implementation details fails on harmless refactors and protects nothing.
If the function body is broken or removed, does this fail for the right reason? The mutation-testing question. If the test passes when the body is unimplemented!(), the test is asserting nothing useful.

A test that survives all five questions is exercising public behavior, derived from the spec, covers the important cases, and has actual teeth.

The banned patterns these questions catch

Tautological pass-through. "Mock returns X; function returns mock's value; assert function returns X." Tests the mock, not the function. Question 5 catches this: replace the function body with return mock_value and the test still passes.
Vacuous assertions. is_ok, is_some, !is_empty, "renders without crashing." Question 1 catches this: a boundary flip would not flip is_ok, so the test wouldn't fail on a real bug.
Snapshot-only critical behavior. A snapshot test with no semantic invariants pinned. Question 2 catches this: the expected value is whatever the implementation produced, not what it should produce.
Implementation-coupled tests. Asserting on private helper internals. Question 4 catches this: rename the helper, test breaks, behavior didn't change.
Redundant equivalence-class tests. Five tests covering the same branch with cosmetic differences. Not a single-test failure mode but a suite-level smell — the redundant tests aren't adding coverage, they're adding maintenance cost.

Why this is process, not preference

Test quality decays without a checkpoint. Every new test gets added under the pressure of "ship the feature"; the easiest test to write is the one that mirrors the implementation you just wrote. The five-question gate is a five-minute checkpoint that catches the easy ones before they land.

Doing it consistently is what separates a suite that catches bugs from a suite that just runs.

Where this generalizes

Any language, any framework. The questions are about what a test asserts and whether the assertion has teeth, not about which framework you use:

The 5-question gate applies whether you're writing pytest, RSpec, Jest, JUnit, or Rust's built-in test harness
The banned patterns show up in every ecosystem under slightly different names (the React equivalent of "vacuous assertion" is toBeTruthy; the Python equivalent of "snapshot-only" is assert result == EXPECTED where EXPECTED is whatever the function returned last time)

The question to ask before merging the test isn't "do I have tests?" It's "do my tests fail when the code is broken?"

Tests fail on realistic bugs

The five-question gate

The banned patterns these questions catch

Why this is process, not preference

Where this generalizes

See also