Tests Fail on Realistic Bugs

Tests fail on realistic bugs

A test that passes when the code is broken is worse than no test, because it teaches you to trust the suite. Before shipping a new or modified test, run it through five questions. Every answer has to be "yes" or the test isn't earning its place.

The five-question gate

  1. Would this fail on a realistic bug? A boundary flip, an off-by-one, a swapped branch — would the test catch any of them? Tests that only catch "the function was deleted entirely" miss the bugs that actually happen.
  2. Are expected values requirement-derived, not implementation-derived? If the test was written by running the function and pasting the output as the expected value, the test will pass forever — including after the implementation drifts in ways that break the requirement.
  3. Does this cover more than the happy path where relevant? Boundaries, branches, and failure paths are where bugs live. A happy-path-only test is incomplete for anything with branches.
  4. If function internals are refactored but behavior preserved, does this still pass? A test coupled to private implementation details fails on harmless refactors and protects nothing.
  5. If the function body is broken or removed, does this fail for the right reason? The mutation-testing question. If the test passes when the body is unimplemented!(), the test is asserting nothing useful.

A test that survives all five questions is exercising public behavior, derived from the spec, covers the important cases, and has actual teeth.

The banned patterns these questions catch

Why this is process, not preference

Test quality decays without a checkpoint. Every new test gets added under the pressure of "ship the feature"; the easiest test to write is the one that mirrors the implementation you just wrote. The five-question gate is a five-minute checkpoint that catches the easy ones before they land.

Doing it consistently is what separates a suite that catches bugs from a suite that just runs.

Where this generalizes

Any language, any framework. The questions are about what a test asserts and whether the assertion has teeth, not about which framework you use:

The question to ask before merging the test isn't "do I have tests?" It's "do my tests fail when the code is broken?"

See also