Materialize Only When Forced
Materialize only when forced
Defer materialized caches and denormalized tables until real query latency forces them. Document the path; don't build the path. The cost of premature materialization is real: schema migrations, freshness bugs, sync gaps between source and cache, code paths that the original query no longer matches.
The rule
Compute on demand first. If the query is bounded — touches a small set of rows, uses indexes that already exist, returns quickly on real data — leave it on demand. Add a materialized cache when a benchmark on a representative dataset shows the on-demand query is slow enough to bother a user.
Don't add caches for hypothetical scale. Don't add caches because "we might need them later." A documented-but-deferred cache is cheaper than a built-but-stale one.
What "documented" means
When you decide not to materialize, write down what the cache would look like if you did. Table name, columns, refresh strategy, invalidation trigger. The note serves two purposes:
- The next contributor doesn't reinvent the analysis
- When latency finally forces the decision, the path forward is short — you're implementing the documented design, not designing it under pressure
Mxr's AI email layer does this explicitly. recipient_reply_latency_buckets, expertise_index, collaborator_patterns, entities, entity_mentions — all documented as future materialization options. None implemented. The trigger for implementing each is a real, measured query slow on a real mailbox.
Why this is restraint, not laziness
Materialized caches are infrastructure. Once they exist:
- Every write path that touches the source must also touch the cache
- Stale caches produce wrong answers without throwing errors
- Migrations get more complex because there are two schemas to evolve in lockstep
- The query layer has to know about both the source and the cache
That cost is acceptable when query latency makes it acceptable. Before that, it's just complexity hunting a justification.
The complement: when you should materialize
The signal is concrete:
- A real user-visible operation takes too long on real data
- The slow path is the same query every time (caching helps), not a different query each time
- The source data changes predictably enough that invalidation has a clean trigger
When that's true, materialize. Pick the structure you documented. Move quickly.
Where this generalizes
This is "premature optimization is the root of all evil" applied specifically to data layout. It applies anywhere:
- Read replicas (don't add one until the primary is contended)
- Search indexes (don't add one until grep is too slow)
- Denormalized views (don't add one until the join is hot)
- Computed columns (don't add one until the function is in every hot query)
The reverse is harder. Removing materialized infrastructure once it exists is painful because something somewhere depends on it. Better to delay the addition than reverse it later.
See also
- Deterministic Before LLM — same restraint applied to a different layer: don't add LLM calls speculatively
- Mxr — concrete instance:
docs/reference/ai-email.mddocuments which caches to add and why none have been added yet