Deterministic Before LLM

#resources #resources/programming #resources/programming/architecture

For any feature that uses an LLM, the deterministic path runs first. Regex, joins, address history, stylometry, explicit config — all the cheap, exact, reproducible work. The LLM is what runs when the deterministic path can't answer, not what runs first because it's flashy.
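
One way to make the ordering structural rather than a convention is a resolver chain. Every deterministic resolver gets a chance first, and the LLM is consulted only when all of them decline. If nothing can answer, the chain raises instead of returning a default. This is a minimal Python sketch. `resolve`, `Unanswerable`, and the two example resolvers are hypothetical names, not mxr code.

```python
import re
from typing import Callable, Optional, Sequence, Tuple

# A resolver returns an answer, or None when it cannot decide.
Resolver = Callable[[str], Optional[str]]


class Unanswerable(Exception):
    """Neither the deterministic path nor the LLM produced an answer."""


def resolve(query: str,
            deterministic: Sequence[Resolver],
            llm: Optional[Resolver] = None) -> Tuple[str, str]:
    """Run every cheap, exact resolver in order; ask the LLM only if all decline."""
    for step in deterministic:
        answer = step(query)
        if answer is not None:
            return answer, "deterministic"
    if llm is not None:
        answer = llm(query)
        if answer is not None:
            return answer, "llm"
    # Explicit error instead of a made-up default.
    raise Unanswerable(f"no resolver could answer {query!r}")


# Hypothetical deterministic resolvers: explicit config, then a regex.
CONFIG = {"reply-to": "team@example.com"}

def from_config(query: str) -> Optional[str]:
    return CONFIG.get(query)

def from_regex(query: str) -> Optional[str]:
    match = re.fullmatch(r"ticket-(\d+)", query)
    return match.group(1) if match else None
```

The second element of the returned tuple records which path answered. That makes it easy to measure how often the LLM is actually needed.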

The rule and its corollary

The rule is straightforward: don't add an LLM call for something a join can do. Tone-mismatch detection in mxr stayed deterministic on purpose — stylometry against a contact's prior messages is a histogram, not a synthesis task. If a check could be answered with a query against existing tables, look at the data first.
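
To make "a histogram, not a synthesis task" concrete, here is a sketch that buckets sentence lengths for the draft and for the contact's prior messages. It then compares the two distributions with total variation distance. Several details are assumptions for illustration: the single feature (sentence length), the bucket edges, and the 0.5 threshold. The note doesn't say which features mxr's stylometry actually uses.

```python
import re
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical bucket edges for sentence length, in words: <=5, <=10, <=20, <=40, longer.
BUCKETS = (5, 10, 20, 40)


def length_histogram(texts: Sequence[str]) -> List[float]:
    """Normalized histogram of sentence lengths across all texts."""
    counts: Counter = Counter()
    for text in texts:
        for sentence in re.split(r"[.!?]+", text):
            n = len(sentence.split())
            if n == 0:
                continue  # skip empty fragments after the final punctuation
            bucket = next((i for i, edge in enumerate(BUCKETS) if n <= edge), len(BUCKETS))
            counts[bucket] += 1
    total = sum(counts.values()) or 1
    return [counts[i] / total for i in range(len(BUCKETS) + 1)]


def tone_mismatch(draft: str, prior_messages: Sequence[str],
                  threshold: float = 0.5) -> Tuple[bool, float]:
    """Flag the draft when its histogram is far from the contact's baseline.

    Distance is total variation: half the sum of absolute bucket differences,
    so it ranges from 0.0 (identical) to 1.0 (no overlap).
    """
    p = length_histogram([draft])
    q = length_histogram(prior_messages)
    distance = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    return distance > threshold, distance
```

The same input always gives the same distance, so a false positive can be reproduced and tuned by adjusting buckets or the threshold.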

The corollary is harder: when the LLM is disabled, the deterministic path must still produce a useful answer or an explicit error. Silent degradation that pretends success is a bug. A feature that quietly fails open when the LLM is unavailable is worse than one that fails loud, because the user never learns the LLM mattered.
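
The difference between failing open and failing loud fits in a few lines. In this hypothetical sketch, both functions wrap the same LLM hook, and only the branch for a missing LLM differs. `Status`, `LlmCheck`, and both function names are illustrative, not taken from mxr.

```python
from enum import Enum
from typing import Callable, Optional

# Hypothetical LLM hook: returns True when it finds a problem in the draft.
LlmCheck = Optional[Callable[[str], bool]]


class Status(Enum):
    PASS = "pass"
    WARN = "warn"
    UNAVAILABLE = "unavailable"  # the check could not run, and says so


def check_fail_open(draft: str, llm: LlmCheck) -> Status:
    # Anti-pattern: "no LLM" is indistinguishable from "nothing wrong".
    if llm is None:
        return Status.PASS
    return Status.WARN if llm(draft) else Status.PASS


def check_fail_loud(draft: str, llm: LlmCheck) -> Status:
    # The missing dependency surfaces in the result itself.
    if llm is None:
        return Status.UNAVAILABLE
    return Status.WARN if llm(draft) else Status.PASS
```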

Why this is load-bearing

Three failure modes the rule prevents:

The "demo-driven" feature. An LLM-only check looks great in a demo and falls apart when the local LLM is off, the API key isn't set, or the rate limit kicks in. The deterministic-first version works everywhere; the LLM is the cherry on top.
The cost-and-latency creep. Every LLM call costs tokens and adds round-trip time. Putting one in a hot path you could've answered with a SQL query is a tax on every user every time.
The unreproducible bug. Deterministic checks fail the same way every time, with stack traces you can read. LLM checks fail differently each run, depending on prompt, model, temperature, and provider. Reproducible bugs are debuggable; LLM bugs require a vibe check.

What this looks like in practice

A pre-send safety pipeline in mxr has six checks. Five are deterministic (typo distance, attachment regex, reply-all sanity, PII detectors, tone stylometry). The sixth, answer-coverage, is LLM-backed because "does this draft actually address the asks in the thread?" is genuinely a synthesis task. Even there, the LLM output is validated against the retrieved thread, and an LLM-disabled run degrades to "Info" (the check ran but couldn't conclude), not a silent pass.
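
Here is a sketch of how such a pipeline might be wired. It shows three of the six checks: attachment regex and typo distance on the deterministic side, and answer-coverage as the LLM-backed one. Everything here is an assumption for illustration, including the `Draft` shape, the regex, and the edit-distance cutoff of 2. The `llm(body, thread)` hook and the substring test used to validate its output against the thread are also assumptions. Reply-all sanity, PII detection, and tone stylometry are omitted.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, Set, Tuple


class Severity(Enum):
    PASS = "pass"
    INFO = "info"  # the check ran but couldn't conclude
    WARN = "warn"


@dataclass
class Draft:
    body: str
    recipients: List[str]
    attachments: List[str] = field(default_factory=list)
    thread: List[str] = field(default_factory=list)  # earlier messages in the thread


Result = Tuple[Severity, str]
ATTACH_RE = re.compile(r"\b(attached|attachment|enclosed)\b", re.IGNORECASE)


def attachment_check(d: Draft, contacts: Set[str]) -> Result:
    if ATTACH_RE.search(d.body) and not d.attachments:
        return Severity.WARN, "mentions an attachment but none is attached"
    return Severity.PASS, ""


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def typo_check(d: Draft, contacts: Set[str]) -> Result:
    for r in d.recipients:
        if r in contacts:
            continue
        near = sorted(c for c in contacts if edit_distance(r, c) <= 2)
        if near:
            return Severity.WARN, f"{r} may be a typo of {near[0]}"
    return Severity.PASS, ""


def make_coverage_check(llm: Optional[Callable[[str, List[str]], List[str]]]):
    # Hypothetical llm(body, thread) returns the asks it thinks the draft leaves unanswered.
    def coverage_check(d: Draft, contacts: Set[str]) -> Result:
        if llm is None:
            return Severity.INFO, "LLM disabled: answer coverage not checked"
        claimed = llm(d.body, d.thread)
        # Validate against the retrieved thread: drop asks that never appear in it.
        grounded = [ask for ask in claimed if any(ask in msg for msg in d.thread)]
        if grounded:
            return Severity.WARN, "possibly unanswered: " + "; ".join(grounded)
        return Severity.PASS, ""
    return coverage_check


def run_pipeline(d: Draft, contacts: Set[str], llm=None) -> List[Tuple[str, Severity, str]]:
    checks = [attachment_check, typo_check, make_coverage_check(llm)]
    return [(check.__name__, *check(d, contacts)) for check in checks]
```

With `llm=None` the coverage check still appears in the results, as `INFO`, so a disabled LLM is visible rather than read as a pass. An ask the LLM invents that never appears in the thread is dropped before it can reach the user.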

The same pattern applies to draft-assist: the prompt is grounded in the current thread and the user's instruction. Without semantic search, it falls back to thread-only prompting. Without an LLM, the feature is unavailable, and it says so loud and clear.
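
The draft-assist fallback chain can be sketched the same way. Semantic search enriches the prompt when it exists, and the thread alone grounds it when it doesn't. A missing LLM raises rather than returning an empty draft. The function names, the prompt layout, and the `search` and `llm` callables are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


class FeatureUnavailable(RuntimeError):
    """Raised loudly instead of returning an empty or fake draft."""


def build_draft_prompt(thread: Sequence[str], instruction: str,
                       search: Optional[Callable[[str], List[str]]] = None) -> str:
    parts = ["Thread:", *thread]
    if search is not None:
        # Semantic search available: add retrieved messages to the grounding.
        parts += ["Related messages:", *search(instruction)]
    # Without search, the prompt is grounded in the thread alone.
    parts += ["Instruction:", instruction]
    return "\n".join(parts)


def draft_assist(thread: Sequence[str], instruction: str,
                 llm: Optional[Callable[[str], str]] = None,
                 search: Optional[Callable[[str], List[str]]] = None) -> str:
    if llm is None:
        raise FeatureUnavailable("draft assist needs an LLM; none is configured")
    return llm(build_draft_prompt(thread, instruction, search))
```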

When the LLM does earn its place

The LLM is the right tool when:

The work is genuine synthesis: summarization, free-text Q&A, extracting structured commitments from unstructured prose
A deterministic version exists but produces too many false positives the LLM can prune
The output is reviewed by a human before any side effect (draft assist, briefings, suggestions)

It is the wrong tool when:

The data could answer it with a query
The work fires on every message, every keystroke, every render
The output drives a side effect with no human in the loop
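
The two lists read naturally as a guard. Any "wrong tool" condition disqualifies the LLM, and otherwise the call needs a synthesis or false-positive-pruning job to justify it. The sketch is one hypothetical reading of the lists, with assumed flags and precedence. It is useful mainly as a checklist a reviewer could apply to a proposed LLM call.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    # Hypothetical flags, one per criterion in the two lists above.
    answerable_by_query: bool
    runs_on_hot_path: bool                # every message, keystroke, or render
    unreviewed_side_effect: bool          # output acts with no human in the loop
    is_synthesis: bool                    # summarization, free-text Q&A, extraction
    prunes_false_positives: bool = False  # filters an existing deterministic check


def llm_earns_its_place(t: TaskProfile) -> bool:
    # Any "wrong tool" condition rules the LLM out, however good the synthesis case.
    if t.answerable_by_query or t.runs_on_hot_path or t.unreviewed_side_effect:
        return False
    return t.is_synthesis or t.prunes_false_positives
```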
