Citations Required, Validator Enforces

#resources #resources/programming #resources/programming/architecture


For any LLM-backed synthesis feature, the model must cite the source documents it grounds each statement in, and a validator must reject any output that cites IDs outside the retrieved candidate set. The validator is the load-bearing piece: without it, the model invents IDs that look plausible.

The two halves of the contract

The prompt requires citations. Output is strict JSON, and every claim carries an evidence_msg_id or evidence_thread_id. No citation, no claim. The model is allowed to say "I don't have enough evidence"; it is not allowed to make up evidence.
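One way to make the prompt's contract machine-checkable is to parse the response against a fixed shape before any citation check runs. The Python sketch below assumes a response shape chosen for illustration: a `status` field that is either `"answer"` or `"insufficient_evidence"`, and a `claims` list whose entries carry `text` plus `evidence_msg_id` and/or `evidence_thread_id`. Only the two evidence field names come from the contract above; `status`, `claims`, `text`, `Claim`, `ModelAnswer`, and `parse_model_output` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    text: str
    evidence_msg_id: Optional[str] = None
    evidence_thread_id: Optional[str] = None


@dataclass(frozen=True)
class ModelAnswer:
    # None means the model took the explicit "not enough evidence" path.
    claims: Optional[list[Claim]]


def parse_model_output(raw: str) -> ModelAnswer:
    """Parse strict JSON output (hypothetical shape); raise ValueError if off-contract."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("top-level output must be a JSON object")
    status = data.get("status")
    if status == "insufficient_evidence":
        return ModelAnswer(claims=None)
    if status != "answer":
        raise ValueError(f"unknown status: {status!r}")
    raw_claims = data.get("claims")
    if not isinstance(raw_claims, list) or not raw_claims:
        raise ValueError("an answer must contain at least one claim")
    claims = []
    for item in raw_claims:
        if not isinstance(item, dict) or not isinstance(item.get("text"), str):
            raise ValueError("each claim needs a string 'text' field")
        msg_id = item.get("evidence_msg_id")
        thread_id = item.get("evidence_thread_id")
        for value in (msg_id, thread_id):
            if value is not None and not isinstance(value, str):
                raise ValueError("citation IDs must be strings")
        # No citation, no claim.
        if not msg_id and not thread_id:
            raise ValueError(f"uncited claim: {item['text']!r}")
        claims.append(Claim(item["text"], msg_id, thread_id))
    return ModelAnswer(claims=claims)
```

This step only checks shape. Whether the cited IDs are real is the validator's job, which runs on the parsed claims afterward.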

The validator enforces. After the model returns, you walk every claim's citation IDs and check them against the set of message/thread IDs you retrieved and put in the prompt. Any ID that didn't come from your retrieval is a hallucination. The validator rejects the entire response and the feature degrades to "no answer available."
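A minimal sketch of the validator, assuming claims have already been parsed into objects with optional `evidence_msg_id` / `evidence_thread_id` fields (the `Claim` type and both function names are illustrative, not from the source). Message IDs are checked only against retrieved message IDs and thread IDs only against retrieved thread IDs, so an ID cited in the wrong namespace is rejected as well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    text: str
    evidence_msg_id: Optional[str] = None
    evidence_thread_id: Optional[str] = None


def find_hallucinated_ids(
    claims: list[Claim],
    retrieved_msg_ids: set[str],
    retrieved_thread_ids: set[str],
) -> list[str]:
    """Return every cited ID that was not in the retrieved candidate set."""
    bad = []
    for claim in claims:
        msg_id, thread_id = claim.evidence_msg_id, claim.evidence_thread_id
        if msg_id is not None and msg_id not in retrieved_msg_ids:
            bad.append(msg_id)
        if thread_id is not None and thread_id not in retrieved_thread_ids:
            bad.append(thread_id)
    return bad


def validate(
    claims: list[Claim],
    retrieved_msg_ids: set[str],
    retrieved_thread_ids: set[str],
) -> Optional[list[Claim]]:
    """All-or-nothing: one bad citation rejects the whole response (returns None)."""
    if find_hallucinated_ids(claims, retrieved_msg_ids, retrieved_thread_ids):
        return None
    return claims
```

Returning the offending IDs, rather than a bare boolean, makes it easy to log which fabricated IDs the model produced; the caller still treats any non-empty result as a rejection of the entire response.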

Either half alone fails. Prompt-only "please cite" gets you fabricated IDs that look right. Validator-only without citation requirements has nothing to validate. The pair is what makes the feature trustworthy.

Why this is the bug-stopper

The class of bug it prevents: the model returns a plausible answer with a plausible citation, and the user reads the citation, trusts it, and acts on it. But the cited ID was never in the model's context, so the message behind it (if it exists at all) doesn't say what the model claimed. Without validation you'd never know: the ID is in the right shape, it may even resolve to a real message somewhere, and the user has no easy way to spot the lie.

Validation makes this lie impossible at the protocol level. If the model cites an ID, the ID came from the retrieval. The model can still misread what a message says, but it can't cite a message it was never shown.

Adjacent guardrails

The pattern works best when paired with:

Filters enforced before the prompt, not by the LLM. Date ranges, sender filters, label filters — apply them to the retrieval, then ask the LLM only about what you handed it. Don't ask the LLM "answer the question, but only consider messages from last week"; it'll cite messages from two years ago and confidently date them as recent.
Explicit "not enough evidence" path. The model must be able to refuse. Forcing an answer guarantees hallucination when the answer isn't in the corpus.
Cache invalidated by content hash, not by time. When the retrieved set changes (new messages, edited content), the cached answer is stale because the citations might no longer hold.
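The first and third guardrails are both small pieces of ordinary code. The Python sketch below assumes a hypothetical `Message` record with an ID, sender, timestamp, and body. `apply_filters` enforces date and sender filters before the prompt is built, and `cache_key` derives the key from a SHA-256 hash of the question plus the exact IDs and bodies handed to the model, so any change to the retrieved set (a new message, an edited body) produces a new key regardless of how much time has passed. All names here are illustrative.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Message:
    msg_id: str
    sender: str
    sent_at: datetime
    body: str


def apply_filters(
    messages: list[Message],
    since: datetime,
    until: datetime,
    senders: Optional[set[str]] = None,
) -> list[Message]:
    """Enforce date/sender filters in code, before anything reaches the prompt."""
    return [
        m for m in messages
        if since <= m.sent_at < until and (senders is None or m.sender in senders)
    ]


def cache_key(question: str, candidates: list[Message]) -> str:
    """Key the cached answer on the exact content the model was shown."""
    payload = {
        "q": question,
        # Sorted so that retrieval order alone doesn't change the key.
        "docs": sorted([m.msg_id, m.body] for m in candidates),
    }
    blob = json.dumps(payload, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

Sorting the candidates treats the retrieved set as a set; if prompt order matters for your use case, drop the sort so reordering also invalidates the cache.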

Where this generalizes

Anywhere an LLM synthesizes over retrieved context:

RAG question-answering over docs
Code-review bots citing line ranges
Compliance summarizers citing policy sections
Chat agents citing knowledge base articles

The protocol is the same. Retrieval produces a candidate set. The prompt asks for cited answers. The validator gates on "citations must be in the candidate set." When the validator rejects, the feature has to have somewhere honest to go — usually "show the user the retrieval results with no synthesis."
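The protocol can be sketched independently of the domain. In this hedged Python version, `retrieve` and `synthesize` are hypothetical callables supplied by the caller (the latter standing in for the LLM call plus output parsing, returning `None` when the model refuses). Each claim cites a single `doc_id`, which could equally be a message ID, a line-range key, a policy section, or a knowledge base article ID. Anything that fails the gate falls back to showing the retrieval results with no synthesis.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str


@dataclass(frozen=True)
class CitedClaim:
    text: str
    cited_id: str


@dataclass(frozen=True)
class Synthesized:
    claims: list[CitedClaim]


@dataclass(frozen=True)
class RetrievalOnly:
    # Honest fallback: show what retrieval found, with no synthesis on top.
    docs: list[Doc]


def answer(
    question: str,
    retrieve: Callable[[str], list[Doc]],
    synthesize: Callable[[str, list[Doc]], Optional[list[CitedClaim]]],
) -> Union[Synthesized, RetrievalOnly]:
    candidates = retrieve(question)            # 1. retrieval produces the candidate set
    claims = synthesize(question, candidates)  # 2. cited answer, or None to refuse
    allowed = {d.doc_id for d in candidates}
    # 3. gate: refusal, an empty answer, or any out-of-set citation means no synthesis
    if not claims or any(c.cited_id not in allowed for c in claims):
        return RetrievalOnly(candidates)
    return Synthesized(claims)
```

Because the gate only compares IDs, the same function works for any of the use cases listed; only `retrieve` and `synthesize` change per domain.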

What this rules out

LLM output trusted on text alone, without checking what it cited
Citation IDs in some opaque "the model knows" namespace, not the retrieval namespace
"We'll add validation later" — synthesis features without validation ship hallucinations into the audit log
Loose citation formats that can't be programmatically checked (free-text "see message X" instead of structured IDs)