Citations Required, Validator Enforces

For any LLM-backed synthesis feature, the model must cite the source documents that ground each statement, and a validator must then reject any output that cites IDs outside the retrieved candidate set. The validator is the load-bearing piece. Without it, the model invents IDs that look plausible.

The two halves of the contract

The prompt requires citations. The output is strict JSON, and every claim carries an evidence_msg_id or evidence_thread_id. No citation, no claim. The model is allowed to say "I don't have enough evidence"; it is not allowed to make up evidence.
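
A minimal sketch of what "no citation, no claim" could look like when parsing the model's output. The field names evidence_msg_id and evidence_thread_id come from the contract above; the top-level "claims" list, the "text" field, and the function name parse_claims are assumptions made for illustration, not a fixed schema.

```python
import json


def parse_claims(raw: str) -> list[dict]:
    """Parse model output and insist that every claim carries a citation.

    Assumed (hypothetical) shape: {"claims": [{"text": ..., "evidence_msg_id": ...}, ...]}
    An empty "claims" list is how the model says "I don't have enough evidence".
    """
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or not isinstance(data.get("claims"), list):
        raise ValueError("expected a JSON object with a 'claims' list")

    for claim in data["claims"]:
        if not isinstance(claim, dict):
            raise ValueError("each claim must be a JSON object")
        has_msg = bool(claim.get("evidence_msg_id"))
        has_thread = bool(claim.get("evidence_thread_id"))
        if not (has_msg or has_thread):
            # No citation, no claim: refuse the output rather than keep the claim.
            raise ValueError(f"claim without citation: {claim.get('text')!r}")
    return data["claims"]
```

This only checks that a citation is present. Whether the cited ID is real is the validator's job.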

The validator enforces. After the model returns, you walk every claim's citation IDs and check them against the set of message/thread IDs you retrieved and put in the prompt. Any ID that didn't come from your retrieval is a hallucination. The validator rejects the entire response and the feature degrades to "no answer available."
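
The check itself can be sketched in a few lines, assuming the claims are already parsed into dicts and the retrieved IDs are kept as sets of strings. The function and parameter names here are hypothetical.

```python
def _known(value, allowed: set[str]) -> bool:
    # Non-string IDs (numbers, lists, null) are treated as not grounded.
    return isinstance(value, str) and value in allowed


def citations_are_grounded(
    claims: list[dict],
    retrieved_msg_ids: set[str],
    retrieved_thread_ids: set[str],
) -> bool:
    """Return True only if every cited ID came from the retrieved candidate set."""
    for claim in claims:
        msg_id = claim.get("evidence_msg_id")
        thread_id = claim.get("evidence_thread_id")
        if msg_id is None and thread_id is None:
            return False  # a claim with no citation fails the contract
        if msg_id is not None and not _known(msg_id, retrieved_msg_ids):
            return False  # message ID that was never in the prompt
        if thread_id is not None and not _known(thread_id, retrieved_thread_ids):
            return False  # thread ID that was never in the prompt
    return True
```

One False rejects the whole response. The sketch deliberately has no "drop the bad claim and keep the rest" path, matching the all-or-nothing rule above.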

Either half alone fails. Prompt-only "please cite" gets you fabricated IDs that look right. Validator-only without citation requirements has nothing to validate. The pair is what makes the feature trustworthy.

Why this is the bug-stopper

The class of bug it prevents: the model returns a plausible answer with a plausible citation, and the user reads the citation, trusts it, and acts on it. The cited message doesn't actually say what the model claimed. Without validation you'd never know: the ID has the right shape, the message exists somewhere, and the user has no easy way to spot the lie.

Validation makes the invented citation impossible at the protocol level. If a response passes, every ID it cites came from the retrieval. The model can still misinterpret what a message says, but it can't cite a message it was never shown.

Adjacent guardrails

The pattern works best when paired with:

Where this generalizes

Anywhere an LLM synthesizes over retrieved context:

The protocol is the same. Retrieval produces a candidate set. The prompt asks for cited answers. The validator gates on "citations must be in the candidate set." When the validator rejects, the feature has to have somewhere honest to go — usually "show the user the retrieval results with no synthesis."
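
The whole protocol, including the honest fallback, might be wired together roughly like this. It is a sketch under assumptions: retrieve and synthesize are hypothetical callables supplied by the caller, candidates are dicts with an "id" key, the model returns a JSON string with a "claims" list, and only evidence_msg_id is checked to keep the example short.

```python
import json
from typing import Callable


def answer_with_fallback(
    question: str,
    retrieve: Callable[[str], list[dict]],         # hypothetical: returns [{"id": ..., "text": ...}]
    synthesize: Callable[[str, list[dict]], str],  # hypothetical: returns the model's raw JSON string
) -> dict:
    candidates = retrieve(question)
    candidate_ids = {c["id"] for c in candidates}

    # The honest place to go when validation fails: retrieval results, no synthesis.
    fallback = {"mode": "retrieval_only", "results": candidates}

    try:
        claims = json.loads(synthesize(question, candidates))["claims"]
        cited = {claim["evidence_msg_id"] for claim in claims}
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing fields, or wrong types all count as rejection.
        return fallback

    # Gate: citations must be a subset of the candidate set.
    if not cited <= candidate_ids:
        return fallback

    return {"mode": "synthesis", "claims": claims, "results": candidates}
```

An empty claims list passes the gate, which is how "I don't have enough evidence" flows through without tripping the validator.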

What this rules out

See also