Building Great CLIs
Synthesis hub. The CLI design philosophy that runs through Mxr (email), Lazydap (debugger), and Spotuify (music). Three projects, three domains, same architecture. This note is the handover doc: if you want to know how I think about CLIs before you build one with me, read this.
The atoms it ties together: Client-Agnostic Cores, The Local Daemon Pattern, Agent-Native Interfaces, Same-Code-Path Preview, Local IPC vs HTTP, Headless Architecture, Headless Core + Multiple Clients, API-First Design. Each is a cluster in its own right; this hub gives you the entry point and the rationale for why they fit together.
The thesis, in one sentence
Build a stable wire protocol owned by a daemon; ship the CLI as one thin client among many (TUI, web bridge, agent skill, future surfaces); make the same --format json and --dry-run work for a human at a terminal and for an agent driving a shell, because they want the same things.
That sentence is the whole memo. Everything below is the why and the how.
Where this thinking comes from
Doug McIlroy, 1978:
Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.
Half a century later the universal interface has graduated from text streams to structured text streams. The contract is the same. A CLI that emits stable JSON on stdout, reads JSON on stdin, exits with a meaningful code, and stays quiet on stderr unless something is wrong is a citizen of every pipeline ever written — including the pipelines an LLM agent is going to assemble at 3am while you're asleep.
Mxr arrived at this from email. Lazydap from debugging. Spotuify from music. The instinct is the same and it predates all three. Client-Agnostic Cores traces the lineage all the way back through LSP, the Bezos API mandate, X11, and Unix daemons.
The seven principles
1. The CLI is the product. The TUI is a client.
Most "CLI tools with a TUI" are written TUI-first, with a CLI bolted on later. They have features only the TUI can do. They have selection state the CLI can't see. They have shortcuts that lower into private functions, not the public protocol.
Invert it. The CLI is the canonical surface. The TUI is a viewer over the same protocol. No client is privileged. If a feature exists in the TUI but not the CLI, the feature is incomplete. If the agent skill can do something the CLI can't, the architecture is broken.
This is enforced by the headless discipline applied at single-machine scale. Concrete instances: Mxr's tui crate cannot depend on daemon, store, search, sync, or provider crates — only on protocol. The Cargo dependency graph catches violations at build time. Same rule in Lazydap. Same rule in Spotuify (tests/workspace_boundaries.rs).
Why this matters: every time someone asks "can the TUI just reach into the store directly, it'd be faster" — the answer is no, because the moment one client gets to reach past the protocol, the protocol stops being the product. Then the second client (agent skill, web bridge) is impossible to build cleanly. See Client-Agnostic Cores for the full argument.
2. Build for humans and agents in the same surface
The 2024–2026 dimension. Don't ship an "AI mode" later. Design the CLI so the agent usage is the same as the human usage with one flag flipped.
The discipline that makes a CLI pleasant for humans — composable, scriptable, predictable, idempotent, dry-runnable — is the same discipline that makes it usable by agents. Anthropic's Writing tools for agents and Simon Willison's Designing agentic loops say the same thing from the agent side. Agent-Native Interfaces has the checklist.
Concretely:
- --format json available on every output-producing command. Auto-detect non-TTY stdout and default to JSON there.
- --dry-run on every mutation, same selection path as the real run.
- --yes to skip confirmation prompts in non-interactive contexts.
- Stable JSON schemas. Breaking the shape is a major version bump.
- Semantic exit codes: 0 success, 1 runtime error, 2 usage error, 3+ domain-specific. Distinguishable.
- Errors as structured JSON, not stack traces.
- Per-subcommand --help that's still parseable, plus optionally a --schema / manifest subcommand for programmatic discovery.
Hit all of those on day one and the tool is already in the top decile of agent-friendliness. The bonus is that you also have, accidentally, a tool that's pleasant to script from Bash, Fish, Python, Node, anywhere.
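A minimal sketch of the exit-code and structured-error items, using only the standard library. The names here (CliError, the show subcommand, NotFound as exit code 3) are hypothetical and not taken from Mxr, Lazydap, or Spotuify. Writing the error JSON to stderr is my assumption; the point is only that each failure class maps to one distinguishable code and one parseable shape.

```rust
use std::fmt::Write as _;
use std::process::ExitCode;

/// Failure classes following the 0 / 1 / 2 / 3+ convention.
#[derive(Debug)]
pub enum CliError {
    Runtime(String),  // exit 1
    Usage(String),    // exit 2
    NotFound(String), // exit 3 (hypothetical domain-specific code)
}

impl CliError {
    pub fn exit_code(&self) -> u8 {
        match self {
            CliError::Runtime(_) => 1,
            CliError::Usage(_) => 2,
            CliError::NotFound(_) => 3,
        }
    }

    fn kind(&self) -> &'static str {
        match self {
            CliError::Runtime(_) => "runtime",
            CliError::Usage(_) => "usage",
            CliError::NotFound(_) => "not_found",
        }
    }

    fn message(&self) -> &str {
        match self {
            CliError::Runtime(m) | CliError::Usage(m) | CliError::NotFound(m) => m,
        }
    }

    /// One stable, machine-readable error shape instead of a stack trace.
    pub fn to_json(&self) -> String {
        format!(
            r#"{{"error":{{"kind":"{}","code":{},"message":"{}"}}}}"#,
            self.kind(),
            self.exit_code(),
            escape_json(self.message())
        )
    }
}

/// Escape a string for embedding inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out
}

/// Toy command dispatch: `show <id>` succeeds only for id 42.
fn run(args: &[String]) -> Result<String, CliError> {
    match args.first().map(String::as_str) {
        Some("show") => match args.get(1) {
            Some(id) if id == "42" => Ok(r#"{"id":"42"}"#.to_string()),
            Some(id) => Err(CliError::NotFound(format!("no item with id {id}"))),
            None => Err(CliError::Usage("show requires an id".into())),
        },
        _ => Err(CliError::Usage("expected a subcommand".into())),
    }
}

fn main() -> ExitCode {
    let args: Vec<String> = std::env::args().skip(1).collect();
    match run(&args) {
        Ok(json) => {
            println!("{json}"); // result on stdout
            ExitCode::SUCCESS
        }
        Err(err) => {
            eprintln!("{}", err.to_json()); // structured error on stderr
            ExitCode::from(err.exit_code())
        }
    }
}
```

A caller, human or agent, can branch on the exit code alone and only parse the error body when it wants the detail.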
3. Pipeable JSON is a product feature, not a flag
The single most consequential decision: treat the JSON shape as a public API.
Once one consumer (shell pipeline, agent skill, web bridge) depends on the schema, all consumers do. There's no version of "internal JSON output we can break freely." Either the output is contractual or it's noise. Make it contractual.
What this looks like in practice:
- Schemas live in docs/blueprint/ or equivalent, with examples.
- A --format flag with values table (human default on a TTY), json (machine default off a TTY), ndjson for streaming, text for the legacy unstructured form if you need one.
- TTY detection: isatty(stdout) decides the default. --format overrides.
- Versioning: include a schema_version field in the JSON envelope. Clients refuse to talk on mismatch.
- Auto-detected non-TTY mode also suppresses spinners, colour codes, and progress bars in stdout: they go to stderr or off entirely. stdout is for the result, not for the chrome.
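The format rule above is small enough to sketch as a pure function. This assumes the four values listed (table, json, ndjson, text) and uses std::io::IsTerminal from the Rust standard library for the TTY check. The function name resolve_format and the naive flag parsing in main are hypothetical, for illustration only.

```rust
use std::io::{self, IsTerminal};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Table,
    Json,
    Ndjson,
    Text,
}

/// An explicit --format always wins; otherwise the TTY check picks the default.
pub fn resolve_format(flag: Option<&str>, stdout_is_tty: bool) -> Result<Format, String> {
    match flag {
        Some("table") => Ok(Format::Table),
        Some("json") => Ok(Format::Json),
        Some("ndjson") => Ok(Format::Ndjson),
        Some("text") => Ok(Format::Text),
        Some(other) => Err(format!("unknown --format value: {other}")),
        None if stdout_is_tty => Ok(Format::Table), // human at a terminal
        None => Ok(Format::Json),                   // pipe, file, or agent
    }
}

fn main() {
    // Naive flag lookup for the sketch; a real CLI would use its argument parser.
    let args: Vec<String> = std::env::args().collect();
    let flag = args
        .iter()
        .position(|a| a == "--format")
        .and_then(|i| args.get(i + 1))
        .map(String::as_str);

    match resolve_format(flag, io::stdout().is_terminal()) {
        Ok(format) => println!("{format:?}"),
        Err(msg) => {
            eprintln!("{msg}");
            std::process::exit(2); // usage error
        }
    }
}
```

Keeping the decision in a function that takes the TTY state as an argument makes the default testable without a real terminal.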
If you need a forcing function: write the second consumer first. Build a Bash one-liner that pipes the JSON through jq and into another command. If that doesn't read naturally, the schema is wrong.
4. The daemon is the brain. The CLI is a courier.
This is the local daemon pattern and it's the load-bearing decision for everything that follows.
A naive CLI re-loads everything per invocation: open the SQLite file, parse config, spin up the async runtime, connect to providers, do work, exit. Easily 200ms per command before any actual work. The user types mxr search "from:alice" and waits a quarter-second for nothing.
A daemon already has all that loaded. The CLI is 5–20ms total: probe the socket, connect, send a request, read the response, exit. Subjectively instant.
Three problems get solved at once:
- Cold-start latency. Gone after the first command.
- Persistent connections. IMAP IDLE, DAP adapter sessions, file watchers — things a short-lived CLI can't hold. The daemon can.
- Multiple concurrent clients. TUI alongside the CLI alongside the agent skill, all consuming the same state, all seeing the same events. Without a daemon you'd need the TUI to be the state-holder, which couples the protocol to the TUI's lifecycle forever.
The CLI never has to know whether the daemon is running. Auto-spawn handles it. First invocation forks the daemon, polls for the socket to appear, sends the request. Second invocation finds the socket already there. The user thinks they're running CLI commands; they're actually opening short-lived sockets to a long-lived background process. See The Local Daemon Pattern for the full mechanic, including PID files, idle shutdown, and crash recovery.
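The probe, spawn, poll sequence can be sketched like this, assuming a Unix target. connect_or_spawn is a hypothetical name, and the spawn step is passed in as a closure so the sketch stays runnable: here a thread binding a listener stands in for forking the real daemon binary (which would typically go through std::process::Command). PID files, stale-socket cleanup, and idle shutdown are deliberately left out.

```rust
use std::io;
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Connect to the daemon socket, starting the daemon first if nothing answers.
pub fn connect_or_spawn<F>(
    socket: &Path,
    spawn_daemon: F,
    timeout: Duration,
) -> io::Result<UnixStream>
where
    F: FnOnce() -> io::Result<()>,
{
    // Fast path: the daemon is already running.
    if let Ok(stream) = UnixStream::connect(socket) {
        return Ok(stream);
    }

    // Slow path: start it, then poll until the socket accepts connections.
    spawn_daemon()?;
    let deadline = Instant::now() + timeout;
    loop {
        match UnixStream::connect(socket) {
            Ok(stream) => return Ok(stream),
            Err(err) if Instant::now() >= deadline => return Err(err),
            Err(_) => thread::sleep(Duration::from_millis(10)),
        }
    }
}

fn main() -> io::Result<()> {
    // Hypothetical socket path for the demo only.
    let socket = std::env::temp_dir().join(format!("demo-{}.sock", std::process::id()));
    let bind_path = socket.clone();

    let stream = connect_or_spawn(
        &socket,
        move || {
            // Stand-in for launching the real daemon process.
            let listener = UnixListener::bind(&bind_path)?;
            thread::spawn(move || {
                let _ = listener.accept();
            });
            Ok(())
        },
        Duration::from_secs(2),
    )?;

    println!("connected: {:?}", stream.peer_addr()?);
    std::fs::remove_file(&socket)
}
```

The second call against the same path takes the fast path and never touches the spawn closure, which is the "second invocation finds the socket already there" behaviour.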
Per-user or per-project? Mxr is per-user — one daemon per logged-in user. Lazydap is per-project — one daemon per repo root. Choose by the scope of the state. Email is a user-level concern; a debug session is a project-level concern.
5. Local IPC over Unix socket, not HTTP-on-loopback, unless you have a browser reason
The default is a Unix domain socket with length-delimited JSON framing. Unix Domain Sockets for the mechanism; Local IPC vs HTTP for the comparison.
Why Unix socket wins for the default case:
- Faster. 5–10µs round-trip vs 30–200µs for HTTP loopback. Per-call latency rarely matters, but in the 1000+/sec language-server-style use case it does.
- Safer. chmod 0700 on the socket means owner-only. Filesystem permissions are dramatically better than a 127.0.0.1 bind (which is reachable by any process running as any user on the machine).
- Simpler wire. No HTTP parsing. No CORS. No keep-alive negotiation. A u32 length prefix and a JSON body.
- No accidental network exposure. A socket on disk can't end up bound to 0.0.0.0 by a config typo.
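The "u32 length prefix and a JSON body" framing fits in a couple of functions. A minimal sketch: the byte order (big-endian) and the 16 MiB frame cap are my assumptions, since the note does not specify either, and write_frame / read_frame are hypothetical names. The functions work over any Read / Write, so the same code serves a UnixStream or an in-memory buffer.

```rust
use std::io::{self, Cursor, Read, Write};

/// Assumed upper bound so a corrupt prefix cannot trigger a huge allocation.
const MAX_FRAME: u32 = 16 * 1024 * 1024;

/// Write one frame: 4-byte big-endian length, then the body bytes.
pub fn write_frame<W: Write>(w: &mut W, body: &[u8]) -> io::Result<()> {
    let len = u32::try_from(body.len())
        .ok()
        .filter(|n| *n <= MAX_FRAME)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(body)?;
    w.flush()
}

/// Read one frame; fails with UnexpectedEof if the stream ends mid-frame.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 4];
    r.read_exact(&mut prefix)?;
    let len = u32::from_be_bytes(prefix);
    if len > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut body = vec![0u8; len as usize];
    r.read_exact(&mut body)?;
    Ok(body)
}

fn main() -> io::Result<()> {
    // Two requests back to back on one "wire".
    let mut wire = Vec::new();
    write_frame(&mut wire, br#"{"v":1,"id":"req_1"}"#)?;
    write_frame(&mut wire, br#"{"v":1,"id":"req_2"}"#)?;

    // The reader recovers message boundaries without scanning the JSON.
    let mut reader = Cursor::new(wire);
    while let Ok(frame) = read_frame(&mut reader) {
        println!("{}", String::from_utf8_lossy(&frame));
    }
    Ok(())
}
```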
When to reach for HTTP-on-loopback instead:
- A browser client is a real consumer. Browsers can't open Unix sockets. (The workaround: ship a small bridge binary that translates HTTP/WebSocket to the local IPC. Mxr does this for its embedded SPA; see Embedded SPA in Daemon Binary for the distribution trick.)
- The daemon will eventually be remote-accessible and you want one wire format end-to-end.
- Third-party clients are the major use case and you want maximal tooling: curl, Postman, OpenAPI, browser devtools.
For the common case of "I'm shipping the clients alongside the daemon," Unix socket. For the case of "anyone might write a client and I want the tooling ecosystem," HTTP. TCP vs Unix Sockets vs Named Pipes vs Shared Memory has the comparison table.
Windows note: named pipes are the equivalent. \\.\pipe\mxr works the same way semantically, with similar permission semantics via SECURITY_ATTRIBUTES.
6. Mutations are dry-runnable through the same code path
The rule that came out of a real bug: mxr's bulk-archive said "1,200 messages" in preview and archived 12,000 in execution because preview and execute had drifted into separate query paths. Preview was correct for its own (wrong) query. The user trusted the number. A decade of email moved.
Same-Code-Path Preview is the rule that came out of that:
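    fn run_mutation(args: &Args) -> Result<()> {
        // Resolve the affected set exactly once; preview and execute share it.
        let selection = compute_selection(args)?;
        if args.dry_run {
            print_preview(&selection);
            return Ok(());
        }
        confirm(&selection)?;
        // Takes the Selection value, not the query, so it cannot widen the set.
        mutate(&selection)
    }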
Compute the selection once. Render it or execute it. The mutation takes a Selection value, not a query — so the act of mutating cannot expand the set.
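A self-contained version of the same shape, for anyone who wants to run it. Everything here is a toy under stated assumptions: Mailbox is an in-memory stand-in for the real store, archive_from and Outcome are hypothetical names, and the confirmation step is omitted for brevity. What it shows is that preview and execute return the very same Selection.

```rust
/// The resolved set of message ids. Mutations accept this, never a query.
#[derive(Debug, Clone, PartialEq)]
pub struct Selection {
    pub ids: Vec<u32>,
}

/// Toy in-memory mailbox standing in for the real store.
pub struct Mailbox {
    pub inbox: Vec<(u32, String)>, // (message id, sender)
    pub archived: Vec<u32>,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Preview(Selection),
    Archived(Selection),
}

/// The only place a query is turned into a set of ids.
fn compute_selection(mailbox: &Mailbox, sender: &str) -> Selection {
    let ids = mailbox
        .inbox
        .iter()
        .filter(|(_, from)| from == sender)
        .map(|(id, _)| *id)
        .collect();
    Selection { ids }
}

/// Moves exactly the selected ids; it never sees the query.
fn mutate(mailbox: &mut Mailbox, selection: &Selection) {
    mailbox.inbox.retain(|(id, _)| !selection.ids.contains(id));
    mailbox.archived.extend(&selection.ids);
}

pub fn archive_from(mailbox: &mut Mailbox, sender: &str, dry_run: bool) -> Outcome {
    let selection = compute_selection(mailbox, sender);
    if dry_run {
        return Outcome::Preview(selection); // execute minus the side effect
    }
    mutate(mailbox, &selection);
    Outcome::Archived(selection)
}

fn main() {
    let mut mailbox = Mailbox {
        inbox: vec![(1, "alice".into()), (2, "bob".into()), (3, "alice".into())],
        archived: vec![],
    };
    println!("{:?}", archive_from(&mut mailbox, "alice", true));
    println!("{:?}", archive_from(&mut mailbox, "alice", false));
}
```

A test that asserts the preview's Selection equals the executed Selection is cheap to write in this shape, and it is the test that would have caught the drift.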
Generalises beyond CLIs. Anywhere there's a preview and an execute, a parallel implementation will drift, and you won't see the drift until production data is gone. The frame I keep returning to: preview is execute minus the side effect. Not "similar." Not "approximate." Minus the side effect, full stop.
Every mutation in Mxr, Lazydap, and Spotuify follows this. It's one of the few rules shared across all three.
7. The build system enforces the architecture
The crate dependency graph (Cargo workspace, in our case) is the architecture. Not the diagram. Not the docs. The Cargo.toml files.
The rule in Mxr: tui and web cannot depend on daemon, store, search, sync, or provider crates. Only protocol, core, config, compose, reader, mail-parse. Same in Lazydap (tui cannot reach past protocol). Same in Spotuify (14 crates, boundaries enforced by tests/workspace_boundaries.rs).
When the build catches the violation at compile time, three things happen:
- Discussions get short. The conversation isn't "should we?" it's "the build is red."
- New contributors hit the wall before they hit the codebase. The rule teaches itself.
- The architecture document doesn't decay because the architecture is the build.
In other languages: module visibility in Java/Kotlin, package boundaries with explicit exports in TypeScript, internal/public in C#, build tags in Go. The mechanism varies; the principle doesn't. Don't rely on developer discipline alone.
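For the cases where the rule is checked by a test rather than by the dependency graph itself, a boundary check can be as small as a scan of the client crate's manifest. This is a hypothetical sketch, not the contents of tests/workspace_boundaries.rs: forbidden_deps is an invented name, the crate names are illustrative, and the scan is a naive line reader rather than a TOML parser (it ignores [dev-dependencies] and [dependencies.name] tables).

```rust
/// Return the forbidden crate names declared under `[dependencies]`.
/// Naive line scan: good enough for `name = ...` and `name.workspace = true`.
pub fn forbidden_deps(manifest: &str, forbidden: &[&str]) -> Vec<String> {
    let mut in_deps = false;
    let mut hits = Vec::new();
    for line in manifest.lines().map(str::trim) {
        if line.starts_with('[') {
            // Track which table we are in; only plain [dependencies] counts here.
            in_deps = line == "[dependencies]";
            continue;
        }
        if !in_deps || line.is_empty() || line.starts_with('#') {
            continue;
        }
        // The dependency name is everything before the first '=' or '.'.
        let name = line
            .split(|c: char| c == '=' || c == '.')
            .next()
            .unwrap_or("")
            .trim();
        if forbidden.contains(&name) {
            hits.push(name.to_string());
        }
    }
    hits
}

fn main() {
    // In a real test this string would come from reading the client crate's Cargo.toml.
    let manifest = "[package]\nname = \"tui\"\n\n[dependencies]\n\
                    protocol = { path = \"../protocol\" }\n\
                    store = { path = \"../store\" }\n";
    let hits = forbidden_deps(manifest, &["daemon", "store", "search", "sync", "provider"]);
    println!("forbidden dependencies: {hits:?}");
}
```

In a test, asserting that the returned list is empty turns the architectural rule into a red build.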
The shape, drawn
    Human at terminal         TUI (one client)        Agent skill (another client)
           │                         │                           │
           │ (table output)          │ (events + actions)        │ (JSON over stdin/stdout)
           ▼                         │                           │
    CLI subcommands ◄────────────────┴───────────────────────────┘
           │
           │  length-prefixed JSON
           │  ┌─────────────────────────────────────┐
           │  │ {                                   │
           │  │   "v": 1,                           │
           │  │   "id": "req_01HXYZ",               │
           │  │   "bucket": "CoreMail",             │
           │  │   "request": { "Search": { ... } }  │
           │  │ }                                   │
           │  └─────────────────────────────────────┘
           ▼
    Unix domain socket (chmod 0700, ~/.cache/<tool>/<tool>.sock)
           │
           ▼
    Daemon (auto-spawned, long-lived)
           │
           ├── canonical state  (SQLite / TOML / whatever survives a crash)
           ├── derived state    (search index, rebuildable)
           ├── adapters         (Gmail / IMAP / SMTP / DAP / Spotify Connect)
           └── sync loops, event bus, subscription channels
    Optional: web bridge (separate binary) ── HTTP+WS ──┐
                                                        │
                                                Same Unix socket
This is the same shape in all three projects. Only the contents of the "adapters" line (Gmail / DAP / Spotify Connect) change from one project to the next. What stays: the protocol shape, the daemon discipline, the socket location convention, the bucketed message taxonomy.
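One way to make the bucketed envelope hard to get wrong, sketched with the standard library only. The four bucket names are mxr's; the request variants (Search, DaemonStatus, RebuildIndex), the function names, and the hand-rolled JSON writer are hypothetical, and a real implementation would most likely use a serialization library. The match in bucket() has no wildcard arm, so adding a request without assigning it a bucket fails to compile.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Bucket {
    CoreMail,
    MxrPlatform,
    AdminMaintenance,
    ClientSpecific,
}

/// Hypothetical request set; the real protocol will differ.
pub enum Request {
    Search { query: String },
    DaemonStatus,
    RebuildIndex,
}

impl Request {
    /// Exhaustive on purpose: every new variant must pick a bucket here.
    pub fn bucket(&self) -> Bucket {
        match self {
            Request::Search { .. } => Bucket::CoreMail,
            Request::DaemonStatus => Bucket::MxrPlatform,
            Request::RebuildIndex => Bucket::AdminMaintenance,
        }
    }

    fn body_json(&self) -> String {
        match self {
            Request::Search { query } => {
                format!(r#"{{"Search":{{"query":{}}}}}"#, quote(query))
            }
            Request::DaemonStatus => quote("DaemonStatus"),
            Request::RebuildIndex => quote("RebuildIndex"),
        }
    }
}

/// Render a string as a JSON string literal, escaping as needed.
fn quote(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' | '\\' => {
                out.push('\\');
                out.push(c);
            }
            c if c.is_control() => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build the envelope: version, request id, bucket, request body.
pub fn envelope(id: &str, request: &Request) -> String {
    format!(
        r#"{{"v":1,"id":{},"bucket":"{:?}","request":{}}}"#,
        quote(id),
        request.bucket(),
        request.body_json()
    )
}

fn main() {
    let request = Request::Search { query: "from:alice".into() };
    println!("{}", envelope("req_01HXYZ", &request));
}
```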
When to use this pattern
Use it when:
- You have, or expect, more than one consumer. The CLI plus a TUI counts. The CLI plus an agent skill counts. Two is enough.
- The interface is going to outlive the UI. Protocols designed for stability survive UI rewrites; UIs do not survive protocol changes.
- The cost of cold-start per command would be annoying. Anything > 50ms.
- You want agents to drive the tool well. Without the protocol, they can't.
Skip it when:
- One consumer, no second one credibly in sight. A monolithic CLI is fine; don't pre-architect a daemon you'll never need.
- The tool is truly stateless and per-invocation work is < 50ms. jq, ripgrep, fd: they don't need a daemon.
- The interface really is one-shot. A throwaway script doesn't need a wire protocol.
The daemon discipline is overkill for a 200-line xargs replacement. It's the right size for anything that holds state across commands or talks to a remote service.
The recipe (the order I'd build a new one in)
This is the M0–M5 progression I followed in Lazydap, with the lessons from Mxr baked in.
- Write the protocol first. Message types, request/response shapes, error model, bucket taxonomy, versioning rule. Markdown is fine; types in code are better. Document before code. API-First Design.
- Write a fake adapter that exercises the protocol end-to-end with no backend. Validates the design and gives you a fast integration test substrate forever.
- Set up the crate boundaries. core (zero I/O), protocol (the wire), store (canonical state), daemon (orchestrator). Add a boundary test that fails if tui depends on daemon. Do this before there's a TUI, so the rule predates the temptation.
- Build the daemon. Auto-spawn from the CLI binary, PID file, Unix socket, length-prefixed JSON, tracing from line 1 of main, structured logs at every IPC boundary.
- Build the first CLI subcommand. Something read-only, ideally. End-to-end works.
- Build a mutation with --dry-run and --yes. Get Same-Code-Path Preview in before there are five mutations to refactor.
- Build the second client. This is the test of the protocol. If the second client requires changes to the daemon, the protocol is wrong; fix it now. The second client can be small: even an agent skill (.skill ZIP plus an AGENTS.md) counts.
- Add the TUI. It's a third client, not a privileged surface. Same protocol. Hand-rolled Elm-style state internally (The Elm Architecture).
- Lock the load-bearing decisions in a decision-log.md. A "do not relitigate" section saves every future reviewer the same conversation.
- Test against real systems, not mocks. Real IMAP via Dovecot fixture, real Spotify via librespot in a sandbox, real DAP adapters. Mocks paper over the bugs that actually bite (label reorderings, IDLE disconnects, OAuth refresh races, codelldb stderr buffer fill).
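The fake-adapter step from the recipe, sketched under assumptions. The Adapter trait, the Request / Response variants, and FakeAdapter are all hypothetical names; the point is only that the protocol handler talks to a trait, so an in-memory fake can exercise every request end-to-end before any real backend exists. This is for validating the protocol design early; it does not replace the real-system integration suite.

```rust
/// Hypothetical protocol messages for the sketch.
pub enum Request {
    Search { query: String },
    Archive { ids: Vec<u32> },
}

#[derive(Debug, PartialEq)]
pub enum Response {
    Hits(Vec<u32>),
    Archived(usize),
    Error(String),
}

/// What the daemon needs from any backend.
pub trait Adapter {
    fn search(&self, query: &str) -> Vec<u32>;
    fn archive(&mut self, ids: &[u32]) -> usize;
}

/// In-memory backend: no network, no disk.
pub struct FakeAdapter {
    pub messages: Vec<(u32, String)>, // (id, subject)
}

impl Adapter for FakeAdapter {
    fn search(&self, query: &str) -> Vec<u32> {
        self.messages
            .iter()
            .filter(|(_, subject)| subject.contains(query))
            .map(|(id, _)| *id)
            .collect()
    }

    fn archive(&mut self, ids: &[u32]) -> usize {
        let before = self.messages.len();
        self.messages.retain(|(id, _)| !ids.contains(id));
        before - self.messages.len()
    }
}

/// The protocol handler only knows the trait, never a concrete backend.
pub fn handle(adapter: &mut dyn Adapter, request: Request) -> Response {
    match request {
        Request::Search { query } if query.is_empty() => Response::Error("empty query".into()),
        Request::Search { query } => Response::Hits(adapter.search(&query)),
        Request::Archive { ids } => Response::Archived(adapter.archive(&ids)),
    }
}

fn main() {
    let mut fake = FakeAdapter {
        messages: vec![(1, "invoice march".into()), (2, "lunch".into())],
    };
    let hits = handle(&mut fake, Request::Search { query: "invoice".into() });
    println!("{hits:?}");
    let archived = handle(&mut fake, Request::Archive { ids: vec![1] });
    println!("{archived:?}");
}
```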
Failure modes worth pre-empting
These are the ones I've already paid for. Don't pay for them twice.
- The "TUI just needs one more thing from the daemon's internals" creep. Once you grant it, the protocol stops being canonical and the second client becomes impossible. Hold the line.
- Unstable JSON. Treating --format json as "debug output" until a consumer comes along. By then it's too late; you've broken the schema three times and the consumer can't trust anything.
- Preview-execute drift. Don't have two implementations. Don't have a "fast preview" that approximates. One compute_selection, two render paths.
- HTTP-on-loopback chosen for vibes, not for browser need. It's not faster, not safer, not simpler. Choose it when you have a browser to talk to, otherwise default to Unix socket.
- Mocks instead of real systems for the integration suite. Mocks lie. Real Gmail, real codelldb, real Spotify Connect device. Slow tests beat fast lies.
- Forgetting to drain stderr in subprocess plumbing. codelldb (and any other DAP adapter) will silently block on stderr buffer fill if you don't drain it in a separate task. M1 of Lazydap paid this cost.
- Bundled OAuth credentials missing. If your tool requires the user to set up a Google Cloud project before they can read their own email, they will not use your tool. Ship sane defaults; let power users override.
- Daemon dies during a long operation; client has no resume. Persistent state in durable storage. Live session state can vanish — the new daemon doesn't need to know about the old one because the user's next invocation is fresh anyway.
- A new bucket of IPC messages for every new feature. Pick a small set (mxr: CoreMail / MxrPlatform / AdminMaintenance / ClientSpecific) and make every new request justify which bucket it goes in. Adding a bucket is a decision, not a default.
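The stderr-drain failure mode from the list above is easy to reproduce and easy to fix. A minimal sketch, assuming a Unix-like system with sh and head available: it uses a plain OS thread rather than an async task (a substitution on my part to keep the example dependency-free), and run_drained is a hypothetical name. The child writes more to stderr than a typical pipe buffer holds before it writes anything to stdout, which is the pattern that stalls a parent that only reads stdout.

```rust
use std::io::{self, Read};
use std::process::{Command, Stdio};
use std::thread;

/// Run a child, returning its stdout and the number of stderr bytes drained.
pub fn run_drained(cmd: &mut Command) -> io::Result<(String, usize)> {
    let mut child = cmd.stdout(Stdio::piped()).stderr(Stdio::piped()).spawn()?;
    let mut stdout = child.stdout.take().expect("stdout was piped");
    let mut stderr = child.stderr.take().expect("stderr was piped");

    // Drain stderr concurrently so the child never blocks on a full pipe.
    let drain = thread::spawn(move || {
        let mut buf = Vec::new();
        stderr.read_to_end(&mut buf).map(|_| buf.len())
    });

    // Meanwhile the main thread reads the actual result from stdout.
    let mut out = String::new();
    stdout.read_to_string(&mut out)?;
    child.wait()?;

    let drained = drain.join().expect("drain thread panicked")?;
    Ok((out, drained))
}

fn main() -> io::Result<()> {
    // A chatty child: 200000 bytes on stderr first, then one line on stdout.
    let mut cmd = Command::new("sh");
    cmd.arg("-c").arg("head -c 200000 /dev/zero >&2; echo done");

    let (out, drained) = run_drained(&mut cmd)?;
    println!("stdout: {:?}, stderr bytes drained: {}", out.trim_end(), drained);
    Ok(())
}
```

Remove the drain thread and read stdout alone, and this same child hangs once the stderr pipe fills.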
The handover checklist
If I were handing one of these projects to another engineer cold, this is what I'd want them to be able to say yes to before they start changing things:
If yes to all, you can start. If no, read the missing one before touching code.
The cluster
The atoms this hub ties together. Each is its own entry point.
- Client-Agnostic Cores — the synthesis. Lineage from Unix daemons → X11 → Bezos mandate → LSP → MCP. Why this pattern keeps being rediscovered.
- Headless Architecture — the architectural form. Core engine has no built-in UI; consumers all go through the API.
- Headless Core + Multiple Clients — the local-tool variant of headless.
- The Local Daemon Pattern — auto-spawning per-user / per-project daemon. The CLI feels stateless; the daemon makes it stateful.
- Agent-Native Interfaces — the 2024–2026 dimension. Six properties: structured outputs, idempotency, schemas, semantic IDs, capability discovery, non-overlapping tools.
- API-First Design — the methodology. Design the protocol before any client.
- Same-Code-Path Preview — the dry-run discipline. Preview is execute minus the side effect.
- Local IPC vs HTTP — the transport decision tree.
- Unix Domain Sockets — the default transport.
- TCP vs Unix Sockets vs Named Pipes vs Shared Memory — the comparison table.
- How Daemons Work — daemon lifecycle, PID files, signals, IPC server.
- Embedded SPA in Daemon Binary — the trick that lets one artifact ship the daemon and the web UI together. One thing to install, no version mismatch.
Concrete instances
- Mxr: email. Per-user daemon, SQLite + Tantivy, four IPC buckets, embedded SPA.
- Lazydap: debugger. Per-project daemon, TOML state, DAP adapter wrapper, --wait for async-to-sync bridge.
- Spotuify: music. Per-user daemon, librespot under the hood, 14-crate workspace, boundaries enforced by a test.
Same architecture. Three domains. The fact that the architecture transferred from email to debugging to music with minor adjustments is the strongest evidence I have that it's the right shape for the class of tool I want to build.
Further reading worth your time
- Command Line Interface Guidelines — the modern Unix CLI bible.
- Anthropic, Writing tools for agents — every tool justifies its existence, meaningful namespacing, high-signal low-token responses.
- Simon Willison, Designing agentic loops — why shell-first beats MCP-first for coding agents.
- InfoQ, Patterns for AI Agent Driven CLIs — JSON output flags, semantic exit codes, structured outputs as stable contracts.
- Worse is Better, Richard Gabriel — the philosophical bedrock for "ship the simple wire format."
- LSP Protocol History — the canonical "core + many clients" success story in our lifetime.
One last thing
If you read this and the instinct that says "this should be a stable protocol with many clients" feels obvious — good. It's supposed to. The instinct is older than software. The fact that it keeps being right across decades is the whole point.
The trap is that on day one of a project, the instinct feels like over-engineering. I only need a CLI. The daemon is overkill. JSON output is a nice-to-have. Then the second consumer appears and you spend a month untangling state from the CLI process. Then the third consumer (an agent, six months from now) is impossible without rebuilding the whole thing.
Pay the cost once, upfront. The protocol, the daemon, the boundaries. After that, every new feature, every new client, every new domain reuses the same shape. The marginal cost of Spotuify after Mxr was small. The marginal cost of Lazydap after Mxr was small. That's the dividend.
Build for the second consumer you can already see, and the third consumer, the one you can't see yet, will arrive cheaply.