Building Great CLIs

#resources #resources/programming #resources/programming/architecture #resources/programming/cli


Synthesis hub. The CLI design philosophy that runs through Mxr (email), Lazydap (debugger), and Spotuify (music). Three projects, three domains, same architecture. This note is the handover doc: if you want to know how I think about CLIs before you build one with me, read this.

The atoms it ties together: Client-Agnostic Cores, The Local Daemon Pattern, Agent-Native Interfaces, Same-Code-Path Preview, Local IPC vs HTTP, Headless Architecture, Headless Core + Multiple Clients, API-First Design. Each is a cluster in its own right; this hub gives you the entry point and the rationale for why they fit together.

The thesis, in one sentence

Build a stable wire protocol owned by a daemon; ship the CLI as one thin client among many (TUI, web bridge, agent skill, future surfaces); make the same --format json and --dry-run work for a human at a terminal and for an agent driving a shell, because they want the same things.

That sentence is the whole memo. Everything below is the why and the how.

Where this thinking comes from

Doug McIlroy's Unix philosophy (1978), as later summarised by Peter Salus:

Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

Half a century later the universal interface has graduated from text streams to structured text streams. The contract is the same. A CLI that emits stable JSON on stdout, reads JSON on stdin, exits with a meaningful code, and stays quiet on stderr unless something is wrong is a citizen of every pipeline ever written — including the pipelines an LLM agent is going to assemble at 3am while you're asleep.

Mxr arrived at this from email. Lazydap from debugging. Spotuify from music. The instinct is the same and it predates all three. Client-Agnostic Cores traces the lineage all the way back through LSP, the Bezos API mandate, X11, and Unix daemons.

The seven principles

1. The CLI is the product. The TUI is a client.

Most "CLI tools with a TUI" are written TUI-first, with a CLI bolted on later. They have features only the TUI can do. They have selection state the CLI can't see. They have shortcuts that lower into private functions, not the public protocol.

Invert it. The CLI is the canonical surface. The TUI is a viewer over the same protocol. No client is privileged. If a feature exists in the TUI but not the CLI, the feature is incomplete. If the agent skill can do something the CLI can't, the architecture is broken.

This is the headless discipline applied at single-machine scale, and it is enforced mechanically rather than by convention. Concrete instances: Mxr's tui crate cannot depend on the daemon, store, search, sync, or provider crates; it reaches the daemon only through protocol. The Cargo dependency graph catches violations at build time. Same rule in Lazydap. Same rule in Spotuify, where tests/workspace_boundaries.rs enforces it.

Why this matters: every time someone asks "can the TUI just reach into the store directly, it'd be faster" — the answer is no, because the moment one client gets to reach past the protocol, the protocol stops being the product. Then the second client (agent skill, web bridge) is impossible to build cleanly. See Client-Agnostic Cores for the full argument.

2. Build for humans and agents in the same surface

The 2024–2026 dimension. Don't ship an "AI mode" later. Design the CLI so the agent usage is the same as the human usage with one flag flipped.

The discipline that makes a CLI pleasant for humans — composable, scriptable, predictable, idempotent, dry-runnable — is the same discipline that makes it usable by agents. Anthropic's Writing tools for agents and Simon Willison's Designing agentic loops say the same thing from the agent side. Agent-Native Interfaces has the checklist.

Concretely:

--format json available on every output-producing command. Auto-detect non-TTY stdout and default to JSON there.
--dry-run on every mutation, same selection path as the real run.
--yes to skip confirmation prompts in non-interactive contexts.
Stable JSON schemas. Breaking the shape is a major version bump.
Semantic exit codes — 0 success, 1 runtime error, 2 usage error, 3+ domain-specific. Distinguishable.
Errors as structured JSON, not stack traces.
Per-subcommand --help that's still parseable, plus optionally a --schema / manifest subcommand for programmatic discovery.

Hit all seven of those on day one and the tool is already in the top decile of agent-friendliness. The bonus is that you also have, accidentally, a tool that's pleasant to script from Bash, Fish, Python, Node, anywhere.
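A minimal sketch of the exit-code and structured-error items, using std only. The names (CliError, its variants, and the kind/code/message JSON fields) are hypothetical, and the JSON is escaped by hand to keep the block dependency-free; a real CLI would more likely derive it with serde.

```rust
use std::fmt::Write as _;

/// Hypothetical CLI error type: every failure maps to exactly one variant.
#[derive(Debug)]
pub enum CliError {
    /// Run-time failure (I/O, daemon unreachable, backend error).
    Runtime(String),
    /// Bad flags or arguments; fixable by changing the invocation.
    Usage(String),
    /// Example domain-specific failure; codes 3 and up are for these.
    NotFound(String),
}

impl CliError {
    /// Semantic exit code: 1 runtime, 2 usage, 3+ domain-specific (0 is success).
    pub fn exit_code(&self) -> i32 {
        match self {
            CliError::Runtime(_) => 1,
            CliError::Usage(_) => 2,
            CliError::NotFound(_) => 3,
        }
    }

    fn kind(&self) -> &'static str {
        match self {
            CliError::Runtime(_) => "runtime",
            CliError::Usage(_) => "usage",
            CliError::NotFound(_) => "not_found",
        }
    }

    fn message(&self) -> &str {
        match self {
            CliError::Runtime(m) | CliError::Usage(m) | CliError::NotFound(m) => m.as_str(),
        }
    }

    /// One line of JSON for stderr, in place of a stack trace.
    pub fn to_json(&self) -> String {
        format!(
            r#"{{"error":{{"kind":"{}","code":{},"message":"{}"}}}}"#,
            self.kind(),
            self.exit_code(),
            json_escape(self.message())
        )
    }
}

/// Minimal JSON string escaping: quotes, backslashes, control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out
}
```

In main, the error path would then print err.to_json() to stderr and call std::process::exit(err.exit_code()), so a caller can branch on the code without parsing anything and still get the detail when it wants it.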

3. Pipeable JSON is a product feature, not a flag

The single most consequential decision: treat the JSON shape as a public API.

Once one consumer (shell pipeline, agent skill, web bridge) depends on the schema, all consumers do. There's no version of "internal JSON output we can break freely." Either the output is contractual or it's noise. Make it contractual.

What this looks like in practice:

Schemas live in docs/blueprint/ or equivalent, with examples.
A --format flag with the values table (the human default on a TTY), json (the machine default off a TTY), ndjson for streaming, and text for a legacy unstructured form if you need one.
TTY detection: isatty(stdout) decides the default. --format overrides.
Versioning: include a schema_version field in the JSON envelope. Clients refuse to talk on mismatch.
Auto-detected non-TTY mode also suppresses spinners, colour codes, and progress bars in stdout — they go to stderr or off entirely. stdout is for the result, not for the chrome.
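A sketch of the TTY rule, with hypothetical names (Format, resolve_format, show_chrome); parsing the flag itself is left to whatever argument parser the CLI uses. The terminal check uses std::io::IsTerminal (stable since Rust 1.70) and is kept out of the pure function so the rule can be tested without a terminal.

```rust
use std::io::IsTerminal;

/// The output formats from the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Table,
    Json,
    Ndjson,
    Text,
}

/// An explicit --format always wins; otherwise a TTY gets `table`
/// and a pipe or file gets `json`.
pub fn resolve_format(flag: Option<&str>, stdout_is_tty: bool) -> Result<Format, String> {
    match flag {
        Some("table") => Ok(Format::Table),
        Some("json") => Ok(Format::Json),
        Some("ndjson") => Ok(Format::Ndjson),
        Some("text") => Ok(Format::Text),
        Some(other) => Err(format!("unknown --format value: {other}")),
        None if stdout_is_tty => Ok(Format::Table),
        None => Ok(Format::Json),
    }
}

/// Spinners, colour, and progress bars belong in stdout only when a human
/// is watching a table; otherwise they go to stderr or nowhere.
pub fn show_chrome(format: Format, stdout_is_tty: bool) -> bool {
    stdout_is_tty && format == Format::Table
}

/// Convenience wrapper that asks the OS whether stdout is a terminal.
pub fn detect_format(flag: Option<&str>) -> Result<Format, String> {
    resolve_format(flag, std::io::stdout().is_terminal())
}
```

An unknown --format value surfaces as an error rather than silently falling back, which keeps it on the usage-error exit path instead of producing output a script didn't ask for.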

If you need a forcing function: write the second consumer first. Build a Bash one-liner that pipes the JSON through jq and into another command. If that doesn't read naturally, the schema is wrong.

4. The daemon is the brain. The CLI is a courier.

This is the local daemon pattern and it's the load-bearing decision for everything that follows.

A naive CLI re-loads everything per invocation: open the SQLite file, parse config, spin up the async runtime, connect to providers, do work, exit. Easily 200ms per command before any actual work. The user types mxr search "from:alice" and waits a quarter-second for nothing.

A daemon already has all that loaded. The CLI is 5–20ms total: probe the socket, connect, send a request, read the response, exit. Subjectively instant.

Three problems get solved at once:

Cold-start latency. Gone after the first command.
Persistent connections. IMAP IDLE, DAP adapter sessions, file watchers — things a short-lived CLI can't hold. The daemon can.
Multiple concurrent clients. TUI alongside the CLI alongside the agent skill, all consuming the same state, all seeing the same events. Without a daemon you'd need the TUI to be the state-holder, which couples the protocol to the TUI's lifecycle forever.

The CLI never has to know whether the daemon is running. Auto-spawn handles it. First invocation forks the daemon, polls for the socket to appear, sends the request. Second invocation finds the socket already there. The user thinks they're running CLI commands; they're actually opening short-lived sockets to a long-lived background process. See The Local Daemon Pattern for the full mechanic, including PID files, idle shutdown, and crash recovery.
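The probe-spawn-poll loop can be sketched with std only. `spawn` is a hypothetical hook: in a real CLI it would start the daemon binary (for example via std::process::Command) and detach it. The sketch deliberately leaves out the PID file, stale-socket cleanup, and the lock that stops two racing CLI invocations from spawning two daemons.

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Connect to the daemon's socket, starting the daemon first if nothing answers.
/// `spawn` is a hypothetical hook that launches the daemon process.
pub fn connect_or_spawn<F>(socket: &Path, spawn: F, timeout: Duration) -> io::Result<UnixStream>
where
    F: FnOnce() -> io::Result<()>,
{
    // Fast path: a daemon is already listening.
    if let Ok(stream) = UnixStream::connect(socket) {
        return Ok(stream);
    }
    // Slow path (first invocation only): start it, then poll until it accepts.
    spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        match UnixStream::connect(socket) {
            Ok(stream) => return Ok(stream),
            Err(e) if Instant::now() >= deadline => return Err(e),
            Err(_) => sleep(Duration::from_millis(10)),
        }
    }
}
```

Every invocation after the first pays for a single connect; only the first pays for the spawn and the polling.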

Per-user or per-project? Mxr is per-user — one daemon per logged-in user. Lazydap is per-project — one daemon per repo root. Choose by the scope of the state. Email is a user-level concern; a debug session is a project-level concern.
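One way the scope choice could show up in code, as a sketch with hypothetical names: the socket path follows the ~/.cache/<tool>/<tool>.sock convention for a per-user daemon, and a per-project daemon gets a suffix derived from the repo root. Hashing the root rather than embedding it also keeps the path short, which matters because Unix socket paths are capped at about 104 to 108 bytes depending on the OS.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical: who shares one daemon.
pub enum Scope<'a> {
    /// One daemon per logged-in user (the Mxr choice).
    User,
    /// One daemon per repo root (the Lazydap choice).
    Project { root: &'a Path },
}

/// Socket path under `<cache_dir>/<tool>/`.
pub fn socket_path(cache_dir: &Path, tool: &str, scope: Scope<'_>) -> PathBuf {
    let dir = cache_dir.join(tool);
    match scope {
        Scope::User => dir.join(format!("{tool}.sock")),
        Scope::Project { root } => {
            // DefaultHasher output is not guaranteed stable across Rust releases;
            // a real tool would pin a specific hash so old sockets stay findable.
            let mut h = DefaultHasher::new();
            root.hash(&mut h);
            dir.join(format!("{tool}-{:016x}.sock", h.finish()))
        }
    }
}
```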

5. Local IPC over Unix socket, not HTTP-on-loopback, unless you have a browser reason

The default is a Unix domain socket with length-delimited JSON framing. See Unix Domain Sockets for the mechanism and Local IPC vs HTTP for the comparison.

Why Unix socket wins for the default case:

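Faster. 5–10µs round-trip vs 30–200µs for HTTP loopback. Per-call latency rarely matters, but for the 1000+ calls/sec language-server-style use case it does.
Safer. Put the socket in an owner-only (0700) directory and only your user can reach it. Linux also checks the socket file's own mode on connect, but some BSDs ignore socket permissions, so the directory is the portable guard. Filesystem permissions are dramatically better than a 127.0.0.1 bind, which any process running as any user on the machine can reach.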
Simpler wire. No HTTP parsing. No CORS. No keep-alive negotiation. A u32 length prefix and a JSON body.
No accidental network exposure. A socket on disk can't end up bound to 0.0.0.0 by a config typo.
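Both wire-level points in the list above can be sketched with std only: a u32 length prefix followed by the JSON body, and a listener bound inside an owner-only directory. The big-endian byte order, the 16 MiB frame cap, and the function names are assumptions for illustration; the note doesn't fix any of them.

```rust
use std::fs::{self, DirBuilder};
use std::io::{self, Read, Write};
use std::os::unix::fs::{DirBuilderExt, PermissionsExt};
use std::os::unix::net::UnixListener;
use std::path::Path;

/// Assumed cap so a corrupt length can't trigger a 4 GiB allocation.
const MAX_FRAME: u32 = 16 * 1024 * 1024;

/// One frame: 4-byte big-endian length, then the JSON bytes.
pub fn write_frame<W: Write>(w: &mut W, json: &[u8]) -> io::Result<()> {
    let len = u32::try_from(json.len())
        .ok()
        .filter(|&n| n <= MAX_FRAME)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(json)?;
    w.flush()
}

/// Ok(None) means the stream ended before a new header (a truncated
/// header is treated the same way, for brevity).
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut header = [0u8; 4];
    match r.read_exact(&mut header) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let len = u32::from_be_bytes(header);
    if len > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut body = vec![0u8; len as usize];
    r.read_exact(&mut body)?;
    Ok(Some(body))
}

/// Bind inside a 0700 directory; the 0700 dir also covers the brief window
/// before the socket file's own mode is tightened.
pub fn bind_owner_only(dir: &Path, name: &str) -> io::Result<UnixListener> {
    DirBuilder::new().recursive(true).mode(0o700).create(dir)?;
    // mode() only applies to newly created dirs; tighten an existing one too.
    fs::set_permissions(dir, fs::Permissions::from_mode(0o700))?;
    let path = dir.join(name);
    let listener = UnixListener::bind(&path)?;
    // Write permission is what Linux checks on connect.
    fs::set_permissions(&path, fs::Permissions::from_mode(0o600))?;
    Ok(listener)
}
```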

When to reach for HTTP-on-loopback instead:

A browser client is a real consumer. Browsers can't open Unix sockets. (The workaround: ship a small bridge binary that translates HTTP/WebSocket to the local IPC. Mxr does this for its embedded SPA; see Embedded SPA in Daemon Binary for the distribution trick.)
The daemon will eventually be remote-accessible and you want one wire format end-to-end.
Third-party clients are the major use case and you want maximal tooling — curl, Postman, OpenAPI, browser devtools.

For the common case of "I'm shipping the clients alongside the daemon," Unix socket. For the case of "anyone might write a client and I want the tooling ecosystem," HTTP. TCP vs Unix Sockets vs Named Pipes vs Shared Memory has the comparison table.

Windows note: named pipes are the equivalent. \\.\pipe\mxr behaves the same way, with access control set through SECURITY_ATTRIBUTES.

6. Mutations are dry-runnable through the same code path

The origin is a real bug: Mxr's bulk-archive said "1,200 messages" in preview and archived 12,000 in execution, because preview and execute had drifted into separate query paths. Preview was correct for its own (wrong) query. The user trusted the number. A decade of email moved.

Same-Code-Path Preview is the rule that came out of that:

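```rust
fn run_mutation(args: &Args) -> Result<()> {
    // One query path: preview and execute both consume this value.
    let selection = compute_selection(args)?;
    if args.dry_run {
        // Preview is execute minus the side effect.
        print_preview(&selection);
        return Ok(());
    }
    // Ask before touching anything.
    confirm(&selection)?;
    // `mutate` takes the Selection, not the query, so it cannot widen the set.
    mutate(&selection)
}
```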

Compute the selection once. Render it or execute it. The mutation takes a Selection value, not a query — so the act of mutating cannot expand the set.
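The block above leaves Args, Selection, and the helpers abstract. A self-contained sketch of the same shape, using a hypothetical in-memory message store and an archive-by-sender mutation, shows how a private field turns "the mutation cannot expand the set" into a type-level fact: outside this module, only compute_selection can produce a Selection.

```rust
use std::collections::BTreeSet;

/// A frozen set of message ids. The field is private, so code outside this
/// module can only get a Selection from `compute_selection`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Selection(BTreeSet<u64>);

impl Selection {
    pub fn ids(&self) -> impl Iterator<Item = &u64> {
        self.0.iter()
    }
    pub fn len(&self) -> usize {
        self.0.len()
    }
}

/// Hypothetical in-memory stand-in for the message store.
pub struct Message {
    pub id: u64,
    pub from: String,
    pub archived: bool,
}

/// The single query path, shared by preview and execute.
pub fn compute_selection(store: &[Message], from: &str) -> Selection {
    Selection(
        store
            .iter()
            .filter(|m| !m.archived && m.from == from)
            .map(|m| m.id)
            .collect(),
    )
}

/// The side effect, and nothing else: archives exactly the selected ids.
fn archive(store: &mut [Message], selection: &Selection) -> usize {
    let mut archived = 0;
    for m in store.iter_mut() {
        if selection.0.contains(&m.id) && !m.archived {
            m.archived = true;
            archived += 1;
        }
    }
    archived
}

pub enum Outcome {
    Preview(Selection),
    Archived(usize),
}

/// Both branches run on the one Selection computed up front.
pub fn run_archive(store: &mut [Message], from: &str, dry_run: bool) -> Outcome {
    let selection = compute_selection(store, from);
    if dry_run {
        return Outcome::Preview(selection);
    }
    Outcome::Archived(archive(store, &selection))
}
```

Because the preview count and the executed count come from the same Selection value, a test can assert they match exactly, which is the check the 1,200-versus-12,000 bug would have failed.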

Generalises beyond CLIs. Anywhere there's a preview and an execute, a parallel implementation will drift, and you won't see the drift until production data is gone. The frame I keep returning to: preview is execute minus the side effect. Not "similar." Not "approximate." Minus the side effect, full stop.

Every mutation in Mxr, Lazydap, and Spotuify follows this. It's one of the few rules shared across all three.

7. The build system enforces the architecture

The crate dependency graph (Cargo workspace, in our case) is the architecture. Not the diagram. Not the docs. The Cargo.toml files.

The rule in Mxr: tui and web cannot depend on daemon, store, search, sync, or provider crates. Only protocol, core, config, compose, reader, mail-parse. Same in Lazydap (tui cannot reach past protocol). Same in Spotuify (14 crates, boundaries enforced by tests/workspace_boundaries.rs).
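What a boundary test might look like, as a sketch: read a client crate's Cargo.toml and fail on any forbidden dependency. The line scan is deliberately naive (it ignores `[dependencies.foo]` tables and renamed packages); a sturdier version would read the resolved graph from `cargo metadata --format-version 1`. The function names and crate lists are illustrative, not the contents of Spotuify's actual test.

```rust
/// Naive scan of dependency names from every `*dependencies]` section
/// (dependencies, dev-dependencies, target-specific dependencies).
pub fn dependency_names(cargo_toml: &str) -> Vec<String> {
    let mut in_deps = false;
    let mut names = Vec::new();
    for line in cargo_toml.lines() {
        let line = line.trim();
        if line.starts_with('[') {
            in_deps = line.trim_end_matches(']').ends_with("dependencies");
            continue;
        }
        if in_deps {
            if let Some((key, _)) = line.split_once('=') {
                let key = key.trim();
                // Skip commented-out entries.
                if !key.is_empty() && !key.starts_with('#') {
                    names.push(key.to_string());
                }
            }
        }
    }
    names
}

/// One message per forbidden dependency; the test asserts this is empty.
pub fn boundary_violations(crate_name: &str, cargo_toml: &str, forbidden: &[&str]) -> Vec<String> {
    dependency_names(cargo_toml)
        .into_iter()
        .filter(|d| forbidden.contains(&d.as_str()))
        .map(|d| format!("{crate_name} must not depend on {d}"))
        .collect()
}
```

In a test file, this would run once per client crate on std::fs::read_to_string of its manifest, with an assertion that the returned list is empty, so the failure message names the exact edge that broke the rule.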

When the build catches the violation at compile time, three things happen:

Discussions get short. The conversation isn't "should we?"; it's "the build is red."
New contributors hit the wall before they hit the codebase. The rule teaches itself.
The architecture document doesn't decay because the architecture is the build.

In other languages: module visibility in Java/Kotlin, package boundaries with explicit exports in TypeScript, internal/public in C#, internal packages in Go. The mechanism varies; the principle doesn't. Don't rely on developer discipline alone.

The shape, drawn

   Human at terminal     TUI (one client)     Agent skill (another client)
        │                    │                          │
        │ (table output)     │ (events + actions)       │ (JSON over stdin/stdout)
        ▼                    │                          │
   CLI subcommands ◄─────────┴──────────────────────────┘
        │
        │  length-prefixed JSON
        │  ┌─────────────────────────────────────┐
        │  │  {                                   │
        │  │    "v": 1,                           │
        │  │    "id": "req_01HXYZ",               │
        │  │    "bucket": "CoreMail",             │
        │  │    "request": { "Search": { ... } }  │
        │  │  }                                   │
        │  └─────────────────────────────────────┘
        ▼
   Unix domain socket  (chmod 0700, ~/.cache/<tool>/<tool>.sock)
        │
        ▼
   Daemon (auto-spawned, long-lived)
        │
        ├── canonical state (SQLite / TOML / whatever survives a crash)
        ├── derived state  (search index, rebuildable)
        ├── adapters       (Gmail / IMAP / SMTP / DAP / Spotify Connect)
        └── sync loops, event bus, subscription channels

   Optional: web bridge (separate binary) ── HTTP+WS ──┐
                                                      │
                                                Same Unix socket

This is the same shape in all three projects. What changes is the adapters line (Gmail, IMAP, and SMTP for Mxr; DAP for Lazydap; Spotify Connect for Spotuify) and the canonical-state backend. The protocol shape, the daemon discipline, the socket location convention, and the bucketed message taxonomy stay.
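The envelope in the diagram can be sketched as plain Rust types plus an admission check on the daemon side. The four bucket names are Mxr's; the request variants other than Search, their bucket assignments, and the rule that the declared bucket must match the request are hypothetical. The exhaustive match is the useful part: adding a request without deciding its bucket does not compile.

```rust
/// Wire-protocol version this build speaks (the "v" field in the diagram).
pub const PROTOCOL_VERSION: u32 = 1;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Bucket {
    CoreMail,
    MxrPlatform,
    AdminMaintenance,
    ClientSpecific,
}

/// Hypothetical requests; only Search appears in the diagram.
#[derive(Debug, Clone, PartialEq)]
pub enum Request {
    Search { query: String },
    Archive { ids: Vec<u64> },
    Status,
    Reindex,
}

impl Request {
    /// Exhaustive: a new variant without a bucket is a compile error.
    pub fn bucket(&self) -> Bucket {
        match self {
            Request::Search { .. } | Request::Archive { .. } => Bucket::CoreMail,
            Request::Status => Bucket::MxrPlatform,
            Request::Reindex => Bucket::AdminMaintenance,
        }
    }
}

/// In-memory form of the envelope drawn above.
#[derive(Debug, Clone, PartialEq)]
pub struct Envelope {
    pub v: u32,
    pub id: String,
    pub bucket: Bucket,
    pub request: Request,
}

#[derive(Debug, PartialEq)]
pub enum ProtocolError {
    VersionMismatch { ours: u32, theirs: u32 },
    WrongBucket { declared: Bucket, expected: Bucket },
}

/// Refuse other versions outright, and refuse a mislabelled bucket.
pub fn admit(env: &Envelope) -> Result<&Request, ProtocolError> {
    if env.v != PROTOCOL_VERSION {
        return Err(ProtocolError::VersionMismatch { ours: PROTOCOL_VERSION, theirs: env.v });
    }
    let expected = env.request.bucket();
    if env.bucket != expected {
        return Err(ProtocolError::WrongBucket { declared: env.bucket, expected });
    }
    Ok(&env.request)
}
```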

When to use this pattern

Use it when:

You have, or expect, more than one consumer. The CLI plus a TUI counts. The CLI plus an agent skill counts. Two is enough.
The interface is going to outlive the UI. Protocols designed for stability survive UI rewrites; UIs do not survive protocol changes.
The cost of cold-start per command would be annoying. Anything > 50ms.
You want agents to drive the tool well. Without the protocol, they can't.

Skip it when:

One consumer, no second one credibly in sight. A monolithic CLI is fine; don't pre-architect a daemon you'll never need.
The tool is truly stateless and per-invocation work is < 50ms. jq, ripgrep, fd — they don't need a daemon.
The interface really is one-shot. A throwaway script doesn't need a wire protocol.

The daemon discipline is overkill for a 200-line xargs replacement. It's the right size for anything that holds state across commands or talks to a remote service.

The recipe (the order I'd build a new one in)

This is the M0–M5 progression I followed in Lazydap, with the lessons from Mxr baked in.

Write the protocol first. Message types, request/response shapes, error model, bucket taxonomy, versioning rule. Markdown is fine; types in code are better. Document before code. API-First Design.
Write a fake adapter that exercises the protocol end-to-end with no backend. Validates the design and gives you a fast integration test substrate forever.
Set up the crate boundaries. core (zero I/O), protocol (the wire), store (canonical state), daemon (orchestrator). Add a boundary test that fails if tui depends on daemon. Do this before there's a TUI, so the rule predates the temptation.
Build the daemon. Auto-spawn from the CLI binary, PID file, Unix socket, length-prefixed JSON, tracing from line 1 of main, structured logs at every IPC boundary.
Build the first CLI subcommand. Something read-only, ideally, working end-to-end through the daemon.
Build a mutation with --dry-run and --yes. Get Same-Code-Path Preview in before there are five mutations to refactor.
Build the second client. This is the test of the protocol. If the second client requires changes to the daemon, the protocol is wrong; fix it now. The second client can be small: even an agent skill (.skill ZIP plus an AGENTS.md) counts.
Add the TUI. It's a third client, not a privileged surface. Same protocol. Hand-rolled Elm-style state internally (The Elm Architecture).
Lock the load-bearing decisions in a decision-log.md. A "do not relitigate" section saves every future reviewer the same conversation.
Test against real systems, not mocks. Real IMAP via Dovecot fixture, real Spotify via librespot in a sandbox, real DAP adapters. Mocks paper over the bugs that actually bite (label reorderings, IDLE disconnects, OAuth refresh races, codelldb stderr buffer fill).

Failure modes worth pre-empting

These are the ones I've already paid for. Don't pay for them twice.

The "TUI just needs one more thing from the daemon's internals" creep. Once you grant it, the protocol stops being canonical and the second client becomes impossible. Hold the line.
Unstable JSON. Treating --format json as "debug output" until a consumer comes along. By then it's too late; you've broken the schema three times and the consumer can't trust anything.
Preview-execute drift. Don't have two implementations. Don't have a "fast preview" that approximates. One compute_selection, two render paths.
HTTP-on-loopback chosen for vibes, not for browser need. It's not faster, not safer, not simpler. Choose it when you have a browser to talk to, otherwise default to Unix socket.
Mocks instead of real systems for the integration suite. Mocks lie. Real Gmail, real codelldb, real Spotify Connect device. Slow tests beat fast lies.
Forgetting to drain stderr in subprocess plumbing. codelldb, like any child process writing to a piped stderr, blocks silently once the pipe buffer fills if nothing reads it, so drain it in a separate task. M1 of Lazydap paid this cost.
Bundled OAuth credentials missing. If your tool requires the user to set up a Google Cloud project before they can read their own email, they will not use your tool. Ship sane defaults; let power users override.
Daemon dies during a long operation; client has no resume. Keep persistent state in durable storage. Live session state can vanish: the new daemon doesn't need to know about the old one because the user's next invocation is fresh anyway.
A new bucket of IPC messages for every new feature. Pick a small set (mxr: CoreMail / MxrPlatform / AdminMaintenance / ClientSpecific) and make every new request justify which bucket it goes in. Adding a bucket is a decision, not a default.
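The stderr fix from the list above, sketched with std threads (an async runtime would use a task instead): pipe the child's stderr and hand it to a dedicated reader before doing anything else with the child. spawn_drained is a hypothetical helper; for a DAP adapter the child would be codelldb rather than the sh loop used in the check, and the drained lines could go to tracing instead of a Vec.

```rust
use std::io::{self, BufRead, BufReader};
use std::process::{Child, Command, Stdio};
use std::thread::{self, JoinHandle};

/// Spawn with piped stdout/stderr and drain stderr on its own thread,
/// so the child never blocks on a full stderr pipe.
pub fn spawn_drained(mut cmd: Command) -> io::Result<(Child, JoinHandle<Vec<String>>)> {
    let mut child = cmd.stdout(Stdio::piped()).stderr(Stdio::piped()).spawn()?;
    let stderr = child.stderr.take().expect("stderr was piped");
    let drain = thread::spawn(move || {
        // Read every stderr line until EOF (here collected; could be logged).
        BufReader::new(stderr)
            .lines()
            .map_while(Result::ok)
            .collect::<Vec<String>>()
    });
    Ok((child, drain))
}
```

The failure mode is a deadlock rather than an error: the parent waits on stdout while the child waits for someone to empty stderr, which is why it shows up as a silent hang.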

The handover checklist

If I were handing one of these projects to another engineer cold, this is what I'd want them to be able to say yes to before they start changing things:

You've read the project's ARCHITECTURE.md and the decision-log.md.
You can articulate the four message buckets and why each request fits in exactly one.
You've run the CLI against the fake adapter and seen the JSON output.
You've run the CLI against the real backend and seen the same JSON output.
You've watched the daemon auto-spawn on first invocation. You've killed it and watched it auto-respawn.
You can name the crate boundary rule and you've seen the test that enforces it.
You've written a --dry-run for a mutation and verified the selection it prints matches what --yes would touch, byte-for-byte.
You've read Client-Agnostic Cores and you can explain why the TUI is one client and not "the UI."
You've read Agent-Native Interfaces and you know what the JSON contract owes its consumers.

If yes to all, you can start. If no, read the missing one before touching code.

The cluster

The atoms this hub ties together. Each is its own entry point.

Client-Agnostic Cores — the synthesis. Lineage from Unix daemons → X11 → Bezos mandate → LSP → MCP. Why this pattern keeps being rediscovered.
Headless Architecture — the architectural form. Core engine has no built-in UI; consumers all go through the API.
Headless Core + Multiple Clients — the local-tool variant of headless.
The Local Daemon Pattern — auto-spawning per-user / per-project daemon. The CLI feels stateless; the daemon makes it stateful.
Agent-Native Interfaces — the 2024–2026 dimension. Six properties: structured outputs, idempotency, schemas, semantic IDs, capability discovery, non-overlapping tools.
API-First Design — the methodology. Design the protocol before any client.
Same-Code-Path Preview — the dry-run discipline. Preview is execute minus the side effect.
Local IPC vs HTTP — the transport decision tree.
Unix Domain Sockets — the default transport.
TCP vs Unix Sockets vs Named Pipes vs Shared Memory — the comparison table.
How Daemons Work — daemon lifecycle, PID files, signals, IPC server.
Embedded SPA in Daemon Binary — the trick that lets one artifact ship the daemon and the web UI together. One thing to install, no version mismatch.

Concrete instances

Mxr — email. Per-user daemon, SQLite + Tantivy, four IPC buckets, embedded SPA.
Lazydap — debugger. Per-project daemon, TOML state, DAP adapter wrapper, --wait for async-to-sync bridge.
Spotuify — music. Per-user daemon, librespot under the hood, 14-crate workspace, boundaries enforced by a test.

Same architecture. Three domains. The fact that the architecture transferred from email to debugging to music with minor adjustments is the strongest evidence I have that it's the right shape for the class of tool I want to build.

One last thing

If you read this and the instinct that says "this should be a stable protocol with many clients" feels obvious — good. It's supposed to. The instinct is older than software. The fact that it keeps being right across decades is the whole point.

The trap is that on day one of a project, the instinct feels like over-engineering. I only need a CLI. The daemon is overkill. JSON output is a nice-to-have. Then the second consumer appears and you spend a month untangling state from the CLI process. Then the third consumer (an agent, six months from now) is impossible without rebuilding the whole thing.

Pay the cost once, upfront. The protocol, the daemon, the boundaries. After that, every new feature, every new client, every new domain reuses the same shape. The marginal cost of Spotuify after Mxr was small. The marginal cost of Lazydap after Mxr was small. That's the dividend.

Build for the second consumer you can already see, and the third consumer, the one you can't see yet, will arrive cheaply.