Building Great CLIs

Synthesis hub. The CLI design philosophy that runs through Mxr (email), Lazydap (debugger), and Spotuify (music). Three projects, three domains, same architecture. This note is the handover doc: if you want to know how I think about CLIs before you build one with me, read this.

The atoms it ties together: Client-Agnostic Cores, The Local Daemon Pattern, Agent-Native Interfaces, Same-Code-Path Preview, Local IPC vs HTTP, Headless Architecture, Headless Core + Multiple Clients, API-First Design. Each is a cluster in its own right; this hub gives you the entry point and the rationale for why they fit together.

The thesis, in one sentence

Build a stable wire protocol owned by a daemon; ship the CLI as one thin client among many (TUI, web bridge, agent skill, future surfaces); make the same --format json and --dry-run work for a human at a terminal and for an agent driving a shell, because they want the same things.

That sentence is the whole memo. Everything below is the why and the how.

Where this thinking comes from

Doug McIlroy, 1978:

Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

Half a century later the universal interface has graduated from text streams to structured text streams. The contract is the same. A CLI that emits stable JSON on stdout, reads JSON on stdin, exits with a meaningful code, and stays quiet on stderr unless something is wrong is a citizen of every pipeline ever written — including the pipelines an LLM agent is going to assemble at 3am while you're asleep.
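
A minimal sketch of that contract, using only the standard library. The `Outcome` type, the `{"count": N}` shape, and the specific exit codes (0 for matches, 1 for none, 2 for failure, in the style of grep) are assumptions for illustration, not what Mxr actually emits.

```rust
use std::io::Write;

/// Hypothetical result of a search command; the variants are illustrative.
pub enum Outcome {
    Matches(usize),
    NoMatches,
    Failed(String),
}

/// Writes the machine-readable result to `out`, diagnostics to `err`,
/// and returns the process exit code. Nothing goes to `err` on success.
pub fn report(
    outcome: &Outcome,
    out: &mut impl Write,
    err: &mut impl Write,
) -> std::io::Result<i32> {
    match outcome {
        Outcome::Matches(n) => {
            writeln!(out, "{{\"count\":{}}}", n)?;
            Ok(0)
        }
        Outcome::NoMatches => {
            // Still valid JSON on stdout, so a pipeline never sees an empty stream.
            writeln!(out, "{{\"count\":0}}")?;
            Ok(1)
        }
        Outcome::Failed(msg) => {
            // Failures are the only case that touches stderr.
            writeln!(err, "error: {}", msg)?;
            Ok(2)
        }
    }
}
```

A real `main` would pass `std::io::stdout()` and `std::io::stderr()` and hand the returned code to `std::process::exit`. Taking the writers as parameters keeps the contract testable without spawning a process.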

Mxr arrived at this from email. Lazydap from debugging. Spotuify from music. The instinct is the same and it predates all three. Client-Agnostic Cores traces the lineage all the way back through LSP, the Bezos API mandate, X11, and Unix daemons.

The seven principles

1. The CLI is the product. The TUI is a client.

Most "CLI tools with a TUI" are written TUI-first, with a CLI bolted on later. They have features only the TUI can do. They have selection state the CLI can't see. They have shortcuts that lower into private functions, not the public protocol.

Invert it. The CLI is the canonical surface. The TUI is a viewer over the same protocol. No client is privileged. If a feature exists in the TUI but not the CLI, the feature is incomplete. If the agent skill can do something the CLI can't, the architecture is broken.

This is enforced by the headless discipline applied at single-machine scale. Concrete instances: Mxr's tui crate cannot depend on daemon, store, search, sync, or provider crates; its only route to the daemon is the protocol crate. The Cargo dependency graph catches violations at build time. Same rule in Lazydap. Same rule in Spotuify (tests/workspace_boundaries.rs).

Why this matters: every time someone asks "can the TUI just reach into the store directly, it'd be faster" — the answer is no, because the moment one client gets to reach past the protocol, the protocol stops being the product. Then the second client (agent skill, web bridge) is impossible to build cleanly. See Client-Agnostic Cores for the full argument.

2. Build for humans and agents in the same surface

The 2024–2026 dimension. Don't ship an "AI mode" later. Design the CLI so the agent usage is the same as the human usage with one flag flipped.

The discipline that makes a CLI pleasant for humans — composable, scriptable, predictable, idempotent, dry-runnable — is the same discipline that makes it usable by agents. Anthropic's Writing tools for agents and Simon Willison's Designing agentic loops say the same thing from the agent side. Agent-Native Interfaces has the checklist.
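
"One flag flipped" can be sketched as a single render function over the same rows. The `Row` fields, the `Format` enum, and the hand-rolled JSON escaping are assumptions made to keep the example dependency-free; a real tool would more likely reach for a JSON library.

```rust
/// Output format selected by a flag such as `--format json` (illustrative enum).
pub enum Format {
    Table,
    Json,
}

/// Hypothetical result row; the fields are placeholders.
pub struct Row {
    pub id: String,
    pub subject: String,
}

/// Escapes a string for embedding inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders the same rows for a human (aligned columns) or a machine (JSON array).
pub fn render(rows: &[Row], format: &Format) -> String {
    match format {
        Format::Table => rows
            .iter()
            .map(|r| format!("{:<8} {}", r.id, r.subject))
            .collect::<Vec<_>>()
            .join("\n"),
        Format::Json => {
            let items: Vec<String> = rows
                .iter()
                .map(|r| {
                    format!(
                        "{{\"id\":\"{}\",\"subject\":\"{}\"}}",
                        escape_json(&r.id),
                        escape_json(&r.subject)
                    )
                })
                .collect();
            format!("[{}]", items.join(","))
        }
    }
}
```

The point of the shape: the rows are computed once, and the format is a presentation choice at the very end. There is no separate agent code path to drift.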

Concretely:

Hit eight of those on day one and the tool is already in the top decile of agent-friendliness. The bonus is that you also have, accidentally, a tool that's pleasant to script from Bash, Fish, Python, Node, anywhere.

3. Pipeable JSON is a product feature, not a flag

The single most consequential decision: treat the JSON shape as a public API.

Once one consumer (shell pipeline, agent skill, web bridge) depends on the schema, all consumers do. There's no version of "internal JSON output we can break freely." Either the output is contractual or it's noise. Make it contractual.

What this looks like in practice:

If you need a forcing function: write the second consumer first. Build a Bash one-liner that pipes the JSON through jq and into another command. If that doesn't read naturally, the schema is wrong.

4. The daemon is the brain. The CLI is a courier.

This is the local daemon pattern and it's the load-bearing decision for everything that follows.

A naive CLI re-loads everything per invocation: open the SQLite file, parse config, spin up the async runtime, connect to providers, do work, exit. Easily 200ms per command before any actual work. The user types mxr search "from:alice" and waits a fifth of a second for nothing.

A daemon already has all that loaded. The CLI is 5–20ms total: probe the socket, connect, send a request, read the response, exit. Subjectively instant.

Three problems get solved at once:

  1. Cold-start latency. Gone after the first command.
  2. Persistent connections. IMAP IDLE, DAP adapter sessions, file watchers — things a short-lived CLI can't hold. The daemon can.
  3. Multiple concurrent clients. TUI alongside the CLI alongside the agent skill, all consuming the same state, all seeing the same events. Without a daemon you'd need the TUI to be the state-holder, which couples the protocol to the TUI's lifecycle forever.
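
The third point, several clients seeing the same events, can be sketched as a tiny fan-out bus inside the daemon. `EventBus` and the string event names are hypothetical; the real projects may use an async broadcast channel instead of `std::sync::mpsc`.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Daemon-side fan-out: every subscribed client gets every event.
pub struct EventBus {
    subscribers: Vec<Sender<String>>,
}

impl EventBus {
    pub fn new() -> Self {
        EventBus { subscribers: Vec::new() }
    }

    /// Registers a client (TUI, CLI watcher, agent skill) and returns its event stream.
    pub fn subscribe(&mut self) -> Receiver<String> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Delivers the event to every live subscriber and forgets clients that went away.
    pub fn publish(&mut self, event: &str) {
        self.subscribers
            .retain(|tx| tx.send(event.to_string()).is_ok());
    }

    pub fn subscriber_count(&self) -> usize {
        self.subscribers.len()
    }
}
```

Because the daemon owns the bus, no client's lifecycle matters to the others: a TUI quitting just drops one receiver.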

The CLI never has to know whether the daemon is running. Auto-spawn handles it. First invocation forks the daemon, polls for the socket to appear, sends the request. Second invocation finds the socket already there. The user thinks they're running CLI commands; they're actually opening short-lived sockets to a long-lived background process. See The Local Daemon Pattern for the full mechanic, including PID files, idle shutdown, and crash recovery.
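
A minimal sketch of the probe, spawn, poll sequence on Unix. The function name, the 10ms poll interval, and passing the spawn step as a closure are assumptions; in a real binary the closure would typically launch the daemon with `std::process::Command`. Stale socket files, PID files, and crash recovery are deliberately left out.

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Connects to the daemon socket, starting the daemon first if nothing is listening.
pub fn connect_or_spawn<F>(
    socket: &Path,
    spawn_daemon: F,
    timeout: Duration,
) -> io::Result<UnixStream>
where
    F: FnOnce() -> io::Result<()>,
{
    // Fast path: a daemon is already listening.
    if let Ok(stream) = UnixStream::connect(socket) {
        return Ok(stream);
    }

    // Slow path: start the daemon, then poll until the socket accepts connections.
    spawn_daemon()?;
    let deadline = Instant::now() + timeout;
    loop {
        match UnixStream::connect(socket) {
            Ok(stream) => return Ok(stream),
            // Give up with the last connection error once the deadline passes.
            Err(e) if Instant::now() >= deadline => return Err(e),
            Err(_) => sleep(Duration::from_millis(10)),
        }
    }
}
```

Every subcommand calls this before sending its request, so the user never runs a separate "start" command.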

Per-user or per-project? Mxr is per-user — one daemon per logged-in user. Lazydap is per-project — one daemon per repo root. Choose by the scope of the state. Email is a user-level concern; a debug session is a project-level concern.
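
One way to make the scope choice concrete is in how the socket path is derived. This is a sketch under assumptions: the `Scope` enum, hashing the repo root with FNV-1a, and the per-project file name are all illustrative, not the layout Lazydap actually uses.

```rust
use std::path::{Path, PathBuf};

/// Scope of the daemon's state (illustrative).
pub enum Scope<'a> {
    /// One daemon per logged-in user.
    User,
    /// One daemon per repository root.
    Project(&'a Path),
}

/// FNV-1a, 64-bit: a small hash that is stable across runs and Rust versions.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Derives the socket path under `<cache_dir>/<tool>/`.
pub fn socket_path(cache_dir: &Path, tool: &str, scope: &Scope) -> PathBuf {
    let dir = cache_dir.join(tool);
    match scope {
        Scope::User => dir.join(format!("{}.sock", tool)),
        Scope::Project(root) => {
            // Same root always maps to the same socket; different roots diverge.
            let id = fnv1a64(root.to_string_lossy().as_bytes());
            dir.join(format!("{:016x}.sock", id))
        }
    }
}
```

In practice the project root should be canonicalised before hashing, so that two spellings of the same directory do not start two daemons.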

5. Local IPC over Unix socket, not HTTP-on-loopback, unless you have a browser reason

The default is a Unix domain socket with length-delimited JSON framing. Unix Domain Sockets for the mechanism; Local IPC vs HTTP for the comparison.
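
Length-delimited framing is small enough to sketch in full. The 4-byte big-endian prefix and the 16 MiB cap are assumptions; the note only says the framing is length-delimited, not how wide the prefix is.

```rust
use std::io::{self, Read, Write};

/// Upper bound on a single frame, so a corrupt prefix cannot trigger a huge allocation.
const MAX_FRAME: usize = 16 * 1024 * 1024;

/// Writes one frame: a 4-byte big-endian length, then the JSON payload bytes.
pub fn write_frame(w: &mut impl Write, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = payload.len() as u32; // safe: bounded by MAX_FRAME above
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)?;
    w.flush()
}

/// Reads exactly one frame and returns its payload.
pub fn read_frame(r: &mut impl Read) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let len = u32::from_be_bytes(header) as usize;
    if len > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Both functions are generic over `Read` and `Write`, so the same code runs against a `UnixStream` in production and an in-memory buffer in tests.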

Why Unix socket wins for the default case:

When to reach for HTTP-on-loopback instead:

For the common case of "I'm shipping the clients alongside the daemon," Unix socket. For the case of "anyone might write a client and I want the tooling ecosystem," HTTP. TCP vs Unix Sockets vs Named Pipes vs Shared Memory has the comparison table.

Windows note: named pipes are the equivalent. \\.\pipe\mxr behaves the same way, with comparable access control via SECURITY_ATTRIBUTES.

6. Mutations are dry-runnable through the same code path

The rule that came out of a real bug: mxr's bulk-archive said "1,200 messages" in preview and archived 12,000 in execution because preview and execute had drifted into separate query paths. Preview was correct for its own (wrong) query. The user trusted the number. A decade of email moved.

Same-Code-Path Preview is the rule that came out of that:

fn run_mutation(args: &Args) -> Result<()> {
    let selection = compute_selection(args)?;
    if args.dry_run {
        print_preview(&selection);
        return Ok(());
    }
    confirm(&selection)?;
    mutate(&selection)
}

Compute the selection once. Render it or execute it. The mutation takes a Selection value, not a query — so the act of mutating cannot expand the set.
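
A self-contained version of the same idea, to show why the `Selection` type matters. `Mailbox`, the `older_than` filter, and the integer IDs are stand-ins invented for the example.

```rust
use std::collections::BTreeSet;

/// A resolved set of message IDs. Once built, it never re-runs the query.
pub struct Selection {
    ids: BTreeSet<u64>,
}

/// Toy stand-in for the store.
pub struct Mailbox {
    pub inbox: BTreeSet<u64>,
    pub archive: BTreeSet<u64>,
}

/// The only place the query is evaluated.
pub fn compute_selection(mailbox: &Mailbox, older_than: u64) -> Selection {
    Selection {
        ids: mailbox
            .inbox
            .iter()
            .copied()
            .filter(|id| *id < older_than)
            .collect(),
    }
}

/// Preview: describe the selection, touch nothing.
pub fn preview(selection: &Selection) -> String {
    format!("would archive {} messages", selection.ids.len())
}

/// Execute: takes resolved IDs, not a query, so it cannot widen the set.
pub fn archive(mailbox: &mut Mailbox, selection: &Selection) -> usize {
    let mut moved = 0;
    for id in &selection.ids {
        if mailbox.inbox.remove(id) {
            mailbox.archive.insert(*id);
            moved += 1;
        }
    }
    moved
}
```

Even if a message that matches the filter arrives between preview and execute, it is not in the `Selection`, so it stays where it is. The number the user saw is an upper bound on what moves.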

Generalises beyond CLIs. Anywhere there's a preview and an execute, a parallel implementation will drift, and you won't see the drift until production data is gone. The frame I keep returning to: preview is execute minus the side effect. Not "similar." Not "approximate." Minus the side effect, full stop.

Every mutation in Mxr, Lazydap, and Spotuify follows this. It's one of the few rules shared across all three.

7. The build system enforces the architecture

The crate dependency graph (Cargo workspace, in our case) is the architecture. Not the diagram. Not the docs. The Cargo.toml files.

The rule in Mxr: tui and web cannot depend on daemon, store, search, sync, or provider crates. Only protocol, core, config, compose, reader, mail-parse. Same in Lazydap (tui cannot reach past protocol). Same in Spotuify (14 crates, boundaries enforced by tests/workspace_boundaries.rs).

When the build catches the violation at compile time, three things happen:

  1. Discussions get short. The conversation isn't "should we?" it's "the build is red."
  2. New contributors hit the wall before they hit the codebase. The rule teaches itself.
  3. The architecture document doesn't decay because the architecture is the build.

In other languages: module visibility in Java/Kotlin, package boundaries with explicit exports in TypeScript, internal/public in C#, internal packages in Go. The mechanism varies; the principle doesn't. Don't rely on developer discipline alone.
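
A boundary test in the spirit of tests/workspace_boundaries.rs might look like the sketch below. I haven't reproduced the real file; the function name and the line-based manifest scan are assumptions, and it only inspects the `[dependencies]` table. Parsing the output of `cargo metadata` would be the more robust route.

```rust
/// Returns the forbidden crate names found in the `[dependencies]` table
/// (or `[dependencies.<name>]` sub-tables) of a Cargo.toml.
pub fn forbidden_deps(manifest: &str, forbidden: &[&str]) -> Vec<String> {
    let mut in_deps = false;
    let mut hits = Vec::new();
    for line in manifest.lines() {
        let line = line.trim();
        if line.starts_with('[') {
            // Table form: [dependencies.store]
            if let Some(rest) = line.strip_prefix("[dependencies.") {
                let name = rest.trim_end_matches(']');
                if forbidden.contains(&name) {
                    hits.push(name.to_string());
                }
            }
            in_deps = line == "[dependencies]";
            continue;
        }
        if !in_deps || line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, _)) = line.split_once('=') {
            // `store = { path = ".." }` and `store.workspace = true` both name `store`.
            let name = key.trim().split('.').next().unwrap_or("").trim_matches('"');
            if forbidden.contains(&name) {
                hits.push(name.to_string());
            }
        }
    }
    hits
}
```

A test would read the tui crate's Cargo.toml with `std::fs::read_to_string` and assert that the returned list is empty, which turns the architecture rule into a red build.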

The shape, drawn

   Human at terminal     TUI (one client)     Agent skill (another client)
        │                    │                          │
        │ (table output)     │ (events + actions)       │ (JSON over stdin/stdout)
        ▼                    │                          │
   CLI subcommands ◄─────────┴──────────────────────────┘
        │
        │  length-prefixed JSON
        │  ┌─────────────────────────────────────┐
        │  │  {                                   │
        │  │    "v": 1,                           │
        │  │    "id": "req_01HXYZ",               │
        │  │    "bucket": "CoreMail",             │
        │  │    "request": { "Search": { ... } }  │
        │  │  }                                   │
        │  └─────────────────────────────────────┘
        ▼
   Unix domain socket  (chmod 0700, ~/.cache/<tool>/<tool>.sock)
        │
        ▼
   Daemon (auto-spawned, long-lived)
        │
        ├── canonical state (SQLite / TOML / whatever survives a crash)
        ├── derived state  (search index, rebuildable)
        ├── adapters       (Gmail / IMAP / SMTP / DAP / Spotify Connect)
        └── sync loops, event bus, subscription channels

   Optional: web bridge (separate binary) ── HTTP+WS ──┐
                                                      │
                                                Same Unix socket

This is the same shape in all three projects. What changes is the contents of the "adapters" line (Gmail / DAP / Spotify Connect) and the request verbs carried in the envelope. The protocol shape, the daemon discipline, the socket location convention, the bucketed message taxonomy: those stay.
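
The envelope in the diagram can be sketched as a plain struct. The field names come from the diagram; everything else is an assumption: the `request` payload is carried as pre-serialised JSON, `id` and `bucket` are assumed to be plain identifiers that need no escaping, the `query` key in the usage below is invented, and the version check is one possible policy. A real implementation would more likely derive this with a JSON library.

```rust
/// Protocol version this build speaks (the `"v": 1` in the diagram).
pub const PROTOCOL_VERSION: u32 = 1;

/// Request envelope. `request` holds already-serialised JSON for the bucket's payload.
pub struct Envelope {
    pub v: u32,
    pub id: String,
    pub bucket: String,
    pub request: String,
}

impl Envelope {
    /// Serialises the envelope. Assumes `id` and `bucket` contain no characters
    /// that need JSON escaping.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"v\":{},\"id\":\"{}\",\"bucket\":\"{}\",\"request\":{}}}",
            self.v, self.id, self.bucket, self.request
        )
    }
}

/// Daemon-side gate: refuse envelopes from a protocol version we don't speak.
pub fn check_version(envelope: &Envelope) -> Result<(), String> {
    if envelope.v == PROTOCOL_VERSION {
        Ok(())
    } else {
        Err(format!("unsupported protocol version {}", envelope.v))
    }
}
```

Every client builds the same envelope, which is what keeps the TUI, the CLI, and an agent skill interchangeable from the daemon's point of view.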

When to use this pattern

Use it when:

Skip it when:

The daemon discipline is overkill for a 200-line xargs replacement. It's the right size for anything that holds state across commands or talks to a remote service.

The recipe (the order I'd build a new one in)

This is the M0–M5 progression I followed in Lazydap, with the lessons from Mxr baked in.

  1. Write the protocol first. Message types, request/response shapes, error model, bucket taxonomy, versioning rule. Markdown is fine; types in code are better. Document before code. API-First Design.
  2. Write a fake adapter that exercises the protocol end-to-end with no backend. Validates the design and gives you a fast integration test substrate forever.
  3. Set up the crate boundaries. core (zero I/O), protocol (the wire), store (canonical state), daemon (orchestrator). Add a boundary test that fails if tui depends on daemon. Do this before there's a TUI, so the rule predates the temptation.
  4. Build the daemon. Auto-spawn from the CLI binary, PID file, Unix socket, length-prefixed JSON, tracing from line 1 of main, structured logs at every IPC boundary.
  5. Build the first CLI subcommand. Something read-only, ideally. End-to-end works.
  6. Build a mutation with --dry-run and --yes. Get Same-Code-Path Preview in before there are five mutations to refactor.
  7. Build the second client. This is the test of the protocol. If the second client requires changes to the daemon, the protocol is wrong; fix it now. The second client can be small: even an agent skill (.skill ZIP plus an AGENTS.md) counts.
  8. Add the TUI. It's a third client, not a privileged surface. Same protocol. Hand-rolled Elm-style state internally (The Elm Architecture).
  9. Lock the load-bearing decisions in a decision-log.md. A "do not relitigate" section saves every future reviewer the same conversation.
  10. Test against real systems, not mocks. Real IMAP via Dovecot fixture, real Spotify via librespot in a sandbox, real DAP adapters. Mocks paper over the bugs that actually bite (label reorderings, IDLE disconnects, OAuth refresh races, codelldb stderr buffer fill).
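
Step 2, the fake adapter, is the piece people tend to skip, so here is a sketch of it. The `MailAdapter` trait, its single `search` method, and the `Message` fields are hypothetical; the point is only that the daemon-side handler depends on the trait and never on a concrete backend.

```rust
/// Hypothetical message record.
#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub id: u64,
    pub from: String,
    pub subject: String,
}

/// The seam between the daemon and any backend (Gmail, IMAP, or a fake).
pub trait MailAdapter {
    fn search(&self, from: &str) -> Vec<Message>;
}

/// In-memory adapter with no network and no credentials.
pub struct FakeAdapter {
    pub messages: Vec<Message>,
}

impl MailAdapter for FakeAdapter {
    fn search(&self, from: &str) -> Vec<Message> {
        self.messages
            .iter()
            .filter(|m| m.from.contains(from))
            .cloned()
            .collect()
    }
}

/// Daemon-side handler: written against the trait, so it runs unchanged
/// over the fake in tests and a real provider in production.
pub fn handle_search(adapter: &impl MailAdapter, from: &str) -> Vec<u64> {
    adapter.search(from).iter().map(|m| m.id).collect()
}
```

The fake validates the protocol and the handler wiring. It does not replace step 10: provider quirks only show up against the real systems.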

Failure modes worth pre-empting

These are the ones I've already paid for. Don't pay for them twice.

The handover checklist

If I were handing one of these projects to another engineer cold, this is what I'd want them to be able to say yes to before they start changing things:

If yes to all, you can start. If no, read the missing one before touching code.

The cluster

The atoms this hub ties together. Each is its own entry point.

Concrete instances

Same architecture. Three domains. The fact that the architecture transferred from email to debugging to music with minor adjustments is the strongest evidence I have that it's the right shape for the class of tool I want to build.

Further reading worth your time

One last thing

If you read this and the instinct that says "this should be a stable protocol with many clients" feels obvious — good. It's supposed to. The instinct is older than software. The fact that it keeps being right across decades is the whole point.

The trap is that on day one of a project, the instinct feels like over-engineering. I only need a CLI. The daemon is overkill. JSON output is a nice-to-have. Then the second consumer appears and you spend a month untangling state from the CLI process. Then the third consumer (an agent, six months from now) is impossible without rebuilding the whole thing.

Pay the cost once, upfront. The protocol, the daemon, the boundaries. After that, every new feature, every new client, every new domain reuses the same shape. The marginal cost of Spotuify after Mxr was small. The marginal cost of Lazydap after Mxr was small. That's the dividend.

Build for the second consumer you can already see, and the third consumer, the one you can't see yet, will arrive cheaply.