Length-Prefixed Framing

A technique for sending discrete messages over a byte stream. Stream protocols (TCP, Unix stream sockets) deliver bytes — no message boundaries. To send messages, you have to mark where each one begins and ends. The simplest, fastest, hardest-to-screw-up approach: prefix each message with its length.

The format

[ 4 bytes: message length N (big-endian) ][ N bytes: message body ]
[ 4 bytes: next message length M         ][ M bytes: next body    ]
...
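
As a concrete illustration of the layout above, here is a minimal sketch that builds one frame in memory. The helper name `encode_frame` is hypothetical and not taken from mxr or lazydap.

```rust
/// Build one frame: a 4-byte big-endian length followed by the body.
/// `encode_frame` is a hypothetical helper used only for illustration.
fn encode_frame(body: &[u8]) -> Vec<u8> {
    // The length field is 4 bytes, so the body must fit in a u32.
    assert!(body.len() <= u32::MAX as usize, "body too large for a 4-byte length");
    let mut frame = Vec::with_capacity(4 + body.len());
    frame.extend_from_slice(&(body.len() as u32).to_be_bytes());
    frame.extend_from_slice(body);
    frame
}
```

The two-byte body "hi" becomes the six bytes 00 00 00 02 68 69. An empty body is still a valid frame: four zero bytes.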

Reader:

  1. Read exactly 4 bytes. Decode as N.
  2. Read exactly N bytes. That's the message body.
  3. Hand the body to the parser.
  4. Loop.

Writer:

  1. Encode the message body. Get N bytes.
  2. Write 4-byte big-endian N.
  3. Write the N bytes.

That's the entire algorithm. Use read_exact / write_all, which loop until the whole buffer has been transferred, so the framing code never sees a partial read or a partial write. The kernel handles the byte stream; the framing handles message boundaries.
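
To see why read_exact matters, the sketch below wraps a reader so that every `read` call returns at most one byte, imitating the short reads a socket is allowed to produce. It uses synchronous `std::io` rather than async I/O to stay self-contained; `Trickle` and `read_frame` are hypothetical names.

```rust
use std::io::{self, Read};

/// Hands back at most one byte per `read` call, like a very slow socket.
/// Hypothetical helper for illustration.
struct Trickle<R>(R);

impl<R: Read> Read for Trickle<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = buf.len().min(1);
        self.0.read(&mut buf[..n])
    }
}

/// Read one frame. `read_exact` keeps calling `read` until the buffer is
/// full, so short reads never leak into the framing logic.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let mut body = vec![0u8; u32::from_be_bytes(len_buf) as usize];
    r.read_exact(&mut body)?;
    Ok(body)
}
```

Even though the underlying reader delivers the nine wire bytes one at a time, `read_frame` still returns the complete five-byte body in one piece.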

Why "big-endian"

Network byte order is big-endian. The convention from BSD sockets and IETF RFCs. Doesn't matter for local IPC functionally, but matches expectations.
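
A small sketch of what the byte order means for the length field. The function names are hypothetical; `to_be_bytes` and `from_be_bytes` are standard library methods.

```rust
/// Encode a length in network byte order: most significant byte first.
fn length_to_wire(n: u32) -> [u8; 4] {
    n.to_be_bytes()
}

/// Decode a 4-byte big-endian length.
fn length_from_wire(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}
```

For N = 258 (0x00000102) the wire bytes are 00 00 01 02. A little-endian writer would send 02 01 00 00 instead, which a big-endian reader decodes as 33,619,968, so both sides must agree on the order.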

In Rust:

// Writer (assumes body.len() fits in a u32; the cast silently truncates otherwise):
let len_bytes = (body.len() as u32).to_be_bytes();
writer.write_all(&len_bytes).await?;
writer.write_all(&body).await?;

// Reader:
let mut len_buf = [0u8; 4];
reader.read_exact(&mut len_buf).await?;
let len = u32::from_be_bytes(len_buf) as usize;
let mut body = vec![0u8; len];
reader.read_exact(&mut body).await?;
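
The same steps can be sketched with synchronous `std::io`, including the read loop and end-of-stream handling. This is an illustrative sketch, not the mxr or lazydap code; `put_frame` and `next_frame` are hypothetical names.

```rust
use std::io::{self, Read, Write};

/// Write one frame: 4-byte big-endian length, then the body.
fn put_frame<W: Write>(w: &mut W, body: &[u8]) -> io::Result<()> {
    if body.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "body too large"));
    }
    w.write_all(&(body.len() as u32).to_be_bytes())?;
    w.write_all(body)
}

/// Read the next frame. Returns Ok(None) when the stream ends cleanly on a
/// frame boundary, and an error when it ends in the middle of a frame.
fn next_frame<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 4];
    let mut filled = 0;
    // Fill the header by hand so zero bytes (clean EOF) can be told apart
    // from a truncated header.
    while filled < len_buf.len() {
        match r.read(&mut len_buf[filled..]) {
            Ok(0) => break,
            Ok(n) => filled += n,
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    match filled {
        0 => return Ok(None),
        4 => {}
        _ => {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "stream ended inside a length header",
            ))
        }
    }
    let mut body = vec![0u8; u32::from_be_bytes(len_buf) as usize];
    r.read_exact(&mut body)?;
    Ok(Some(body))
}
```

A caller loops on `next_frame` until it returns `Ok(None)`. Zero-length bodies round-trip like any other message.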

Why 4 bytes

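Standard for most use cases: a 4-byte unsigned length allows bodies of up to 4 GiB minus one byte, far more than typical IPC messages need.

Variants: some protocols use a 2-byte length (64 KiB limit), an 8-byte length, or a variable-length integer.

mxr and lazydap use 4-byte big-endian. Standard, simple, sufficient.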
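
For comparison, one commonly seen variant replaces the fixed-width length with a variable-length integer (the base-128 encoding Protocol Buffers uses for lengths of nested fields). The sketch below is illustrative only: mxr and lazydap use the fixed 4-byte form, and the function names are hypothetical.

```rust
/// Encode `n` as a base-128 varint: 7 payload bits per byte, least
/// significant group first, high bit set on every byte except the last.
fn encode_varint(mut n: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (n & 0x7f) as u8;
        n >>= 7;
        if n == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

/// Decode a varint from the start of `bytes`.
/// Returns the value and the number of bytes consumed, or None if the
/// input ends before the final byte or runs past 10 bytes.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &b) in bytes.iter().enumerate().take(10) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

Small lengths cost one byte instead of four, at the price of a reader that can no longer fetch the header with a single fixed-size read_exact.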

Comparison to other framing schemes

Content-Length: N\r\n\r\n headers (LSP / DAP style)

Content-Length: 119\r\n
\r\n
{"jsonrpc":"2.0","id":1,...}

Two \r\n terminate the headers. Body is exactly Content-Length bytes. More verbose than length-prefix but human-readable in transit (great for debugging — open the socket with socat and watch).

DAP uses this. Its Content-Length headers can be inspected with socat UNIX-CONNECT:lazydap.sock STDIO while the daemon talks.
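
A reader for this header style can be sketched as follows, using synchronous `std::io::BufRead`. It is a minimal sketch under simplifying assumptions: header names are matched case-insensitively, unknown headers are ignored, and no size cap is applied. `read_header_framed` and `invalid` are hypothetical names.

```rust
use std::io::{self, BufRead};

fn invalid(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Read one message framed by `Content-Length: N\r\n\r\n`.
/// Returns Ok(None) if the stream ends before any header line.
fn read_header_framed<R: BufRead>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut content_length: Option<usize> = None;
    let mut seen_any = false;
    let mut line = String::new();
    loop {
        line.clear();
        if r.read_line(&mut line)? == 0 {
            if seen_any {
                return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "EOF inside headers"));
            }
            return Ok(None);
        }
        seen_any = true;
        let header = line.trim_end_matches(|c| c == '\r' || c == '\n');
        if header.is_empty() {
            break; // the blank line ends the header block
        }
        if let Some((name, value)) = header.split_once(':') {
            if name.trim().eq_ignore_ascii_case("Content-Length") {
                let n = value.trim().parse::<usize>().map_err(|_| invalid("bad Content-Length"))?;
                content_length = Some(n);
            }
        }
    }
    let len = content_length.ok_or_else(|| invalid("missing Content-Length"))?;
    let mut body = vec![0u8; len];
    r.read_exact(&mut body)?;
    Ok(Some(body))
}
```

Compared with the 4-byte prefix, the reader has to scan for line endings and parse decimal text before it knows how many bytes to read.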

Newline-delimited (\n between messages)

Each message is one line. JSON sent this way is called "JSONL" or "ndjson".

Pros: dead simple, human-readable, easy to grep.

Cons: messages can't contain newlines unless escaped. Adds parsing burden.
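
A minimal sketch of newline-delimited framing with `std::io`, showing where the newline restriction bites. The function names are hypothetical.

```rust
use std::io::{self, BufRead, Write};

/// Write one message as a single line. A raw newline inside the message
/// would split it in two on the reader side, so it is rejected here.
fn write_line_message<W: Write>(w: &mut W, msg: &str) -> io::Result<()> {
    if msg.contains('\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "message contains a raw newline",
        ));
    }
    w.write_all(msg.as_bytes())?;
    w.write_all(b"\n")
}

/// Collect every line-delimited message from a reader.
fn read_line_messages<R: BufRead>(r: R) -> io::Result<Vec<String>> {
    r.lines().collect()
}
```

Compact JSON fits this scheme because newlines inside JSON strings are written as the escape sequence \n rather than a raw newline byte; pretty-printed JSON does not.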

Sentinel-delimited

Special byte sequence marks message boundaries. Telnet uses IAC (0xFF). SMTP uses \r\n.\r\n for end of DATA.

Cons: messages can't contain the sentinel; usually requires escaping.
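
The escaping cost can be illustrated with SMTP's dot-stuffing: a body line that starts with "." gets an extra "." on the wire so it cannot be mistaken for the terminator. The sketch below assumes the body uses \r\n line endings and works on whole strings rather than a stream; `smtp_frame` and `smtp_unframe` are hypothetical names.

```rust
/// Frame a body the way SMTP DATA does: dot-stuff lines that start with
/// ".", then append the "\r\n.\r\n" terminator.
fn smtp_frame(body: &str) -> String {
    let mut out = String::new();
    for line in body.split("\r\n") {
        if line.starts_with('.') {
            out.push('.'); // escape: a leading dot is doubled
        }
        out.push_str(line);
        out.push_str("\r\n");
    }
    out.push_str(".\r\n");
    out
}

/// Reverse `smtp_frame`. Returns None if the terminator line is missing.
fn smtp_unframe(wire: &str) -> Option<String> {
    let mut lines = Vec::new();
    for line in wire.split("\r\n") {
        if line == "." {
            return Some(lines.join("\r\n"));
        }
        // Remove the single escaping dot, if present.
        lines.push(line.strip_prefix('.').unwrap_or(line));
    }
    None
}
```

Both sides must scan every byte of the body, which is the work a length prefix avoids.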

Self-describing protocols (gRPC / protobuf)

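Protocol Buffers length-prefix nested messages and fields internally, but a top-level protobuf message is not self-delimiting, so on a raw stream it still needs framing. gRPC supplies that framing itself: each message is preceded by a 1-byte compression flag and a 4-byte big-endian length, carried over HTTP/2. No separate framing layer is needed if you use gRPC end to end.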

Why length-prefix wins for local IPC

For human-readable streams (LSP/DAP), Content-Length is preferred for debuggability. For fast local IPC where you control both sides (mxr, lazydap), length-prefix is the right answer.

Common pitfalls

What mxr and lazydap do

Both:

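// crates/protocol/src/codec.rs
// Imports below assume tokio's async I/O traits; the excerpt does not show them.
// IpcMessage is defined elsewhere and must implement Serialize and Deserialize.
use std::io;
use tokio::io::{AsyncRead, AsyncReadExt, AsyncWrite, AsyncWriteExt};

const MAX_MESSAGE_SIZE: usize = 16 * 1024 * 1024; // 16 MiB cap

pub async fn write_message<W: AsyncWrite + Unpin>(w: &mut W, msg: &IpcMessage) -> io::Result<()> {
    let body = serde_json::to_vec(msg)?;
    // 4-byte big-endian length prefix, then the JSON body.
    let len = (body.len() as u32).to_be_bytes();
    w.write_all(&len).await?;
    w.write_all(&body).await?;
    w.flush().await?;
    Ok(())
}

pub async fn read_message<R: AsyncRead + Unpin>(r: &mut R) -> io::Result<IpcMessage> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf).await?;
    let len = u32::from_be_bytes(len_buf) as usize;
    // Reject oversized lengths before allocating the body buffer.
    if len > MAX_MESSAGE_SIZE {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "message too large"));
    }
    let mut body = vec![0u8; len];
    r.read_exact(&mut body).await?;
    Ok(serde_json::from_slice(&body)?)
}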

Both validate length against a 16 MiB cap. Both use serde_json for the body. Both run over Unix Domain Sockets.
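
The reason for the cap is easiest to see with a hostile header. The sketch below is a synchronous `std::io` analogue of the read path, not the projects' code; `read_capped_frame` is a hypothetical name and the constant mirrors the 16 MiB limit stated above.

```rust
use std::io::{self, Read};

/// 16 MiB, matching the cap described above.
const MAX_MESSAGE_SIZE: usize = 16 * 1024 * 1024;

/// Read one frame, rejecting any declared length above the cap.
fn read_capped_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    // Check before allocating: without this, four bytes of 0xFF from a
    // buggy or malicious peer would request a buffer of nearly 4 GiB.
    if len > MAX_MESSAGE_SIZE {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "message too large"));
    }
    let mut body = vec![0u8; len];
    r.read_exact(&mut body)?;
    Ok(body)
}
```

A header of FF FF FF FF fails with InvalidData before any body buffer is allocated, while a stream that ends before the declared length fails with UnexpectedEof from read_exact.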

See also