MIME
Multipurpose Internet Mail Extensions. The format that gives email rich content: text + HTML alternatives, attachments, non-ASCII characters, multilingual headers. Defined in RFCs 2045–2049 (1996); RFC 5322 covers the basic message format MIME extends.
Without MIME, email would be plain ASCII text only. With it, you get inboxes that look like inboxes.
The basic structure
An email is a list of headers, a blank line, then a body. The body is either a single piece of content or a tree of parts ("multipart"), recursively.
From: alice@example.com
To: bob@example.com
Subject: hello
Date: Fri, 1 May 2026 09:00:00 +0000
Message-ID: <abc123@example.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="boundary42"

--boundary42
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Hello, Bob!
--boundary42
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

<p>Hello, =E2=9C=A8 Bob!</p>
--boundary42--
Two parts (text and HTML alternatives) under a multipart/alternative container. Boundaries delimit parts. Each part has its own headers describing how its bytes are encoded.
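A minimal sketch of walking that tree, using Python's standard-library email package (an illustration only, not the parser mxr uses). The names RAW, list_parts, and PARTS are made up for this example.

```python
from email import message_from_string, policy

# The example message from above. The blank lines separate headers from bodies.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: hello
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="boundary42"

--boundary42
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Hello, Bob!
--boundary42
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

<p>Hello, =E2=9C=A8 Bob!</p>
--boundary42--
"""


def list_parts(raw):
    """Return (content type, decoded text) for every leaf part of the message."""
    msg = message_from_string(raw, policy=policy.default)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only hold other parts
        # get_content() undoes the transfer encoding and applies the charset
        parts.append((part.get_content_type(), part.get_content().strip()))
    return parts


PARTS = list_parts(RAW)
for content_type, text in PARTS:
    print(content_type, "->", ascii(text))
```

walk() visits the container first and then each child, so skipping multipart nodes leaves only the leaves that carry content.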
Content types
Common media types (type/subtype), including the multipart containers:
- text/plain, text/html, text/markdown
- image/jpeg, image/png, image/svg+xml
- audio/mpeg, video/mp4
- application/pdf, application/json, application/octet-stream
- multipart/alternative — same content, different formats, render the most appropriate
- multipart/mixed — different content, render all (e.g., text + attachments)
- multipart/related — content with embedded resources (HTML email with inline images)
- multipart/signed, multipart/encrypted — for S/MIME / PGP
Real-world messages typically nest: a multipart/mixed root holding a multipart/alternative (text + HTML) plus the attachments.
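The nesting can be reproduced with Python's standard-library EmailMessage, shown here as a sketch. The addresses, the placeholder attachment bytes, and the helper names build_message and outline are assumptions for illustration.

```python
from email.message import EmailMessage


def build_message():
    """Build a message with text + HTML alternatives and one attachment."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "report"
    msg.set_content("Plain-text version.")  # starts as text/plain
    # Adding an alternative turns the message into multipart/alternative
    msg.add_alternative("<p>HTML version.</p>", subtype="html")
    # Adding an attachment wraps everything in multipart/mixed
    msg.add_attachment(
        b"%PDF-1.4 placeholder bytes",
        maintype="application",
        subtype="pdf",
        filename="report.pdf",
    )
    return msg


def outline(part, depth=0):
    """Return (depth, content type) rows describing the part tree."""
    rows = [(depth, part.get_content_type())]
    if part.is_multipart():
        for child in part.iter_parts():
            rows.extend(outline(child, depth + 1))
    return rows


OUTLINE = outline(build_message())
for depth, content_type in OUTLINE:
    print("  " * depth + content_type)
```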
Content-Transfer-Encoding
Email infrastructure historically only handled 7-bit ASCII. MIME ships binary content by encoding it:
- 7bit — straight ASCII, no encoding
- 8bit — UTF-8 (or Latin-1) raw; only safe over 8BITMIME-capable transport
- base64 — every 3 bytes become 4 ASCII characters (6 bits per character). Standard for binary attachments.
- quoted-printable — mostly ASCII with =XX hex escapes for non-ASCII bytes. Standard for HTML email.
- binary — rare; only over BINARYMIME transport.
A modern parser handles all of them transparently.
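To make the two common encodings concrete, here is a small sketch using Python's standard-library base64 and quopri modules. The sample text and variable names are chosen for this example.

```python
import base64
import quopri

# "Hello, <sparkles> Bob!" as UTF-8 bytes; U+2728 takes three bytes (E2 9C A8)
TEXT = "Hello, \u2728 Bob!".encode("utf-8")

# base64: every 3 input bytes become 4 ASCII characters
B64 = base64.b64encode(TEXT)

# quoted-printable: ASCII passes through, other bytes become =XX escapes
QP = quopri.encodestring(TEXT)

print(B64.decode("ascii"))
print(QP.decode("ascii"))

# Both encodings are lossless
assert base64.b64decode(B64) == TEXT
assert quopri.decodestring(QP) == TEXT
```

Quoted-printable keeps mostly-ASCII content readable on the wire, while base64 has a fixed overhead of about one third regardless of content.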
Header encoding (RFC 2047)
Email headers must be ASCII at the wire level. Non-ASCII text in headers (UTF-8 subjects, names with accents) is encoded with the encoded-word syntax:
Subject: =?UTF-8?Q?Hello_=E2=9C=A8_Bob!?=
From: =?UTF-8?B?44GT44KT44Gr44Gh44Gv?= <user@example.com>
Format: =?<charset>?<encoding>?<encoded-text>?= where encoding is Q (a quoted-printable variant in which _ stands for a space) or B (base64).
Decoding is per-fragment; one header may have multiple encoded-words mixed with ASCII.
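A sketch of per-fragment decoding with Python's standard-library email.header module. The helper name decode_mime_header is made up for this example.

```python
from email.header import decode_header, make_header


def decode_mime_header(value):
    """Decode every encoded-word in a raw header value into one string."""
    # decode_header() splits the value into (bytes, charset) fragments;
    # make_header() stitches them back together as text.
    return str(make_header(decode_header(value)))


SUBJECT = decode_mime_header("=?UTF-8?Q?Hello_=E2=9C=A8_Bob!?=")
FROM = decode_mime_header("=?UTF-8?B?44GT44KT44Gr44Gh44Gv?= <user@example.com>")

print(ascii(SUBJECT))
print(ascii(FROM))
```

The From value mixes one encoded-word (the display name) with a plain ASCII fragment (the address), which is the mixed case described above.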
Address parsing
From, To, Cc, Bcc, Reply-To use a structured format. Examples:
To: alice@example.com
To: Alice Aardvark <alice@example.com>
To: "Aardvark, Alice" <alice@example.com>, bob@example.com, "Bob B" <bob@example.com>
Cc: alice@example.com, group:bob@example.com,carol@example.com;
Display names are quoted when they contain special characters such as commas. Multiple addresses are comma-separated. Group syntax (name:addr1,addr2;) is rare but spec-compliant.
Parsing is non-trivial — RFC 5322 grammar is rich. Use a library; don't roll your own.
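As one example of leaning on a library, Python's standard-library email.utils.getaddresses splits a header value into (display name, address) pairs and copes with the quoted comma. This is a sketch; the variable names are illustrative.

```python
from email.utils import getaddresses

TO_HEADER = (
    '"Aardvark, Alice" <alice@example.com>, '
    'bob@example.com, "Bob B" <bob@example.com>'
)

# getaddresses() takes a list of header values and returns (name, address) pairs
PAIRS = getaddresses([TO_HEADER])

for name, address in PAIRS:
    print(repr(name), address)
```

A naive split on commas would cut "Aardvark, Alice" in half, which is the kind of bug a hand-rolled parser tends to have.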
Date format
Date: Fri, 01 May 2026 09:00:00 +0000
Date: 1 May 2026 09:00:00 GMT
Date: Fri, 01 May 26 09:00 +0100
RFC 5322 §3.3 defines the syntax. Edge cases include obsolete year formats (26 instead of 2026), missing day-of-week, named timezones (PST, EST — ambiguous), and just plain lies (servers with wrong clocks).
For threading and sorting, mail clients usually fall back to delivery time (e.g., Gmail's internalDate) when parsing fails or the date is implausible.
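The fallback idea can be sketched with Python's standard-library parsedate_to_datetime. The function name effective_date, the received_at parameter, and the one-day plausibility window are assumptions for this example, not a rule any particular client is known to use.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MAX_FUTURE_SKEW = timedelta(days=1)  # assumed plausibility window


def effective_date(header_value, received_at):
    """Return the Date header as UTC, or received_at if it is unusable."""
    try:
        parsed = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return received_at  # unparseable header
    if parsed.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    parsed = parsed.astimezone(timezone.utc)
    if parsed > received_at + MAX_FUTURE_SKEW:
        return received_at  # implausibly far in the future
    return parsed


RECEIVED = datetime(2026, 5, 1, 9, 5, tzinfo=timezone.utc)
print(effective_date("Fri, 01 May 26 09:00 +0100", RECEIVED))
print(effective_date("not a date", RECEIVED))
```

The parser accepts the obsolete two-digit year and the missing seconds, so only genuinely broken or implausible values fall through to the delivery time.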
What mxr does
Per Mxr's crates/mail-parse/src/lib.rs:
- Library: mail_parser (RFC 5322 + MIME).
- One call: MessageParser::default().parse(raw_message).
- All decoding (encoded-words, base64, quoted-printable, charset normalisation) happens inside the library.
- mxr extracts: addresses (from, to, cc), subject (decoded), date (with fallback to Utc::now() if unparseable), Message-ID, In-Reply-To, References (for threading), text+html bodies, attachment metadata.
- Encrypted/signed parts detected by filename/MIME type (.p7m, .pgp, .gpg, application/pkcs7-mime).
Common pitfalls
- Charset confusion — old emails may declare one charset and contain another. Decoders should handle the mismatch gracefully: decode best-effort and accept some mojibake rather than fail.
- Malformed multiparts — missing boundary, unclosed parts, nested incorrectly. Real-world parsers are forgiving.
- HTML email with tracking pixels and remote content — CSS, web fonts, images that ping the sender. mxr blocks remote content by default in reader mode.
- PGP/S/MIME — signature verification and decryption are separate concerns; mxr displays an indicator but doesn't decrypt by default.
- Attachments with weird filenames — RFC 2231 allows multi-line, charset-aware filenames. Real implementations vary.
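For the charset pitfall, one possible best-effort strategy is to try the declared charset first and then fall back through likely candidates. The sketch below is an assumption about how a tolerant decoder might behave, not a description of mail_parser or mxr; the function name and the fallback order are made up.

```python
from typing import Optional


def decode_best_effort(data: bytes, declared: Optional[str]) -> str:
    """Decode body bytes, tolerating a wrong or unknown declared charset."""
    # Assumed fallback order: declared charset, then UTF-8, then Windows-1252
    for charset in (declared, "utf-8", "windows-1252"):
        if not charset:
            continue
        try:
            return data.decode(charset)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown charset name, or bytes invalid in it
    # Last resort: keep what decodes and mark the rest with U+FFFD
    return data.decode("utf-8", errors="replace")


# Declared us-ascii, actually UTF-8
print(ascii(decode_best_effort("caf\u00e9".encode("utf-8"), "us-ascii")))
# Declared UTF-8, actually Latin-1
print(ascii(decode_best_effort(b"caf\xe9", "utf-8")))
```

This cannot catch every mismatch: a message that declares Latin-1 but contains UTF-8 decodes without error and still produces mojibake, because every byte sequence is valid Latin-1.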
See also
- SMTP — what carries MIME messages
- IMAP — what fetches them
- Email Threading — uses MIME-decoded headers
- How Email Actually Works — synthesis
- Mxr — concrete implementation
- RFCs 2045–2049 (MIME): start at https://datatracker.ietf.org/doc/html/rfc2045
- RFC 5322 (Internet Message Format): https://datatracker.ietf.org/doc/html/rfc5322
- RFC 2047 (encoded-word): https://datatracker.ietf.org/doc/html/rfc2047