Email Internal Model

#resources #resources/programming #resources/programming/email #resources/programming/architecture

The provider-agnostic types that sit between the wire protocols (IMAP, Gmail API, SMTP) and the application logic. Designing this well is most of the engineering work in an email client.

Why a separate internal model

Three problems force one:

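Provider semantics differ. Gmail has labels (a message can carry many); IMAP has folders (a message lives in exactly one, and a copy in another folder is a separate message). Microsoft Graph has folders plus categories, which work like tags. JMAP has its own model: a message can belong to several mailboxes at once and also carries keywords.
Operations need to be provider-independent. "Archive this" means different things to different backends: Gmail removes the INBOX label, while a typical IMAP server moves the message into an archive folder. The UI can't have N code paths per operation.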
The store is canonical. SQLite holds the truth; providers are sources to sync from. The store schema needs a stable shape that doesn't change when you add a new provider.

The solution: define internal types that capture the semantic intent, plus per-provider adapters that translate.
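To make the translation concrete, here is a minimal sketch of an adapter-side mapping from a provider-agnostic intent to a wire operation. Intent, Backend, ProviderOp and translate are hypothetical names, not mxr's API. The Gmail side assumes the standard INBOX and UNREAD system labels. The IMAP side assumes that archiving means moving the message to a dedicated archive mailbox.

```rust
/// Provider-agnostic intents, as the UI would express them (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Intent {
    Archive,
    MarkRead,
}

/// Which kind of backend an account talks to (simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Gmail,
    Imap,
}

/// What an adapter would do on the wire. Hypothetical and simplified.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ProviderOp {
    /// Gmail: add/remove label IDs on the message.
    ModifyLabels { add: Vec<String>, remove: Vec<String> },
    /// IMAP: move the message to another mailbox.
    MoveTo { mailbox: String },
    /// IMAP: set system flags on the message.
    AddFlags { flags: Vec<String> },
}

/// Translate one intent for one backend. archive_mailbox stands in for the
/// mailbox an IMAP account uses for archiving (e.g. one marked \Archive).
pub fn translate(intent: Intent, backend: Backend, archive_mailbox: &str) -> ProviderOp {
    match (intent, backend) {
        // Gmail: archiving just drops the INBOX label; other labels stay.
        (Intent::Archive, Backend::Gmail) => ProviderOp::ModifyLabels {
            add: vec![],
            remove: vec!["INBOX".to_string()],
        },
        // Gmail models unread state as the UNREAD system label.
        (Intent::MarkRead, Backend::Gmail) => ProviderOp::ModifyLabels {
            add: vec![],
            remove: vec!["UNREAD".to_string()],
        },
        // IMAP: the message physically leaves INBOX.
        (Intent::Archive, Backend::Imap) => ProviderOp::MoveTo {
            mailbox: archive_mailbox.to_string(),
        },
        // IMAP: read state is the \Seen flag.
        (Intent::MarkRead, Backend::Imap) => ProviderOp::AddFlags {
            flags: vec!["\\Seen".to_string()],
        },
    }
}
```

The UI only ever produces an Intent. Adding a backend means adding match arms in the adapter, not new code paths in the UI.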

The core types in mxr

Per mxr's crates/core/src/types.rs:

Envelope (~120 lines)

pub struct Envelope {
    pub id: MessageId,                    // mxr-internal UUIDv7
    pub provider_id: String,              // mailbox-scoped for IMAP, stable for Gmail
    pub thread_id: ThreadId,              // mxr-internal or provider-supplied
    pub message_id_header: Option<String>,
    pub in_reply_to: Option<String>,
    pub references: Vec<String>,
    pub from: Vec<Address>,
    pub to: Vec<Address>,
    pub cc: Vec<Address>,
    pub bcc: Vec<Address>,
    pub subject: String,
    pub date: DateTime<Utc>,
    pub snippet: String,                  // Gmail-style preview, computed for IMAP
    pub flags: MessageFlags,              // bitfield: read, starred, draft, etc.
    pub has_attachments: bool,
    pub size: usize,
    pub unsubscribe: UnsubscribeMethod,
    // Transient (sync-time only):
    pub label_provider_ids: Vec<String>,  // resolved to LabelIds during sync
}

The Envelope is headers + metadata. Bodies live separately to keep search/list operations cheap.
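List views read only envelopes, so the preview has to live on the Envelope as well. This is the snippet field, which the struct comment says is computed for IMAP. Below is a minimal sketch of computing one, assuming the plain-text part is already at hand. make_snippet is a hypothetical helper, and real clients may also strip quoted replies or signatures.

```rust
/// Build a list-view preview from a message's plain text: collapse every run
/// of whitespace (including newlines) to one space, then keep at most
/// max_chars characters. Counting chars rather than bytes means a multi-byte
/// UTF-8 character is never cut in half.
pub fn make_snippet(text: &str, max_chars: usize) -> String {
    let collapsed = text.split_whitespace().collect::<Vec<&str>>().join(" ");
    collapsed.chars().take(max_chars).collect()
}
```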

MessageBody

pub struct MessageBody {
    pub message_id: MessageId,
    pub text_plain: Option<String>,
    pub text_html: Option<String>,
    pub attachments: Vec<AttachmentMeta>,
    pub fetched_at: DateTime<Utc>,
    pub metadata: MessageMetadata,        // list-id, auth-results, calendar invite, etc.
}

text_plain may be missing (HTML-only emails); text_html may be missing (plaintext-only emails). When both are missing, mxr synthesises a best-effort summary from attachments and metadata.
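One plausible fallback order for turning a body into displayable text is sketched below. It tries plain text first, then HTML with the tags stripped, then a summary of the attachments. BodyParts, Attachment and display_text are hypothetical simplified stand-ins, not mxr's actual types. The tag stripper is deliberately crude: it ignores entities, style and script content.

```rust
/// Hypothetical, simplified stand-ins for the MessageBody fields used here.
pub struct Attachment {
    pub filename: String,
}

pub struct BodyParts {
    pub text_plain: Option<String>,
    pub text_html: Option<String>,
    pub attachments: Vec<Attachment>,
}

/// Crude tag stripper: drops everything between '<' and '>'.
/// Real HTML-to-text conversion needs a proper parser.
fn strip_tags(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            '<' => in_tag = true,
            '>' => in_tag = false,
            _ if !in_tag => out.push(c),
            _ => {}
        }
    }
    out
}

/// Prefer plain text, then de-tagged HTML, then an attachment summary.
pub fn display_text(body: &BodyParts) -> String {
    if let Some(plain) = &body.text_plain {
        return plain.clone();
    }
    if let Some(html) = &body.text_html {
        return strip_tags(html);
    }
    // Neither text part exists: summarise what we do have.
    if body.attachments.is_empty() {
        "(no text content)".to_string()
    } else {
        let names: Vec<&str> = body.attachments.iter().map(|a| a.filename.as_str()).collect();
        format!("(no text; attachments: {})", names.join(", "))
    }
}
```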

Label

pub struct Label {
    pub id: LabelId,
    pub name: String,
    pub kind: LabelKind,                  // System | Folder | User
    pub provider_id: String,              // "INBOX", "Label_abc123", etc.
    pub unread_count: u32,
    pub total_count: u32,
    pub color: Option<Color>,             // Gmail only
}

pub enum LabelKind {
    System,    // INBOX, Sent, Drafts, Trash, Spam, Archive — provider-defined
    Folder,    // IMAP folder that's not a special-use system folder
    User,      // user-created label/folder
}

LabelKind is the seam where the Gmail-vs-IMAP impedance mismatch lives. From the UI's perspective, "the inbox" is a LabelKind::System label with a known role; the underlying provider ID varies by backend.
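The Label struct above has no role field, so the "known role" has to be derived from provider data. The sketch below shows that derivation, assuming a hypothetical SystemRole enum. Gmail's system labels use fixed IDs (INBOX, SENT, DRAFT, TRASH, SPAM). On IMAP, INBOX is a reserved name, and the other roles come from SPECIAL-USE mailbox attributes (RFC 6154).

```rust
/// Hypothetical role enum. In this sketch the role is derived from provider
/// data at sync time rather than stored on Label.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SystemRole {
    Inbox,
    Sent,
    Drafts,
    Trash,
    Spam,
    Archive,
}

/// Gmail: system labels have fixed IDs.
pub fn gmail_role(label_id: &str) -> Option<SystemRole> {
    match label_id {
        "INBOX" => Some(SystemRole::Inbox),
        "SENT" => Some(SystemRole::Sent),
        "DRAFT" => Some(SystemRole::Drafts),
        "TRASH" => Some(SystemRole::Trash),
        "SPAM" => Some(SystemRole::Spam),
        // Gmail has no archive label: "archived" means "not labelled INBOX".
        _ => None,
    }
}

/// IMAP: INBOX is a reserved, case-insensitive name; other roles come from
/// SPECIAL-USE attributes returned by LIST.
pub fn imap_role(mailbox: &str, attributes: &[&str]) -> Option<SystemRole> {
    if mailbox.eq_ignore_ascii_case("INBOX") {
        return Some(SystemRole::Inbox);
    }
    attributes.iter().find_map(|attr| {
        // Attribute names are case-insensitive, so compare lowercased.
        let attr = attr.to_ascii_lowercase();
        match attr.as_str() {
            "\\sent" => Some(SystemRole::Sent),
            "\\drafts" => Some(SystemRole::Drafts),
            "\\trash" => Some(SystemRole::Trash),
            "\\junk" => Some(SystemRole::Spam),
            "\\archive" => Some(SystemRole::Archive),
            _ => None,
        }
    })
}
```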

Thread

pub struct Thread {
    pub id: ThreadId,
    pub subject: String,
    pub participants: Vec<Address>,
    pub message_count: u32,
    pub unread_count: u32,
    pub latest_date: DateTime<Utc>,
    pub snippet: String,
}

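Computed from the envelopes belonging to the thread. For Gmail, thread membership comes from Gmail's threadId. For IMAP, it is reconstructed with JWZ threading (Jamie Zawinski's algorithm over the Message-ID, In-Reply-To and References headers).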
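The Thread row is a fold over its envelopes. Below is a minimal sketch using hypothetical trimmed-down types (Env, ThreadSummary), with Unix timestamps instead of chrono so that it needs no crates. It takes the subject from the earliest message and the snippet from the newest. For brevity, it counts only senders as participants.

```rust
use std::collections::BTreeSet;

/// Trimmed-down envelope: only the fields the aggregation needs.
pub struct Env {
    pub subject: String,
    pub from: String,
    pub date: i64, // Unix seconds
    pub read: bool,
    pub snippet: String,
}

pub struct ThreadSummary {
    pub subject: String,
    pub participants: Vec<String>,
    pub message_count: u32,
    pub unread_count: u32,
    pub latest_date: i64,
    pub snippet: String,
}

/// Fold a thread's envelopes into one summary row; None for an empty thread.
pub fn summarise(envs: &[Env]) -> Option<ThreadSummary> {
    let latest = envs.iter().max_by_key(|e| e.date)?;
    let first = envs.iter().min_by_key(|e| e.date)?;
    // Deduplicate senders case-insensitively; BTreeSet also sorts them.
    let participants: BTreeSet<String> = envs.iter().map(|e| e.from.to_lowercase()).collect();
    Some(ThreadSummary {
        subject: first.subject.clone(),
        participants: participants.into_iter().collect(),
        message_count: envs.len() as u32,
        unread_count: envs.iter().filter(|e| !e.read).count() as u32,
        latest_date: latest.date,
        snippet: latest.snippet.clone(),
    })
}
```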

Address

pub struct Address {
    pub name: Option<String>,             // display name, may be encoded-word decoded
    pub email: String,                    // canonical lowercase
}

mxr canonicalises email addresses to lowercase for matching but preserves the original form for display.
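The following sketch shows the split between matching and display, using hypothetical helper names. Lowercasing the whole address is a pragmatic choice. RFC 5321 allows the local part to be case-sensitive, but mainstream providers do not treat it that way.

```rust
/// Matching key: surrounding whitespace trimmed, whole address lowercased.
pub fn canonical_email(raw: &str) -> String {
    raw.trim().to_lowercase()
}

/// True when two raw addresses name the same mailbox under this rule.
pub fn same_mailbox(a: &str, b: &str) -> bool {
    canonical_email(a) == canonical_email(b)
}

/// Display form keeps the sender's original casing.
pub fn display(name: Option<&str>, raw_email: &str) -> String {
    match name {
        Some(n) if !n.trim().is_empty() => format!("{} <{}>", n.trim(), raw_email.trim()),
        _ => raw_email.trim().to_string(),
    }
}
```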

MessageFlags

bitflags! {
    pub struct MessageFlags: u32 {
        const READ      = 1 << 0;
        const STARRED   = 1 << 1;
        const DRAFT     = 1 << 2;
        const SENT      = 1 << 3;
        const ANSWERED  = 1 << 4;
        const FLAGGED   = 1 << 5;
        const TRASHED   = 1 << 6;
        const SPAM      = 1 << 7;
    }
}

Bitfield because messages typically have a few flags; separate columns would balloon the schema.
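The sketch below expresses the same bit layout as plain u32 constants, to show why one integer column is enough. Setting, clearing and testing a flag are each a single bitwise operation, and a query can filter on several flags at once. The SQL column name flags and the helper names are assumptions for illustration.

```rust
// Same bit positions as MessageFlags, as plain constants (no crates needed).
pub const READ: u32 = 1 << 0;
pub const STARRED: u32 = 1 << 1;
pub const DRAFT: u32 = 1 << 2;
pub const SENT: u32 = 1 << 3;
pub const ANSWERED: u32 = 1 << 4;
pub const FLAGGED: u32 = 1 << 5;
pub const TRASHED: u32 = 1 << 6;
pub const SPAM: u32 = 1 << 7;

pub fn set(flags: u32, bit: u32) -> u32 {
    flags | bit
}

pub fn clear(flags: u32, bit: u32) -> u32 {
    flags & !bit
}

pub fn has(flags: u32, bit: u32) -> bool {
    (flags & bit) != 0
}

/// WHERE-clause fragment for "unread and not trashed": both bits must be 0.
/// Assumes the store keeps the bitfield in an INTEGER column named flags.
pub fn unread_filter_sql() -> String {
    format!("(flags & {}) = 0", READ | TRASHED)
}
```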

The provider trait

The seam between protocol and internal model:

#[async_trait]
pub trait MailSyncProvider: Send + Sync {
    async fn sync_messages(&self, cursor: &SyncCursor) -> Result<SyncBatch>;
    async fn fetch_body(&self, msg_id: &MessageId) -> Result<MessageBody>;
    async fn list_labels(&self) -> Result<Vec<Label>>;
    async fn modify_labels(&self, msg_ids: &[MessageId], add: &[LabelId], remove: &[LabelId]) -> Result<()>;
}

#[async_trait]
pub trait MailSendProvider: Send + Sync {
    async fn send(&self, msg: OutgoingMessage) -> Result<SendResult>;
    async fn save_draft(&self, draft: Draft) -> Result<DraftId>;
}

Adapters: provider-gmail (uses Gmail API), provider-imap (uses IMAP), provider-smtp (send only), provider-fake (in-process for tests).

The split between sync and send is real — you can read via Gmail API and send via SMTP. mxr supports this combination.
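Below is a dependency-free sketch of the split. It has two narrow traits, an in-memory provider in the spirit of provider-fake, and an account that holds each half separately. The traits are synchronous, trimmed-down stand-ins for MailSyncProvider and MailSendProvider (the real ones are async), and all names here are hypothetical.

```rust
use std::cell::RefCell;

/// Stand-in for the sync half (hypothetical, synchronous).
pub trait SyncSide {
    fn list_subjects(&self) -> Vec<String>;
}

/// Stand-in for the send half (hypothetical, synchronous).
pub trait SendSide {
    fn send(&self, to: &str, subject: &str) -> Result<(), String>;
}

/// In-memory provider implementing both halves; no network involved.
pub struct FakeProvider {
    pub inbox: Vec<String>,
    pub outbox: RefCell<Vec<(String, String)>>, // (to, subject) of sent mail
}

impl SyncSide for FakeProvider {
    fn list_subjects(&self) -> Vec<String> {
        self.inbox.clone()
    }
}

impl SendSide for FakeProvider {
    fn send(&self, to: &str, subject: &str) -> Result<(), String> {
        if !to.contains('@') {
            return Err(format!("bad recipient: {to}"));
        }
        self.outbox.borrow_mut().push((to.to_string(), subject.to_string()));
        Ok(())
    }
}

/// The account wires each half independently.
pub struct Account<'a> {
    pub sync: &'a dyn SyncSide,
    pub send: &'a dyn SendSide,
}
```

Because the account holds two independent trait objects, pairing a Gmail API reader with an SMTP sender only requires passing different implementations.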

SyncCursor — opaque per-provider state

pub enum SyncCursor {
    Initial,
    Gmail { history_id: String },
    GmailBackfill { history_id: String, page_token: String },
    Imap {
        per_mailbox: HashMap<String, ImapMailboxCursor>,
        capabilities: ImapCapabilityState,
    },
}

Each provider stores whatever state it needs to resume sync. The store persists this opaquely; the provider parses it on next sync.
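The sketch below shows what "persist opaquely" can look like: the store only ever sees the encoded string, and only the provider parses it. The Cursor enum is a simplified stand-in for SyncCursor. Its IMAP variant tracks a single mailbox's UIDVALIDITY and highest seen UID. The text encoding is hypothetical, and a real implementation would more likely use serde. Unparseable input falls back to Initial, which means a full resync.

```rust
/// Simplified stand-in for SyncCursor (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Cursor {
    Initial,
    Gmail { history_id: String },
    Imap { uidvalidity: u32, last_uid: u32 },
}

impl Cursor {
    /// The store persists this string without interpreting it.
    pub fn encode(&self) -> String {
        match self {
            Cursor::Initial => "initial".to_string(),
            Cursor::Gmail { history_id } => format!("gmail:{history_id}"),
            Cursor::Imap { uidvalidity, last_uid } => format!("imap:{uidvalidity}:{last_uid}"),
        }
    }

    /// Only the provider parses it back; anything unrecognised means "start over".
    pub fn decode(s: &str) -> Cursor {
        let parts: Vec<&str> = s.split(':').collect();
        match parts.as_slice() {
            ["gmail", h] if !h.is_empty() => Cursor::Gmail { history_id: h.to_string() },
            ["imap", v, u] => match (v.parse::<u32>(), u.parse::<u32>()) {
                (Ok(uidvalidity), Ok(last_uid)) => Cursor::Imap { uidvalidity, last_uid },
                _ => Cursor::Initial,
            },
            _ => Cursor::Initial,
        }
    }
}
```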

Why this matters for Lazydap

The same pattern applies. In lazydap, the DAP debug adapter (e.g. codelldb) plays the provider's role; lazydap's core types (Frame, Scope, Variable, Breakpoint) are adapter-agnostic; and lazydap's daemon works against the trait, not against codelldb specifics.

When you've solved this problem once well (mxr), applying it to a new domain is cheaper. The pattern is "find the seam, define the trait, push provider quirks below the line."

Common pitfalls

Internal IDs vs provider IDs: never expose provider IDs in the IPC contract or store schema; if you change providers, the IDs change. mxr uses UUIDv7 for MessageId/ThreadId/LabelId; provider IDs are looked up via secondary indexes.
Lossy abstraction: over-abstracting hides things users care about. Gmail labels-as-tags vs IMAP folders-as-containers IS a behavioural difference; mxr exposes it via LabelKind. Don't pretend they're the same.
Schema migrations: once the internal model is in production, changing it requires migrations, so get the shape right early. mxr uses sqlx's compile-time-checked SQL, which flags queries that no longer match the schema at build time.
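The first pitfall's remedy is sketched below. The rest of the app holds only internal IDs, and a lookup table (the in-memory analogue of a secondary index) maps them to provider IDs. IdMap and intern are hypothetical names, and a counter stands in for UUIDv7 so that the sketch has no dependencies. The key includes the account because a provider ID is only unique within one account (and, for IMAP UIDs, within one mailbox).

```rust
use std::collections::HashMap;

/// Internal ID. mxr uses UUIDv7; a plain counter stands in here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct MessageId(pub u64);

/// Bidirectional map between internal IDs and (account, provider_id) pairs.
#[derive(Default)]
pub struct IdMap {
    next: u64,
    by_provider: HashMap<(String, String), MessageId>,
    by_internal: HashMap<MessageId, (String, String)>,
}

impl IdMap {
    /// Return the existing internal ID, or mint a new one on first sight.
    pub fn intern(&mut self, account: &str, provider_id: &str) -> MessageId {
        let key = (account.to_string(), provider_id.to_string());
        if let Some(id) = self.by_provider.get(&key) {
            return *id;
        }
        self.next += 1;
        let id = MessageId(self.next);
        self.by_provider.insert(key.clone(), id);
        self.by_internal.insert(id, key);
        id
    }

    /// Reverse lookup, used only inside the adapter when talking to the provider.
    pub fn provider_id(&self, id: MessageId) -> Option<&str> {
        self.by_internal.get(&id).map(|(_, p)| p.as_str())
    }
}
```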

See also