Skip to content

Latest commit

 

History

History
1057 lines (823 loc) · 46.6 KB

HISTORICAL-DESIGN.org

File metadata and controls

1057 lines (823 loc) · 46.6 KB

This document is not maintained. It may be useful as a “how we got here” kind of thing. Descriptions of Crymap’s design may not reflect what was actually built, and musings on IMAP (particularly early on) may not reflect reality.

Crymap – A simple, secure IMAP server with encrypted data at rest

Motivation

uw-imap (University of Washington IMAP) is no longer maintained, and UW has even removed it from their site.

Other IMAP implementations are much more complex to set up and are still prone to security issues.

Nothing really supports encryption at rest.

Goals

No unsafe code other than OpenSSL itself.

Minimal configuration: Logging, location of user database, port to listen on. Per user, password hash, encryption information, location of user data and optional UID to switch to.

IMAPS only. No support for cleartext or STARTTLS.

User data is stored in a way that makes it amenable to online backup (as with rsync or ensync); i.e., don’t shove everything into a huge sqlite database or something.

Encryption at rest. All user data is encrypted and can only be accessed by knowing the user’s password (hence we can’t use PAM authentication, unfortunately, since we’d lose the encryption key when the password changed). Encryption is done in a public-private key system so that mail can be delivered without needing the decryption key. Encryption at rest protects from compromising of disk contents but not shell access since shell access could probe process memory and recover the key that way, so this does not fully replace using pgpipe — we’ll just layer both together. This also precludes using nginx to do TLS termination since it wants to be able to handle auth itself.

Make Thunderbird happy. Other mail frontends are nice-to-have but no point in using a time to test them.

Non-Goals

Performance. It’s nice to have, but simplicitly is more important for security.

Resistance to DoS from logged in users.

Supporting message mutations like deletion of attachments.

User data format

Encryption at rest makes it impossible to be interoperable with any other mailbox format (mbox, maildir, …), so we’re going to be doing our own thing regardless.

We need to support these attributes for IMAP:

  • UID. 32-bit integer uniquely identifying the message, strictly incrementing. We need to be able to remember UIDs that were used in the past.
  • Sequence number. Volatile, per-session identifier. The ire of IMAP implementors everywhere. Nothing to store here since it’s volatile.
  • System flags: \Seen, \Answered, \Flagged, \Deleted, \Draft, \Recent. These need to be queriable reasonably efficiently and must be mutable on a per-message basis.
  • Keyword flags (represented by \* in PERMANENTFLAGS). Need to be easily readable but not necessarily efficiently searchable. Need to be able to be defined permanently by the client. Also need to be mutable on a per-message basis.
  • Internal date. Just a timestamp we can embed somewhere.
  • Message size. The physical size of the RFC-2822 message. We need this for framing regardless.
  • Envelope structure, body structure, message text. We can just parse stuff out of the RFC-2822 message on-demand.

Each mailbox will be a directory. Names need not be encrypted since delivery needs to be able to determine mailbox names without the decryption key anyway.

Each directory contains 256 subdirectories, 00 through ff. Each message is a single file in one of those, where the directory name represents the low 8 bits of its UID, and the file name is the upper 24 bits of the UID in byte-wise little-endian order. Expunged messages are represented by an empty file.

A message consists of the following:

  • An encryption block, giving any information needed to decrypt the message (other than the key of course).
  • Immutable metadata, encrypted. Probably just the timestamp and size of the message (both compressed and plaintext) but there might be other stuff to put here.
  • The message data. Encrypted after compression, RFC-2822 format.

Mutable metadata

There’s a number of ways this could be done.

Important criteria:

  • Determining which flags are set on a message should require at most linear system calls with respect to the number of flags that could be set.
  • Determining which messages are have or do not have a particular flag must require sub-linear system calls with respect to the number of messages that will be found.
  • The system needs to be amenable to naïve backup files. Tearing in directory contents or within incrementally updated files must be safe.
  • Holding everything in memory is not ideal, and if this is done, it must have a reasonably compact representation.

There’s also some mutable metadata for the mailbox itself which must be tracked, though this is not performance sensitive and does not need to solve the same race conditions:

  • Whether the mailbox has one of the special mailbox flags (i.e., \Sent and friends).
  • Whether the mailbox is subscribed.

Disposition: Item stream CRDT is complex and inefficient. The security benifits it provides are tangible but small, since the more widely-used message classification system, mailboxes, is already fully transparent. We’ll go with unencrypted bitmaps.

Item stream CRDT

Mutable attributes like flags are tracked separately. Each mailbox has a “meta” subdirectory. The meta subdirectory contains any number of mutation files representing changes to mutable attributes.

Each mutation file is symmetrically encrypted (since there is no reason for unauthenticated processes to modify it). It starts with the timestamp of its creation, and then a sequence of encrypted blocks (framed so the file can be progressively appended to). Each block gives a timestamp and a mutation. By reading all mutations into memory and sorting by total order, every process can determine the mutable status of every message.

Mutation files have a maximum lifetime. After that point, the owner must stop writing the file and make a new one. After more time still, extra files can be garbage-collected by any process, simply by writing a new mutation file without redundant information and deleting the old ones. This system also encodes metadata about the mailbox itself.

Flag queries are performed by loading all the metadata into memory in a collapsed form and then looking at the in-memory representation.

A mailbox directory with no “meta” subdirectory is a \Noselect mailbox. Mailboxes with children are subsubdirectories under a “sub” subdirectory.

A mailbox directory with neither “meta” nor “sub” is considered to not exist. This is needed to support subscribing to deleted mailboxes.

Subscriptions are indicated by an empty “subscription” file inside the mailbox directory.

This is overall a fairly complex option, particularly since rollup must be done in multiple stages to avoid backup solutions seeing a torn view and losing the metadata.

Message flags as bitsets

Each flag is stored in a single file, which contains an encrypted bitmap keyed by UID. Determining whether a flag is on a message is simply a matter of seeking to the correct block, reading and decrypting it, and looking at the bit for that message. Finding the UIDs with a flag is simply a matter of scanning the file.

The security of this system is moderate at best. Even with 128 messages represented per block, an adversary can get a good idea of when certain messages get certain flags, and can trivially see which message are not even covered by the bitset based on its length.

Updates are trickier due to their read-modify-write nature. We’d need to use file locking to prevent processes from stepping on one another’s toes. Alternatively, if we use a stream cipher (of sorts) and one byte per message, writes would be atomic and would not be RMW. This does make the system even more transparent though, since each byte takes on one of two values.

A final alternative is to be completely transparent and not encrypt anything at all here. This isn’t entirely weird since mailbox names are also transparent. This turns this option into the simplest thing we could possibly do.

Interaction with naïve backup systems is easy though, since writing a single block will be atomic and torn updates are otherwise safe.

Alternate design for CONDSTORE and QRESYNC support

CONDSTORE and QRESYNC are a nice-to-have, but were rejected in the original design because they require assigning a unique, strictly ascending 63-bit value to every operation in the mailbox.

On further consideration, building the design around this may make things simpler in the long run. A key simplification is that we do not need to take any 63-bit sequence number and point to the change; we only need to know what messages have changed since that time, and to support conditional store.

Thoughts:

The message store can stay exactly as it is currently, except with `expunge-log` removed. The pre-built seqnum index is also removed. Expunge “events” are pushed up into the “change store”.

The “change store” is similar to the “message store”, but stores encrypted changesets identified by 31-bit change-ids (CID). Changes are things such as additions and removals of tags and expungement of messages. We have a rollup system which folds changes into single rollup files which also serve as a cache for the set of extant UIDs.

A Modseq has the greatest UID in the upper 32 bits and the greatest CID in the lower 32 bits. The modseq is thus effectively a 2-element vector clock, allowing messages to be inserted asynchronously from other changes.

The modseq at the instant a message is inserted is considered to have a CID of 0. However, we do not track UID-CID relations. The “current” modseq is always the greatest UID and greatest CID. Conditional stores are only strictly conditional on the CID, but this is fine since it is impossible for the client to observe the absolute ordering between their conditional change and the creation of the new message.

A message is “changed since” a modseq if the UID part is less than the UID of the message or if any changes after the CID of the modseq affected that message.

One quirk here is that each message is considered to have a “last changed modseq”, but we only track the CID. (We can’t associate a “canonical UID” with each change since that would be racy and could yield out-of-order events.) This leaves us between a rock and a hard place: we could synthesise the modseq by pasting the CID to the latest UID, but then a client that sees the new modseq and does a `CHANGEDSINCE` query will not get the message.

We can probably virtualise it: If the client FETCHes a modseq for a message, we store (in memory) the modseq that was returned, and act on that in future requests. This would still be observable to cross-session activity though.

Actually, we should be able to store the canonical UID with each modseq: if the writer of a change loses the race for a CID, it must first read in any new modifications and in doing so will learn of any later UIDs.

Rollups. Whenever an instance ingests a change whose id is an even multiple of 256, it dumps the current state, including the last UID it knows about and the extant UID list, into a file which is a sibling to the directory containing the last 256 changes. The rollup excludes flags for expunged messages and all but the last of any explicit expunge events (because CONDSTORE/QRESYNC does not require remembering them). Old rollup files and the original events can be garbage collected 24 hours after a rollup including them has been generated.

What this whole system makes easier implementation-wise is that there are now only two places to watch for realtime changes: the file corresponding to the next UID, and the file corresponding to the next CID. It also reduces the number of bespoke binary formats and gets us reasonably good encryption of message attributes. We do end up holding all information about flags in memory, but we probably would have ended up needing to do something like that anyway.

Overall data layout

The root of system data is a directory.

The system data directory contains a “users” subdirectory. This in turn has one symlink or subdirectory for each username that is allowed to log in. User directories do not need to be on the same filesystem as the system data directory. The system data directory also holds “crymap.toml” which defines the rest of the system configuration.

The user directory contains the following:

  • “mailboxes”, a subdirectory containing the user data (see previous section)
  • “config.toml”, a text file holding various user config, including password hash and encryption information and the names of special mailboxes.
  • “keys”, a directory containing encrypted private key blobs and cleartext public key blobs
  • “tmp”, a subdirectory used for temporary buffers and for staging files before moving them into place

IMAP Extensions

uw-imap advertises these: IMAP4rev1 IMAP4 AUTH=LOGIN AUTH=CRAM-MD5 IDLE CHILDREN UIDPLUS

ID [RFC2971], to exchange implementation info with the server

IDLE [RFC 2177]

UTF8=ONLY [RFC6855], so we don’t need to deal with the obsolete UTF-7 stuff. Looks like Thunderbird doesn’t support this, oh well. We should try to implement UTF8=ACCEPT at least though.

ENABLE [RFC5161], for clients to acknowledge UTF8=ACCEPT

MOVE, moving messages instead of COPY+delete+EXPUNGE https://tools.ietf.org/html/draft-krecicki-imap-move-01

UIDPLUS [RFC 4315], base more things around UIDs instead of sequence numbers

QRESYNC [RFC 5162]

BINARY [RFC 3516] sounds since since it’s another thing about ditching old limitations, but it allows the client to force the server to transcode MIME messages, so no. (On the other hand, IMAP4rev2 (draft) requires this…)

CHILDREN [RFC 3348] just adds mailbox attributes indicating whether or not each mailbox has children, so that’s easy enough.

COMPRESS [RFC 4978] might be nice.

WITHIN [RFC 5032] adds a couple search filters based on relative timestamps. Pretty trivial. On further consideration, this has a really nasty and poorly-defined interaction with the CONTEXT extensions, and it seems mainly useful in conjunction with FILTERS, which we’re definitely not doing, so let’s not do this.

LITERAL- [RFC 7888], avoid client-server round-trip for binary literals <= 4096 bytes with special syntax. Though apparently it’s not widely supported, and IMAP4rev2 (draft) requires LITERAL+, so I guess we’ll do that.

XLIST and related [RFC 6154], for the special mailbox markers. https://bugzilla.mozilla.org/show_bug.cgi?id=476260

UNSELECT [RFC 3691], to close a mailbox without expunging it

REPLACE [RFC 8508], a simple APPEND + UID EXPUNGE command.

APPENDLIMIT [RFC 7889], a completely passive way for the server to indicate the maximum allowable APPEND size.

SASL-IR [RFC 4959], allows the client to send its auth immediately instead of going through a continuation round-trip. Pretty trivial.

NAMESPACE [RFC 2342]. It’s trivial for what we’re doing.

SEARCHRES [RFC 5182] might be worth doing just because IMAP4rev2 will probably require it.

ESEARCH [RFC 4731] is also incorporated into IMAP4rev2.

LIST-EXTENDED [RFC 5258] is also incorporated into IMAP4rev2.

[RFC 5530] defines additional error codes.

[RFC 8457] defines an extra keyword flag (which we support for free) and special-use attribute (which is trivial to add).

IMAP4rev2 draft: https://datatracker.ietf.org/doc/draft-ietf-extra-imap4rev2/?include_text=1

Special mailboxes

Inbox. Undeletable, unrenamable, flagged \Inbox. Name must be case-insensitive. Renaming has a weird special case in that we must support it, but do so by creating a new mailbox with the new name and moving all contents inside Inbox into it, while leaving any child mailboxes of Inbox unaffected. There doesn’t seem to be a particularly strong reason to implement this.

Sent. Undeletable, flagged \Sent.

Junk. Undeletable, flagged \Junk.

Trash. Undeletable, flagged \Trash.

Operational design

One process per connection. This has a lot of advantages, such as being able to switch users once authenticated, isolating unrelated connections from eaech other, making it possible to see sources of load in `top`, and simplicity.

Processes can be spawned by inetd or similar, making daemonisation unnecessary.

If UNIX-specific features are enabled, the process can be configured to chroot into the system data directory after all resources needed are loaded but before anything on the socket is read, to insulate from OpenSSL vulnerabilities. After authentication, it can further optionally chroot into the user directory and switch to a particular UID. These features of course require spawning the process as root. Some installations may prefer to just run under a dedicated service user.

Encryption

Symmetric encryption is done with AES-128-GCM, except for mutable metadata files, which use AES-128-CBC since it is more suitable for incremental updates and less sensitive in general.

An encrypted blob starts with a 16-byte IV and is followed by the ciphertext with 0-padding. In GCM mode, this is followed by the 16-byte AEAD tag.

Asymmetric encryption is done through RSA. This adds a prefix to the encrypted blob:

  • The public/private key pair name, terminated with LF
  • Two bytes (LE u16) indicating the length of the next component
  • The 16-byte encryption key of what follows, itself encrypted with RSA using PKCS1-OAEP padding.

A user can have any number of keys, each of which has a name. Keys may be “internal” if the public key is not stored in cleartext, or “external” if it is.

Each private key is stored in the user’s “keys” directory at “$name.priv”. It is an encrypted blob containing the RSA key’s DER format, encrypted with the user’s master key. External public keys are stored in “$name.pub” in PEM PKCS#1 format.

The master key is a 16-byte value generated randomly generated when the user is created. At runtime, it is derived from the user’s password indirectly.

The user’s password is passed through Argon2, using parameters and salt stored in the user config to produce the password hash. The user config stores the SHA-3 of the password hash with “check” appended in order to verify the password. The master key is derived by taking the SHA-3 hash of the password hash with “master” appended and then XORing it with a “master key XOR” stored in the user config. The first 16 bytes of the result are the master key. The master key XOR makes it possible to change the password without invalidating the data.

Internal Architecture

`crypt`. Shared crytographic primitives.

`mailbox`. Implementation of the basic mailbox storage mechanism, including mutable metadata and sequence numbers, but not including functionality involving parsing of MIME messages (e.g. partial FETCH, SEARCH elements), instead only providing a feature to stream cleartext messages out of the store.

`mime`. Minimal parsing of MIME messages. We might pull this out of pgpipe.

`imap`. Implementation of IMAP operations, but nothing to do with the wire protocol. Ties `mailbox` and `mime` together.

`wire`. Definitions for working with the IMAP wire protocol, with full support for server-side parsing/serialisation and enough support for client-side to support the CLI.

`server`. Actual IMAP server implementation. Ties `imap` and `wire` together.

`cli`. Various CLI utilities.

On parsing MIME

IMAP unfortunately requires the server to actually parse MIME content in certain cases, particularly for structured `FETCH` and many `SEARCH` queries.

How little can we get away with? A full MIME parser is complex and error-prone.

Inventory: What can the client ask the server to extract?

  • FETCH: BODYSTRUCTURE. Also known as BODY, which confusingly is different from BODY[section]. This is a tree representing the multipart structure (including recursing into `message/rfc822` attachments for some reason). The exact format of the body structure varies wildly with content type. It’s mostly stuff from the simple message headers, but also the size of certain types (measured in bytes or lines depending on type), and in one case the MD5 of the content (though that is extension data, which is apparently optional). So while formatting this the way IMAP wants is going to be ridiculously complex, getting the data itself should be fairly easy.
  • FETCH: BODY[]. Fetches a (possibly filtered) part of the message. The client can optionally select a multipart section, and then can choose between all headers, some specific headers, all but some specific headers, the content only, or the whole section. Additionally, the client can request a byte-slice, which can just be handled after the whole byte array is buffered.
  • FETCH: ENVELOPE. Grab a smorgasboard of headers and, for reasons unknown, reencode them into IMAP’s own syntax. This includes fields that can contain non-ASCII characters in 8BITMIME, which is potentially problematic. These also require a deeper understanding of MIME and.. SMTP? syntax. There’s also `Subject` which may require decoding.
  • SEARCH: BCC, CC, FROM, TO. Primary headers that only need rudimentary parsing for the search match.
  • SEARCH: SENTBEFORE, SENTON, SENTSINCE. The `Date` header. There’s some ambiguity regarding timezone but is otherwise unproblematic.
  • SEARCH: SUBJECT. We need to decode the subject if it’s in one of the wonky non-ASCII encoding schemes.
  • SEARCH: TEXT. Search the entire content, headers and all. RFC 3501 does not indicate whether the server is expected to know how to decode the body. We’ll assume no. Server-side text search will be slow and is undeniably the wrong approach here so it’s not that important to fully support.

Overall, BODYSTRUCTURE and ENVELOPE are both apalling and will be, by a wide margin, the biggest difficulty here.

Notifications and IDLE

In general, and particularly when using IDLE, we need some way for the process to learn about changes made by other processes.

Goals:

  • Crymap itself shouldn’t have different paths for different OSes.
  • Support FreeBSD well. Linux is also a must.
  • Minimise the time between message delivery and notification to an idling client.
  • No “exotic” libraries, e.g., ZeroMQ, or anything that depends on another process (e.g. DBus).

Polling

There’s polling, of course, and that’s really the only option if we happen to be on NFS or similar. Polling is non-ideal not only because of its inherent wastefulness, but also delays notifications since we need to keep the poll rate to something reasonable.

`notify` crate

The original plan was to use the `notify` crate. It’s fairly usable despite its oddities.

However, it doesn’t support `kqueue`, relegating FreeBSD — the main target platform — to polling. The only reasonable OS it supports is Linux through inotify. It also brings in a huge number of dependencies, some for OSes we don’t care about and some because it uses Tokio and friends despite not exposing a Tokio-friendly interface.

UNIX sockets

The scheme would look something like this:

  • Each mailbox has a `socks` directory.
  • When a process goes into idle, it creates a datagram listener in that directory, then makes one last poll to prevent race conditions, then waits for a message on the socket. When it wakes up, it discards any message received and unlinks the socket and uses the normal poll process to see what’s changed.
  • When a process makes a change worth notifying, it iterates through the sockets in the `socks` directory and sends one packet to each (with `NOWAIT`) before unlinking it.

The sockets are “one shot”; the expectation is that an idling process will wake up in response to any message and so there is no reason to allow it to receive more than one. Unlinking of sockets by the notifier and the idler ensure that socket files do not accumulate over time (e.g. from processes that got killed while idling).

This system should work on pretty much everything but Windows, which isn’t supported for other reasons anyway.

On the \Recent thing

\Recent is a very weird thing. Designs before <2020-06-14 So> treated it as a mostly regular flag (albeit one stored inverted so that it is set by default), with the semantics that a message has \Recent for all sessions until some session fetches the message, at which point \Recent is cleared for all sessions.

This is not what RFC 3501 wants. The standard’s semantics are that \Recent is set for the first session to “see” the message, and remains set forever in that session, while no other session sets \Recent on the same message.

The RFC also has this paragraph:

> If it is not possible to determine whether or not this > session is the first session to be notified about a message, > then that message SHOULD be considered recent.

This would appear to make it compliant to just say every message is recent, all the time.

So while the standard allows wildly varying behaviours, the current design is objectively incorrect.

Resolution: Strictly correct behaviour via “recency tokens”. It’s not that complicated and doesn’t come with a huge slew of drawbacks, so we’ll go with the spirit of the standard.

Nothing is \Recent

This is what GMail does.

Because clients do not have visibility into other sessions, there is no way for a client to tell the difference between a server which does not implement \Recent and one in which another session is constantly “seeing” everything first.

This is naturally the simplest and most effective solution.

Everything is \Recent

This is a variant of making nothing \Recent, but complies with the letter of the standard, though certainly not the spirit.

This seems more likely to invoke weird client behaviours though.

Strictly correct behaviour

Each instance maintains an in-memory, transient set of UIDs which are marked \Recent. Whenever it learns of a new UID, it tries to “claim” it by some means, and if successful, adds that UID to the set.

The biggest disadvantage of this is that it makes the system even more stateful. In the filesystem-as-a-database model, it also means we have more stuff that may need to be cleaned up. And we really don’t want to have a parallel hier-id-scheme with a marker for every message.

Maybe a “recency token”? To claim a UID as recent:

  • Scan the mail directory for a file whose name matchek “recent.%d”
  • If none found, atomically create one named “recent.$MAXUID” and the recency claim fails. (This is so we don’t recover by marking every UID recent.)
  • If found but it has a greater UID, recency claim fails.
  • If found, rename the one with the greatest UID (without allowing replacement) to “recent.$UID”. If this fails due to ENOENT, retry the whole process. If it fails due to EEXISTS, recency claim fails.
  • Unlink any extraneous tokens found.

Multiple UIDs can be claimed at once by renaming across a range.

Ordinarily, there should be exactly one token; the steps to create a new one or remove an old one are a recovery process in case the token is lost somehow. The create step is not atomic since multiple processes could concurrently claim different UIDs in different orders.

On “soft expunge” and fetching of expunged messages

The original plan was to strictly have hard expunges, where an expunge immediately destroys the underlying message. This is one of the most contentious points in IMAP. From the code at that time (aronud <2020-07-01 Mi>):

/// What the tagged response from a `FETCH` should be.
///
/// This is a particularly nasty aspect of IMAP4, owing to its original
/// development being targeted at systems that don't allow concurrent mailbox
/// access.
///
/// The issue is this: The IMAP data model is that the current sequence numbers
/// exactly represent the set of messages that exist. However, since sequence
/// numbers are volatile, we cannot send updates about that set to the client
/// in realtime, since that would break commands that use sequence numbers for
/// addressing (and fetch responses, which are weirdly addressed by sequence
/// number even for `UID FETCH`). What, then, do we do if the client attempts
/// to FETCH a message which is still addressable in its snapshot, but has been
/// expunged by another session?
///
/// RFC 2180 provides some suggestions:
///
/// > 4.1.1 The server MAY allow the EXPUNGE of a multi-accessed mailbox but
/// > keep the messages available to satisfy subsequent FETCH commands until it
/// > is able to send an EXPUNGE response to each client.
/// >
/// >
/// > 4.1.2 The server MAY allow the EXPUNGE of a multi-accessed mailbox, and
/// > on subsequent FETCH commands return FETCH responses only for non-expunged
/// > messages and a tagged NO.
/// >
/// > 4.1.3 The server MAY allow the EXPUNGE of a multi-accessed mailbox, and
/// > on subsequent FETCH commands return the usual FETCH responses for
/// > non-expunged messages, "NIL FETCH Responses" for expunged messages, and a
/// > tagged OK response.
/// >
/// > 4.1.4 To avoid the situation altogether, the server MAY fail the EXPUNGE
/// > of a multi-accessed mailbox.
///
/// The author of the Dovecot server also has
/// [suggestions](https://imapwiki.org/MultiAccessPractices):
///
/// 1. 4.1.1
/// 2. 4.1.3
/// 3. 4.1.2
/// 4. Kill the connection when this situation arises
/// 5. 4.1.4
///
/// Apparently nobody thinks of "silently return OK without the expunged
/// messages missing" as a viable option, even though that's how a number of
/// other commands (e.g. `STORE`) behave in some implementations.
///
/// Crispin believed that 4.1.2 was the worst possible option:
///
/// > I think that 4.1.2 is a bug, and servers that do it are broken.
///
/// He preferred even 4.1.4 over it, even though 4.1.2 is how essentially any
/// other protocol would handle an access to a deleted item. RFC 3501 also
/// explicitly permits this behaviour:
///
/// > Result: ... NO - fetch error: can't fetch that data
///
/// 4.1.2 sounds like what Courier (a maildir-based implementation) would have
/// implemented, and the intensely bad relations between Crispin and Courier's
/// author would have had an influence on that sentiment.
///
/// Crispin was a proponent of 4.1.1, but that is unnecessarily complex to
/// implement for such a fringe case; we'd need some mechanism for all sessions
/// to discover an impending expunge in real time, and have some kind of grace
/// period before the message was actually expunged.
///
/// 4.1.4 is utterly unacceptable. Making it impossible to delete things is
/// *not OK*. Besides, we have no way to know if there are other sessions.
///
/// 4.1.3 is doable, but it's not great. Apparently Cyrus does this. It forces
/// clients to guess what happened.
///
/// The main concern with 4.1.2 is apparently insane clients that think it
/// somehow makes sense to immediately retry a request that resulted in `NO`
/// immediately. Arnt Gulbrandsen in the IMAP mailing list wrote about a hybrid
/// approach on 2006-09-15:
///
/// > (Btw, I changed our code today to use 4.1.2+loopbreaker. The first time
/// > a client fetches a message which another client has expunged, but whose
/// > expunge has not yet been reported in this session, the server issues a
/// > NO as in 4.1.2, and makes a note of the UID(s). If any of those UIDs
/// > are fetched again before the server can report the expunge, the server
/// > issues a BYE. When it reports expunges, it clears its UID set. I think
/// > that's as good as 4.1.1+period.)
///
/// ("4.1.1+period" refers to a scheme of using 4.1.1 by way of a 5-minute or
/// more grace period, followed by Dovecot 4 for attempts that happen later.)
///
/// On 2006-12-31, Gulbrandsen provided an update that this 4.1.2+loopbreaker
/// worked well with clients.
///
/// 4.1.2+loopbreaker is what we implement here.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum FetchResponseKind {
    /// Return OK.
    ///
    /// This happens in one of two situations:
    ///
    /// - The FETCH request did not reference any expunged messages.
    ///
    /// - The request indicated that the client is prepared to deal with
    /// missing messages. This means that either `collect_vanished` was
    /// specified, or `changed_since` was given and had a UID greater than any
    /// expunged UID encountered.
    Ok,
    /// Return NO.
    ///
    /// This is returned for any case where `Ok` is not returned, unless the
    /// client attempts to fetch the same expunged UID more than once before
    /// the next poll.
    No,
    /// Return the results, then kill the connection.
    ///
    /// This is returned if `No` would be returned, but the client already made
    /// another FETCH request since the last poll and got a `No` for one or
    /// more of the same UIDs.
    Bye,
}

It turns out there is a good reason to want some kind of “soft expunge” though: proper interaction between naïve backup tools and move operations (be it by the `MOVE` extension or the traditional COPY+STORE+EXPUNGE sequence). With hard expunge, there’s the possibility that a backup tool will see the destination without the moved message, then see the source after the expunge, and create a backup missing the message.

We already have a separation between expunging a message (a change transaction) and destroying the underlying file (currently processed on visibility of the transaction). We could attack a “deadline” to expunge events that gives a grace period after which the actual file is destroyed. The obvious implementation — just holding a queue of (deadline,uid) pairs in `MailboxState` and processing them on poll cycles — is simple, but has a nasty corner case: mailboxes that the client doesn’t usually hold open, such as \Trash. The simple system would give the effect that an “empty trash” operation wouldn’t actually free any disk space until the next time the user decided to look into that mailbox.

Plan: Each mailbox has a `condemned` directory. When a server instance sees an expunged message (which now has a `deadline: Date<Utc>`), if the deadline is in the future, it appends a 32-bit LE integer giving the UID to a file named `YYYYMMDD` in `condemned`. When the user logs in, all mailboxes are scanned for non-future condemnation files and processed and the condemned files destroyed.

A nice side-effect of this is that we mostly get the Crispinesque snapshot isolation. The exception is that a client which takes no action that would let it receive `EXPUNGE` events for 24+ hours would run into the fetch case above, but even Crispin acknowledged that such a client was pathological, and suggested 30 minutes as an expunge grace period.

This does not fully solve the naïve backup interaction, since operations like renaming a mailbox can still suffer the same fate. However, that’s an inherent problem with such backup tools, and is a much less common operation than moving a handful of messages, e.g., from inbox to archive or trash.

<2020-07-18 Sa 18:00> On further thought, it’s probably best to keep the soft expunge data in the `MailboxState` just deal with the (small) cost of doing a full select.

On atomicity of bulk operations

RFC 3501 and its extensions define a number of mutating bulk (multi-message) operations:

  • EXPUNGE, explicitly documented to be non-atomic
  • STORE, explicitly documented to be non-atomic
  • APPEND (MULTIAPPEND extension)
  • COPY
  • MOVE (MOVE extension), explicitly documented to be non-atomic

(Note though that Crispin claimed that EXPUNGE and STORE were also atomic, it was just up to the server whether the command was executed fully — a strange definition, and not the one we use here.)

I originally believed that COPY could only apply to one message, which would leave APPEND as the only atomic bulk operation. This lead to the decision to not design support for atomic bulk operations and simply not implement MULTIAPPEND.

However, COPY does support multiple messages, and requires atomicity. (Though copying the flags is only a SHOULD, so we don’t need that to be 100% atomic to comply with the standard.)

Disregard the requirements and settle for the 99.99% solution

This was the original plan once the nature of COPY was discovered. There’s a good argument to be made that the atomicity requirements are superfluous as long as the operations are effectively atomic in all circumstances that do not involve major error conditions.

From a section of code that was never checked in:

/// ## A general note on atomicity
///
/// Mark Crispin was quite adamant about the bulk operations `APPEND` and
/// `COPY` being atomic. This implementation is not atomic --- it processes
/// each message one at a time (though all _preparations_ for appending the
/// messages, i.e., getting the files written to disk, are completed before
/// the mailbox mutations actually start).
///
/// The implementation here is based on practicality and an assertion that
/// strict atomicity is not useful and 99.99% "happy-path" atomicity is
/// sufficient.
///
/// ### What is atomicity?
///
/// There are two facets to the usual definition of "atomic":
///
/// 1. The operation either succeeds completely or has no effect.
///
/// 2. It is not possible to observe an intermediate state while it is
/// occurring. (Point (1) covers the cases where it is _not_ in progress.)
///
/// Point (1) is uncontroversial.
///
/// Point (2) bears discussion. A client cannot observe an intermediate
/// state through queries on one connection because Crymap does not allow
/// concurrent operations on the same connection. Thus, to see the
/// intermediate state, a client would need to watch events unfold on a
/// separate connection. However, does that matter?
///
/// RFC 3501 has *extremely* weak consistency requirements for concurrent
/// operation, and is generally written with a disregard for that
/// possibility. So, to answer the question of whether (2) is relevant, we
/// need to ask two things:
///
/// - Does RFC 3501 permit a server to announce messages one at a time,
///   i.e. `* EXISTS 1 / * EXISTS 2 / * EXISTS 3 / ...`? The answer appears
///   to be yes; there isn't really anything in the RFC that would make
///   this negative.
///
/// - Does this align with "common sense"? I.e., will it cause undesirable
///   behaviours in clients? A client would have to try *very hard* to
///   actually observe these effects: open a second connection, select the
///   same mailbox, and constantly poll it to observe the UIDs coming in
///   one at a time. To any client not performing the bulk operation, the
///   effect is indistinguishable from another client simply a bunch of
///   single-message commands.
///
/// ### Why is atomicity desirable?
///
/// In June 2010, there was a lengthy flamewar on the IMAP mailing list
/// surrounding the proposed `MOVE` extension. A large part of the
/// discussion revolved around atomicity (note that the `MOVE` extension
/// was finalised in a form that does *not* require mailbox-level
/// atomicity, though at this point in time it was "atomic" like `COPY` and
/// `MULTIAPPEND`).
///
/// Mark Crispin had the most to say about atomicity. The main point seems
/// to be that, if a bulk operation fails partially, the client will have a
/// harder time recovering. This is a reasonable point.
///
/// Here, we argue that _atomicity is irrelevant if there is no possible
/// way for anybody to recover_, and the main focus should be to ensure
/// that no mail is lost.
///
/// ### Happy-path atomicity
///
/// The bulk operations in this implementation are atomic unless something
/// _extremely bad and unrecoverable_ happens:
///
/// 1. The mailbox is deleted out from under the server. (Atomicity is
/// somewhat moot here though since one could say that the operation
/// completed the instant before the mailbox was deleted.)
///
/// 2. The mailbox is renamed out from under the server. When this happens,
/// everyone loses context and has to figure things out from scratch
/// anyway.
///
/// 3. We run out of numbers --- UIDs, modseqs, or inodes. If this happens,
/// the server wasn't going to keep working much longer anyway.
///
/// 4. The filesystem fails; i.e., bad sector, NFS connectivity lost.
/// Again, the server wasn't much longer for this life anyway.
///
/// 5. The server shuts down unexpectedly. This means a bug in the code
/// (and "violated the standard due to a bug" is not a controversial thing)
/// crashed the server, someone killed the server process (and could just
/// tamper with the mail store directly), or the whole machine shut down
/// unexpectedly (and the client will need to wait until it comes back to
/// life and then pick up the pieces anyway).
///
/// None of these situations have any remedy the client can take on an
/// automatic basis as they make the mailbox totally unavailable. The only
/// reasonable follow-up action is to wait till the server is available
/// again, then present the current state to the user and let them decide
/// what to do.
///
/// To that end, if one of the bulk operations calls fails, we close the
/// whole connection.

Implement atomic bulk operations

We get atomic bulk EXPUNGE and STORE for free through the change transaction system.

Since MOVE is non-atomic, we don’t need atomic cross-mailbox operations.

That leaves bulk message insertion as the one operation that’s not already atomic but needs to be.

There’s a lot of possibilities that don’t work here:

  • Inserting messages in descending order, i.e. from (UID+N) to (UID). Another racing process could fill in the hole meanwhile.
  • Anything involving change transactions. Change transactions do not have a well-defined order with respect to message insertions, and they aren’t visible to stateless processing (i.e. mail delivery or copying to a different mailbox).
  • Inserting all the new messages into a directory, then creating broken symlinks from the new UIDs into tha directory’s final path, then moving the directory into the final location. There’s an inherent race in that another process cannot distinguish whether a message after the last broken symlink interrupted the transaction (causing it to have no effect) or simply followed the transaction and appeared between the last UID being allocated and the transaction being committed.
  • A mechanism like we use for \Recent for allocating UIDs. We have no way to tell whether a gap in the UIDs is due to an aborted transaction or an in-flight transaction.

HIS multiple allocation

The hierarchical ID scheme treats single IDs as the main unit of atomic allocation, but it is also possible to allocate whole chunks of the ID space atomically. The procedure would look like this:

  • Round the number of IDs to be allocated up to the next natural allocation size: 1=>1, 2..=256=>256, 257..=65536=65536, 65536..=16777216=16777216, anything more is unsupported. This gives the allocation granularity.
  • Arrange the IDs to be allocated in a staging tree, laid out as if the IDs started at 0 and omitting the redundant upper levels. (E.g. if the allocation granularity is 256, this is just one directory; if it is 65536, it is a directory of directories.)
  • Insert expunged IDs until the next ID is a multiple of the allocation granularity. This must be done before the next step so as not to produce holes in the ID space. When the granularity is coarser than 256, whole expunged branches can be created.
  • Make one attempt to move the directory to the location that would represent the next ID space. If it fails, drop the whole staging directory. We don’t want to retry this since each attempt wastes huge swaths of the UID space.
  • Link the directory number to the directory (i.e., `XX` -> `XX.d`).

This fits quite naturally with the existing system and is easily testable, if slightly awkward to implement.

The main downside is the waste of the limited UID space. On average, half of the allocation granularity is wasted. In the pathological case, this can produce a mailbox in which only 1/128 of UIDs are allocated (by repeatedly doing a bulk operation with 2 messages), which would limit such a mailbox to around 16 million messages.

In general, it’s probably reasonable to expect that most bulk operations will entail more than two messages. It’s also reasonable to expect that most small bulk operations will be a direct result of user interaction, and 8 million discrete user interactions is quiet a lot. A user doing one bulk operation every 10 seconds, 24/7, would require two and a half years to exhaust the UID space. A somewhat more reasonable user doing such an operation once per minute during business hours would require 64 years for the same task.

So basically, only a very unreasonable automated process would cause a problem here.

Issues with flags and unicode

https://bugzilla.mozilla.org/show_bug.cgi?id=650623 http://dovecot.2317879.n4.nabble.com/Gmail-like-labels-three-years-later-td41128.html

It looks like Thunderbird uses MUTF7 for unrepresentable flags, which is an odd choice given that they are normally expected to be case-insensitive, but it may make sense to move over to that from the current encoded word hack.

There’s not really anything we need to do specifically; UTF8=ACCEPT and IMAP4rev2 don’t offer any in-band improvements here, so we might as well let clients sort things out between themselves.

Mail delivery

Originally, the intent was to simply have Crymap be something to be invoked from `.forward` and similar. Unfortunately, it turns out that emails passed through that sort of pipe have had their line endings tranlsated to UNIX line endings, so we’d need to translate back, which isn’t the end of the world, but it means this stack cannot support binary payloads. It’s still worth supporting but we should have something that’s actually binary-safe.

LMTP seems fairly reasonable. It’s more work to implement, but since it absolves us of needing to do any kind of mail queueing (as we would with SMTP), it shouldn’t be too big a deal.

LMTP in more detail

Relevant RFCs:

RFC 5321, latest SMTP base standard (obsoleting the original RFC 821 and its successor RFC 2821), incorporates extension mechanisms from RFC 1869.

RFC 2033, LMTP

RFC 1854, PIPELINING extension

RFC 1652, 8BITMIME extension

RFC 2034, ENHANCEDSTATUSCODES extension. The actual codes are in RFC 1893.

RFC 1830, BINARY extension

RFC 3207, STARTTLS for SMTP

RFC 6531, allow UTF-8 everywhere

RFC 1870, SIZE extension (allow client to indicate message size so it can be rejected quickly if too large)

The PIPELINING, 8BITMIME, and SMTPUTF8 extensions are trivial; they require doing literally nothing (at least for the subset of SMTP that makes sense for LMTP). ENHANCEDSTATUSCODES simply adds an extra, parallel set of status codes to SMTP so that every error has two status codes, but that’s just something to build in as we go along.

BINARY adds a simple means of transferring binary data. It’s not required, but it’s simple enough and supporting binary is a good thing.