From 687b650581eb46bc862b4d282768596347196fc0 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sun, 30 Dec 2018 00:20:00 +0000
Subject: [PATCH 01/36] first cut of MSC1763 for configurable event retention

---
 .../1763-configurable-retention-periods.md    | 243 ++++++++++++++++++
 1 file changed, 243 insertions(+)
 create mode 100644 proposals/1763-configurable-retention-periods.md

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
new file mode 100644
index 00000000000..89cadfa41ae
--- /dev/null
+++ b/proposals/1763-configurable-retention-periods.md
@@ -0,0 +1,243 @@
+# Proposal for specifying configurable retention periods for messages.
+
+A major shortcoming of Matrix has been the inability to specify how long
+events should be stored by the servers and clients which participate in a given
+room.
+
+This proposal aims to specify a simple yet flexible set of rules which allow
+users, room admins and server admins to determine how long data should be
+stored for a room, from the perspective of respecting the privacy requirements
+of that room (which may range from "burn after reading" ephemeral messages,
+through to FOIA-style public record keeping requirements).
+
+As well as enforcing privacy requirements, these rules provide a way for server
+administrators to better manage disk space (e.g. to enforce rules such as "don't
+store remote events for public rooms for more than a month").
+
+## Problem:
+
+Matrix is inherently a protocol for storing and synchronising conversation
+history, and various parties may wish to control how long that history is stored
+for.
+
+ * Users may wish to specify a maximum age for their messages for privacy
+   purposes, for instance:
+   * to avoid their messages (or message metadata) being profiled by
+     unscrupulous or compromised homeservers
+   * to avoid their messages in public rooms staying indefinitely on the public
+     record
+   * because of legal/corporate requirements to store message history for a
+     limited period of time
+   * because of legal/corporate requirements to store messages forever
+     (e.g. FOIA)
+   * to provide "ephemeral messaging" semantics where messages are best-effort
+     deleted after being read.
+ * Room admins may wish to specify a retention policy for all messages in a
+   room.
+   * A room admin may wish to enforce a lower or upper bound on message
+     retention on behalf of its users, overriding their preferences.
+   * A bridged room should be able to enforce the data retention policies of the
+     remote rooms.
+ * Server admins may wish to specify a retention policy for their copy of given
+   rooms, in order to manage disk space.
+
+Additionally, we would like to provide this behaviour whilst also ensuring that
+users generally see a consistent view of message history, without lots of gaps
+and one-sided conversations where messages have been automatically removed.
+
+At the least, it should be possible for people participating in a conversation
+to know the expected lifetime of the other messages in the conversation **at the
+time they are sent** in order to know how best to interact with them (i.e.
+whether they are knowingly participating in a future one-sided conversation or
+not).
+
+We would also like to discourage users from setting low message retention as a
+matter of course, as it can result in very antisocial conversation patterns to
+the detriment of Matrix as a useful communication mechanism.
+ +This proposal does not try to solve the problems of: + * GDPR erasure (as this involves retrospectively changing the lifetime of + messages) + * Bulk redaction (e.g. to remove all messages from an abusive user in a room, + as again this is retrospectively changing message lifetime) + * Limiting the number (rather than age) of messages stored per room (as this is + more a question of quotaing rather than empowering privacy) + * Ephemeral messaging? + +## Proposal + +### User-specified per-message retention + +Users can specify per-message retention by adding the following fields to the +event alongside its content: + +`max_lifetime`: + the maximum duration in seconds for which a well-behaved server should store + this event. If absent, or null, it should be interpreted as 'forever'. + + +`min_lifetime`: + the minimum duration for which a well-behaved server should store this event. + If absent, or null, should be interpreted as 'forever' + +`self_destruct`: + a boolean for whether wellbehaved servers should remove this event after + seeing an explicit read receipt delivered for it. + +`expire_on_clients`: + a boolean for whether well-behaved clients should expire messages clientside + to match the min/max lifetime and/or self_destruct semantics fields. + +For instance: + +```json +{ + "type": "m.room.message", + "max_lifetime": 86400, + "content": ... +} +``` + +The above example means that servers receiving this message should store the +event for a only 86400 seconds (1 day), as measured from that event's +origin_server_ts, after which they MUST prune the event from their +DBs. We consciously do not redact the event, as we are trying to eliminate +metadata here at the cost of deliberately fracturing the DAG (which will +fragment into disparate chunks). + +```json +{ + "type": "m.room.message", + "min_lifetime": 2419200, + "content": ... +} +``` + +The above example means that servers receiving this message SHOULD store the +event forever, but MAY choose to prune their copy after 28 days (or longer) in +order to reclaim diskspace. + +```json +{ + "type": "m.room.message", + "self_destruct": true, + "expire_on_clients": true, + "content": ... +} +``` + +The above example describes 'self-destructing message' semantics where both server +and clients MUST prune/delete the event and associated data as soon as a read +receipt is received from the recipient. + +### User-advertised per-message retention + +If we had extensible profiles, users could advertise their intended per-message +retention in their profile (in global profile or per-room profile) as a useful +social cue. However, this would be purely informational. + +### Room Admin-specified per-room retention + +We introduce a `m.room.retention` state event, which room admins can set to +override the retention behaviour for a given room. This takes the same fields +described above. + +If set, these fields directly override any per-message retention behaviour +specified by the user - even if it means forcing laxer privacy requirements on +that user. This is a conscious privacy tradeoff to allow admins to specify +explicit privacy requirements for a room. For instance, a room may explicitly +disable self-destructing messages by setting `self_destruct: false`, or may +require all messages in the room be stored forever with `min_lifetime: null`. 
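+
+As a purely illustrative, non-normative sketch, a room admin's client might set
+such a policy as follows (in Python; `client.send_state_event` is a hypothetical
+helper standing in for whatever "send a state event" primitive a given client
+library exposes):
+
+```python
+# Hypothetical helper: any client library's "send state event"
+# primitive would look much the same.
+client.send_state_event(
+    room_id="!myroom:example.org",
+    event_type="m.room.retention",
+    state_key="",
+    content={
+        "max_lifetime": 2419200,  # servers must purge events after 28 days
+        "self_destruct": False,   # explicitly disable self-destructing messages
+    },
+)
+```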
+ +In the instance of `min_lifetime` or `max_lifetime` being overridden, the +invariant that `max_lifetime > min_lifetime` must be maintained by clamping +max_lifetime to be greater than `min_lifetime`. + +If the user's retention settings conflicts with those in the room, then the user's +clients should warn the user. + +### Server Admin-specified per-room retention + +Server admins have two ways of influencing message retention on their server: + +1) Specifying a default `m.room.retention` for rooms created on the server, as +defined as a per-server implementation configuration option which inserts the +state events after creating the room (effectively augmenting the presets used +when creating a room). If a server admin is trying to conserve diskspace, they +may do so by specifying and enforcing a relatively low min_lifetime (e.g. 1 +month), but not specify a max_lifetime, in the hope that other servers will +retain the data for longer. + +XXX: is this the correct approach to take? It's how we force E2E encryption on, +but it feels very fragmentory and magical presets to do different things depending +on which server you're on. + +2) By adjusting how aggressively their server enforces the the `min_lifetime` +value for message retention. For instance, a server admin could configure their +server to attempt to automatically remote purge messages in public rooms which +are older than three months (unless min_lifetime for those messages was set +higher). + +The expected configuration here could be something like: + * target_lifetime_public_remote_events: 3 months + * target_lifetime_public_local_events: null # forever + * target_lifetime_private_remote_events: null # forever + * target_lifetime_private_local_events: null # forever + +...which would try to automatically purge remote events from public rooms after +3 months (assuming their individual min_lifetime is not higher), but leave +others alone. + +XXX: should this configuration be specced or left as an implementation-specific +config option? + +Server admins could also override the requested retention limits (e.g. if resource +constrained), but this isn't recommended given it may result in history being +irrevocably lost against the senders' wishes. + +## Client-side behaviour + +Clients should independently calculate the retention of a message based on the +event fields and the room state, and show the message lifespan in the UI. If a +message has a finite lifespan that fact MUST be indicated clearly in the timeline +to allow users to correctly interact with the message. (The details of the +lifespan can be shown on demand, however). + +If `expire_on_clients` is true, then clients should also calculate expiration for +said events and delete them from their local stores as required. + +## Tradeoffs + +This proposal deliberately doesn't address GDPR erasure or mega-redaction scenarios, +as it attempts to build a coherent UX around the use case of users knowing their +privacy requirements *at the point they send messages*. Meanwhile GDPR erasure is +handled elsewhere (and involves hiding rather than purging messages, in order to +avoid annhilating conversation history), and mega-redaction is yet to be defined. + +## Potential issues + +How do we handle scenarios where users try to re-backfill in history which has +already been purged? This should presumably be a server admin option on whether +to allow it or not, and if allowed, configure how long the backfill should persist +for before being purged again? 
+ +## Security considerations + +There's scope for abuse where users can send abusive messages into a room with a +short max_lifetime and/or self_destruct set true which promptly self-destruct. + +One solution for this could be for server implementations to implement a quarantine +mode which initially marks purged events as quarantined for N days before deleting +them entirely, allowing server admins to address abuse concerns. + +## Conclusion + +Previous attempts to solve this have got stuck by trying to combine together too many +disparate problems (e.g. reclaiming diskspace; aiding user data privacy; self-destructing +messages; mega-redaction; clearing history on specific devices; etc) - see +https://github.com/matrix-org/matrix-doc/issues/440 and https://github.com/matrix-org/matrix-doc/issues/447 +for the history. + +This proposal attempts to simplify things to strictly considering the question of +how long servers should persist events for (with the extension of self-destructing +messages added more to validate that the design is able to support such a feature). \ No newline at end of file From f770440475d653b3648b7754403a61de3aa98d62 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 30 Dec 2018 00:36:36 +0000 Subject: [PATCH 02/36] ephemeral msging ended up in scope --- proposals/1763-configurable-retention-periods.md | 1 - 1 file changed, 1 deletion(-) diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md index 89cadfa41ae..99026b7db33 100644 --- a/proposals/1763-configurable-retention-periods.md +++ b/proposals/1763-configurable-retention-periods.md @@ -62,7 +62,6 @@ This proposal does not try to solve the problems of: as again this is retrospectively changing message lifetime) * Limiting the number (rather than age) of messages stored per room (as this is more a question of quotaing rather than empowering privacy) - * Ephemeral messaging? ## Proposal From b25367e3c99a1fa3352ecede04b6dd763423e720 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 30 Dec 2018 01:10:10 +0000 Subject: [PATCH 03/36] fix english --- proposals/1763-configurable-retention-periods.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md index 99026b7db33..cf7ea15d9cd 100644 --- a/proposals/1763-configurable-retention-periods.md +++ b/proposals/1763-configurable-retention-periods.md @@ -168,8 +168,8 @@ month), but not specify a max_lifetime, in the hope that other servers will retain the data for longer. XXX: is this the correct approach to take? It's how we force E2E encryption on, -but it feels very fragmentory and magical presets to do different things depending -on which server you're on. +but it feels very fragmentory to have magical presets which do different things +depending on which server you're on. 2) By adjusting how aggressively their server enforces the the `min_lifetime` value for message retention. For instance, a server admin could configure their @@ -177,7 +177,7 @@ server to attempt to automatically remote purge messages in public rooms which are older than three months (unless min_lifetime for those messages was set higher). 
-The expected configuration here could be something like: +A possible configuration here could be something like: * target_lifetime_public_remote_events: 3 months * target_lifetime_public_local_events: null # forever * target_lifetime_private_remote_events: null # forever From 2aafa02f0885b08a9382a288687622640ab80ca6 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 30 Dec 2018 01:17:22 +0000 Subject: [PATCH 04/36] clarify this only applies to non-state events; fix retention JSON structure --- .../1763-configurable-retention-periods.md | 21 +++++++++++-------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md index cf7ea15d9cd..2f1aa2fa723 100644 --- a/proposals/1763-configurable-retention-periods.md +++ b/proposals/1763-configurable-retention-periods.md @@ -68,7 +68,7 @@ This proposal does not try to solve the problems of: ### User-specified per-message retention Users can specify per-message retention by adding the following fields to the -event alongside its content: +event within its content. Retention is only considered for non-state events. `max_lifetime`: the maximum duration in seconds for which a well-behaved server should store @@ -91,24 +91,22 @@ For instance: ```json { - "type": "m.room.message", "max_lifetime": 86400, - "content": ... } ``` The above example means that servers receiving this message should store the event for a only 86400 seconds (1 day), as measured from that event's -origin_server_ts, after which they MUST prune the event from their -DBs. We consciously do not redact the event, as we are trying to eliminate +origin_server_ts, after which they MUST prune all references to that event ID +from their database. + +We consciously do not redact the event, as we are trying to eliminate metadata here at the cost of deliberately fracturing the DAG (which will fragment into disparate chunks). ```json { - "type": "m.room.message", "min_lifetime": 2419200, - "content": ... } ``` @@ -118,10 +116,8 @@ order to reclaim diskspace. ```json { - "type": "m.room.message", "self_destruct": true, "expire_on_clients": true, - "content": ... } ``` @@ -129,6 +125,9 @@ The above example describes 'self-destructing message' semantics where both serv and clients MUST prune/delete the event and associated data as soon as a read receipt is received from the recipient. +TODO: do we want to pass these in as querystring params when sending, instead of +putting them inside event.content? + ### User-advertised per-message retention If we had extensible profiles, users could advertise their intended per-message @@ -220,6 +219,10 @@ already been purged? This should presumably be a server admin option on whether to allow it or not, and if allowed, configure how long the backfill should persist for before being purged again? +How do we handle retention of media uploads (especially for E2E rooms)? It feels +the upload itself might warrant retention values applied to it. 
+ + ## Security considerations There's scope for abuse where users can send abusive messages into a room with a From 64695ed8d39ffcb4109fd9751ae0e9e40c8f50f0 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 30 Dec 2018 01:19:56 +0000 Subject: [PATCH 05/36] make conflict alg explicit for user retention settings --- proposals/1763-configurable-retention-periods.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md index 2f1aa2fa723..5d092ea042d 100644 --- a/proposals/1763-configurable-retention-periods.md +++ b/proposals/1763-configurable-retention-periods.md @@ -152,7 +152,8 @@ invariant that `max_lifetime > min_lifetime` must be maintained by clamping max_lifetime to be greater than `min_lifetime`. If the user's retention settings conflicts with those in the room, then the user's -clients should warn the user. +clients should warn the user. A conflict exists if any field in `m.room.retention` +is present which the user would otherwise be setting on their messages. ### Server Admin-specified per-room retention From c493dbda6d8b7fd6603f2a88cd4dfede57d7b90e Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 30 Dec 2018 01:20:56 +0000 Subject: [PATCH 06/36] change max >= min invariant --- proposals/1763-configurable-retention-periods.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md index 5d092ea042d..dc79a6fd635 100644 --- a/proposals/1763-configurable-retention-periods.md +++ b/proposals/1763-configurable-retention-periods.md @@ -148,8 +148,8 @@ disable self-destructing messages by setting `self_destruct: false`, or may require all messages in the room be stored forever with `min_lifetime: null`. In the instance of `min_lifetime` or `max_lifetime` being overridden, the -invariant that `max_lifetime > min_lifetime` must be maintained by clamping -max_lifetime to be greater than `min_lifetime`. +invariant that `max_lifetime >= min_lifetime` must be maintained by clamping +max_lifetime to be equal to `min_lifetime`. If the user's retention settings conflicts with those in the room, then the user's clients should warn the user. A conflict exists if any field in `m.room.retention` From 0afc3afc1f26fd905b3d8810a92b1f5d9f75a950 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 30 Dec 2018 01:25:04 +0000 Subject: [PATCH 07/36] spell out that self-destructing msgs need explicit RRs --- proposals/1763-configurable-retention-periods.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md index dc79a6fd635..705dd2b82ea 100644 --- a/proposals/1763-configurable-retention-periods.md +++ b/proposals/1763-configurable-retention-periods.md @@ -123,7 +123,11 @@ order to reclaim diskspace. The above example describes 'self-destructing message' semantics where both server and clients MUST prune/delete the event and associated data as soon as a read -receipt is received from the recipient. +receipt for that message is received from the recipient. + +Clients and servers MUST send explicit read receipts per-message for +self-destructing messages (rather than for the most recently read message, +as is the normal operation), so that messages can be destructed as requested. 
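+
+As a non-normative illustration, an explicit per-message receipt could be sent
+with today's r0 client-server API by POSTing a receipt scoped to the exact event
+ID being acknowledged (sketch in Python; the homeserver name, room ID, event ID
+and token are placeholders):
+
+```python
+import requests
+
+# Acknowledge one specific self-destructing event by sending a receipt
+# for its event ID, rather than only for the latest event read.
+requests.post(
+    "https://matrix.example.org/_matrix/client/r0/rooms/"
+    "!room%3Aexample.org/receipt/m.read/%24selfdestruct%3Aexample.org",
+    params={"access_token": "ACCESS_TOKEN"},
+    json={},
+)
+```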
 TODO: do we want to pass these in as querystring params when sending, instead of
 putting them inside event.content?

From 7a8d204ca1ebfa131ec03cc96d0810bc338ce809 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sun, 30 Dec 2018 01:30:00 +0000
Subject: [PATCH 08/36] more validation on fields

---
 proposals/1763-configurable-retention-periods.md | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 705dd2b82ea..1f231679553 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -72,20 +72,23 @@ event within its content. Retention is only considered for non-state events.
 
 `max_lifetime`:
 	the maximum duration in seconds for which a well-behaved server should store
-	this event. If absent, or null, it should be interpreted as 'forever'.
-
+	this event. Must be null or in range [0, 2^31-1]. If absent, or null,
+	should be interpreted as 'forever'.
 
 `min_lifetime`:
 	the minimum duration for which a well-behaved server should store this event.
-	If absent, or null, should be interpreted as 'forever'
+	Must be null or in range [0, 2^31-1]. If absent, or null, should be
+	interpreted as 'forever'.
 
 `self_destruct`:
 	a boolean for whether wellbehaved servers should remove this event after
-	seeing an explicit read receipt delivered for it.
+	seeing an explicit read receipt delivered for it. If absent, or null, should
+	be interpreted as false.
 
 `expire_on_clients`:
 	a boolean for whether well-behaved clients should expire messages clientside
-	to match the min/max lifetime and/or self_destruct semantics fields.
+	to match the min/max lifetime and/or self_destruct semantics fields. If absent,
+	or null, should be interpreted as false.
 
 For instance:
 
From c493dbda6d8b7fd6603f2a88cd4dfede57d7b90e Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sun, 30 Dec 2018 02:19:33 +0000
Subject: [PATCH 09/36] spell out how the example server admin overrides would
 work

---
 proposals/1763-configurable-retention-periods.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 1f231679553..daa2c1bb77d 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -194,6 +194,12 @@ A possible configuration here could be something like:
 3 months (assuming their individual min_lifetime is not higher), but leave
 others alone.
 
+These config values would interact with the min_lifetime and max_lifetime values
+of a message (either per-message or per-room) in the different classes of room
+by decreasing the effective max_lifetime to the proposed value (whilst
+preserving the `max_lifetime >= min_lifetime` invariant). However, the precise
+behaviour would be up to the server implementer.
+
 XXX: should this configuration be specced or left as an implementation-specific
 config option?
From 4646fcd613e8ff7347604d0357a539ae3cc5b71c Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 30 Dec 2018 14:51:41 +0000 Subject: [PATCH 10/36] improve wording; spell out purge/redact dichotomy; add explicit alg --- .../1763-configurable-retention-periods.md | 75 ++++++++++++++++--- 1 file changed, 64 insertions(+), 11 deletions(-) diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md index daa2c1bb77d..d2c232500e5 100644 --- a/proposals/1763-configurable-retention-periods.md +++ b/proposals/1763-configurable-retention-periods.md @@ -100,12 +100,13 @@ For instance: The above example means that servers receiving this message should store the event for a only 86400 seconds (1 day), as measured from that event's -origin_server_ts, after which they MUST prune all references to that event ID -from their database. +origin_server_ts, after which they MUST purge all references to that event ID +(e.g. from their db and any in-memory queues). We consciously do not redact the event, as we are trying to eliminate -metadata here at the cost of deliberately fracturing the DAG (which will -fragment into disparate chunks). +metadata here at the cost of deliberately fracturing the DAG, which will +fragment into disparate chunks. (See "Issues" below in terms of whether this +is actually valid) ```json { @@ -114,7 +115,7 @@ fragment into disparate chunks). ``` The above example means that servers receiving this message SHOULD store the -event forever, but MAY choose to prune their copy after 28 days (or longer) in +event forever, but MAY choose to purge their copy after 28 days (or longer) in order to reclaim diskspace. ```json @@ -125,7 +126,7 @@ order to reclaim diskspace. ``` The above example describes 'self-destructing message' semantics where both server -and clients MUST prune/delete the event and associated data as soon as a read +and clients MUST purge/delete the event and associated data as soon as a read receipt for that message is received from the recipient. Clients and servers MUST send explicit read receipts per-message for @@ -147,7 +148,7 @@ We introduce a `m.room.retention` state event, which room admins can set to override the retention behaviour for a given room. This takes the same fields described above. -If set, these fields directly override any per-message retention behaviour +If set, these fields replace any per-message retention behaviour specified by the user - even if it means forcing laxer privacy requirements on that user. This is a conscious privacy tradeoff to allow admins to specify explicit privacy requirements for a room. For instance, a room may explicitly @@ -158,9 +159,10 @@ In the instance of `min_lifetime` or `max_lifetime` being overridden, the invariant that `max_lifetime >= min_lifetime` must be maintained by clamping max_lifetime to be equal to `min_lifetime`. -If the user's retention settings conflicts with those in the room, then the user's -clients should warn the user. A conflict exists if any field in `m.room.retention` -is present which the user would otherwise be setting on their messages. +If the user's retention settings conflicts with those in the room, then the +user's clients should warn the user when participating in the room. A conflict +exists if the user sets retention fields on their messages which are specified +with differing values on the `m.room.retention` state event. 
### Server Admin-specified per-room retention @@ -218,6 +220,51 @@ lifespan can be shown on demand, however). If `expire_on_clients` is true, then clients should also calculate expiration for said events and delete them from their local stores as required. +## Pruning algorithm + +To summarise, servers and clients must implement the pruning algorithm as +follows: + +If we're a client, apply the algorithm if: + * if specified, the `expire_on_clients` field in the `m.room.retention` event for the room is true. + * otherwise, if specified, the message's `expire_on_clients` field is true. + * otherwise, don't apply the algorithm. + +The maximum lifetime of an event is calculated as: + * if specified, the `max_lifetime` field in the `m.room.retention` event for the room. + * otherwise, if specified, the message's `max_lifetime` field. + * otherwise, the message's maximum lifetime is considered 'forever'. + +The minimum lifetime of an event is calculated as: + * if specified, the `min_lifetime` field in the `m.room.retention` event for the room. + * otherwise, if specified, the message's `min_lifetime` field. + * otherwise, the message's minimum lifetime is considered 'forever'. + * for clients, `min_lifetime` should be considered to be 0 (as there is no + requirement for clients to persist events). + +If the calculated max_lifetime is less than the min_lifetime then the max_lifetime +is set to be equal to the min_lifetime. + +The server/client then selects a lifetime of the event to lie between the +calculated values of minimum and maximum lifetime, based on their implementation +and configuration requirements. The selected lifetime MUST not exceed the +calculated maximum lifetime. The selected lifetime SHOULD not be less than the +calculated minimum lifetime, but may be less in case of constrained resources, +in which case the server should prioritise retaining locally generated events +over remote generated events. + +Server/clients then set a maintenance task to remove ("purge") the event and +references to its event ID from their DB and in-memory queues after the lifetime +has expired (starting timing from the absolute origin_server_ts on the event). + +As a special case, servers and clients should immediately purge the event, on observing +a read receipt for that specific event ID, if: + * if specified, the `self_destruct` field in the `m.room.retention` event for the room is true. + * otherwise, if specified, the message's `self_destruct` field is true. + +If possible, servers/clients should remove downstream notifications of a message +once it has expired (e.g. by cancelling push notifications). + ## Tradeoffs This proposal deliberately doesn't address GDPR erasure or mega-redaction scenarios, @@ -226,7 +273,13 @@ privacy requirements *at the point they send messages*. Meanwhile GDPR erasure handled elsewhere (and involves hiding rather than purging messages, in order to avoid annhilating conversation history), and mega-redaction is yet to be defined. -## Potential issues +## Issues + +It's debatable as to whether we're better off applying the redaction algorithm +to expired events (and thus keep the integrity of the DAG intact, at the expense +of leaking metadata), or whether to purge instead (as per the current proposal), +which will punch holes in the DAG and potentially break the ability to backpaginate +the room. How do we handle scenarios where users try to re-backfill in history which has already been purged? 
This should presumably be a server admin option on whether From c55158dddf368c3f125c80cdee7fd2185a6d1b68 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 30 Dec 2018 15:05:13 +0000 Subject: [PATCH 11/36] clarify redaction semantic and default PL --- proposals/1763-configurable-retention-periods.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md index d2c232500e5..a8c3467e879 100644 --- a/proposals/1763-configurable-retention-periods.md +++ b/proposals/1763-configurable-retention-periods.md @@ -133,6 +133,13 @@ Clients and servers MUST send explicit read receipts per-message for self-destructing messages (rather than for the most recently read message, as is the normal operation), so that messages can be destructed as requested. +These retention fields are preserved during redaction, so that even if the event +is redacted, the original copy can be subsequently purged appropriately from the +DB. + +XXX: This may change if we end up redacting rather than purging events (see +Issues below) + TODO: do we want to pass these in as querystring params when sending, instead of putting them inside event.content? @@ -146,7 +153,8 @@ social cue. However, this would be purely informational. We introduce a `m.room.retention` state event, which room admins can set to override the retention behaviour for a given room. This takes the same fields -described above. +described above. It follows the default PL semantics for a state event (requiring +PL of 50 by default to be set) If set, these fields replace any per-message retention behaviour specified by the user - even if it means forcing laxer privacy requirements on From 6e33c2fa5f90330d361d342702a04fe69840a51a Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 30 Dec 2018 15:12:23 +0000 Subject: [PATCH 12/36] track max's idea of advertising retention per-server --- proposals/1763-configurable-retention-periods.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md index a8c3467e879..44e86ee2484 100644 --- a/proposals/1763-configurable-retention-periods.md +++ b/proposals/1763-configurable-retention-periods.md @@ -297,6 +297,12 @@ for before being purged again? How do we handle retention of media uploads (especially for E2E rooms)? It feels the upload itself might warrant retention values applied to it. +Should room retention be announced in a room per-server? The advantage is full +flexibility in terms of servers announcing their different policies for a room +(and possibly letting users know how likely history is to be retained, or conversely +letting servers know if they need to step up to retain history). The disadvantage +is that it could make for very complex UX for end-users: "Warning, some servers in +this room have overridden history retention to conflict with your preferences" etc. 
 ## Security considerations
 
 There's scope for abuse where users can send abusive messages into a room with a

From 28ea4e10656c18afff87472c59990f44dcfecdc8 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sun, 30 Dec 2018 15:14:57 +0000
Subject: [PATCH 13/36] fix normatives

---
 proposals/1763-configurable-retention-periods.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 44e86ee2484..7614b8d5634 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -71,22 +71,22 @@ Users can specify per-message retention by adding the following fields to the
 event within its content. Retention is only considered for non-state events.
 
 `max_lifetime`:
-	the maximum duration in seconds for which a well-behaved server should store
+	the maximum duration in seconds for which a server must store
 	this event. Must be null or in range [0, 2^31-1]. If absent, or null,
 	should be interpreted as 'forever'.
 
 `min_lifetime`:
-	the minimum duration for which a well-behaved server should store this event.
+	the minimum duration for which a server should store this event.
 	Must be null or in range [0, 2^31-1]. If absent, or null, should be
 	interpreted as 'forever'.
 
 `self_destruct`:
-	a boolean for whether wellbehaved servers should remove this event after
+	a boolean for whether servers must remove this event after
 	seeing an explicit read receipt delivered for it. If absent, or null, should
 	be interpreted as false.
 
 `expire_on_clients`:
-	a boolean for whether well-behaved clients should expire messages clientside
+	a boolean for whether clients must expire messages clientside
 	to match the min/max lifetime and/or self_destruct semantics fields. If absent,
 	or null, should be interpreted as false.

From cca99dd2e60308f9908c7cfd543ba89c203e912b Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Fri, 4 Jan 2019 23:39:47 +0000
Subject: [PATCH 14/36] clarify client behaviour

---
 proposals/1763-configurable-retention-periods.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 7614b8d5634..ad27a04d9da 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -219,10 +219,10 @@ irrevocably lost against the senders' wishes.
 
 ## Client-side behaviour
 
-Clients should independently calculate the retention of a message based on the
-event fields and the room state, and show the message lifespan in the UI. If a
-message has a finite lifespan that fact MUST be indicated clearly in the timeline
-to allow users to correctly interact with the message. (The details of the
+Clients which persist conversation history must calculate the retention of a message
+based on the event fields and the room state. If a message has a finite lifespan
+that fact MUST be indicated clearly in the timeline
+to allow users to interact with the message in an informed manner. (The details of the
 lifespan can be shown on demand, however).
 
 If `expire_on_clients` is true, then clients should also calculate expiration for
 said events and delete them from their local stores as required.
@@ -323,4 +323,4 @@ for the history.
 
 This proposal attempts to simplify things to strictly considering the question of
 how long servers should persist events for (with the extension of self-destructing
-messages added more to validate that the design is able to support such a feature).
\ No newline at end of file
+messages added more to validate that the design is able to support such a feature).

From a4974b69f2b2db12e6b6ea936e68ca1a153e4ad9 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Fri, 4 Jan 2019 23:50:15 +0000
Subject: [PATCH 15/36] make self_destruct set a timer in seconds rather than
 be binary.

---
 .../1763-configurable-retention-periods.md    | 33 ++++++++++++-------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index ad27a04d9da..cd836199836 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -73,17 +73,19 @@ event within its content. Retention is only considered for non-state events.
 `max_lifetime`:
 	the maximum duration in seconds for which a server must store
 	this event. Must be null or in range [0, 2^31-1]. If absent, or null,
-	should be interpreted as 'forever'.
+	should be interpreted as 'forever'.
 
 `min_lifetime`:
 	the minimum duration for which a server should store this event.
 	Must be null or in range [0, 2^31-1]. If absent, or null, should be
-	interpreted as 'forever'.
+	interpreted as 'forever'.
 
 `self_destruct`:
-	a boolean for whether servers must remove this event after
-	seeing an explicit read receipt delivered for it. If absent, or null, should
-	be interpreted as false.
+	the duration in seconds after which servers (and optionally clients) must
+	remove this event after seeing an explicit read receipt delivered for it.
+	Must be null or in range [0, 2^31-1]. If absent, or null, this
+	behaviour does not take effect.
+
 
 `expire_on_clients`:
 	a boolean for whether clients must expire messages clientside
 	to match the min/max lifetime and/or self_destruct semantics fields. If absent,
 	or null, should be interpreted as false.
 
 For instance:
 
 ```json
 {
-	"self_destruct": true,
+	"self_destruct": 5,
 	"expire_on_clients": true,
 }
 ```
 
 The above example describes 'self-destructing message' semantics where both server
-and clients MUST purge/delete the event and associated data as soon as a read
-receipt for that message is received from the recipient.
+and clients MUST purge/delete the event and associated data, 5 seconds after a read
+receipt for that message is received from the recipient. In other words, the
+recipient(s) have 5 seconds to view the message after receiving it.
 
 Clients and servers MUST send explicit read receipts per-message for
 self-destructing messages (rather than for the most recently read message,
 as is the normal operation), so that messages can be destructed as requested.
 
+XXX: this means that self-destruct only really makes sense for 1:1 rooms. is this
+adequate? should self-destruct messages be removed from this MSC entirety to
+simplify landing it?
+
 These retention fields are preserved during redaction, so that even if the event
 is redacted, the original copy can be subsequently purged appropriately from the
 DB.
 
 explicit privacy requirements for a room. For instance, a room may explicitly
-disable self-destructing messages by setting `self_destruct: false`, or may
+disable self-destructing messages by setting `self_destruct: null`, or may
 require all messages in the room be stored forever with `min_lifetime: null`.
 In the instance of `min_lifetime` or `max_lifetime` being overridden, the
 invariant that `max_lifetime >= min_lifetime` must be maintained by clamping
 max_lifetime to be equal to `min_lifetime`.
 
 If the user's retention settings conflicts with those in the room, then the
@@ -265,10 +272,14 @@ references to its event ID from their DB and in-memory queues after the lifetime
 has expired (starting timing from the absolute origin_server_ts on the event).
 
-As a special case, servers and clients should immediately purge the event, on observing
+As a special case, servers and clients should purge the event N seconds after observing
 a read receipt for that specific event ID, if:
- * if specified, the `self_destruct` field in the `m.room.retention` event for the room is true.
+ * if specified, the `self_destruct` field in the `m.room.retention` event for
+   the room is set to N where N is not null.
  * otherwise, if specified, the message's `self_destruct` field is true.
+
+The device emitting the read receipt for a self-destructing message must give the
+user sufficient time to view the message after opening it.
 
 If possible, servers/clients should remove downstream notifications of a message
 once it has expired (e.g. by cancelling push notifications).

From c27394ca01ad9b89724c156c7bfc1f5cab82d82e Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sat, 5 Jan 2019 00:28:25 +0000
Subject: [PATCH 16/36] clarify warning about conflicts

---
 proposals/1763-configurable-retention-periods.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index cd836199836..4abab371986 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -175,9 +175,11 @@ invariant that `max_lifetime >= min_lifetime` must be maintained by clamping
 max_lifetime to be equal to `min_lifetime`.
 
 If the user's retention settings conflicts with those in the room, then the
 user's clients MUST warn the user when participating in the room. A conflict
 exists if the user sets retention fields on their messages which are specified
-with differing values on the `m.room.retention` state event.
+with differing values on the `m.room.retention` state event. This is particularly
+important to warn the user if the room's retention is longer than their requested
+retention period.

From bdce6f1101214a364f2557b112e77e1abbdf78ed Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sun, 11 Aug 2019 01:03:26 +0200
Subject: [PATCH 17/36] remove per-message retention and self-destruct messages
 entirely to try to land this

---
 .../1763-configurable-retention-periods.md    | 274 +++++++-----------
 1 file changed, 112 insertions(+), 162 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 4abab371986..be7093e829c 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -1,4 +1,4 @@
-# Proposal for specifying configurable retention periods for messages.
+# Proposal for specifying configurable per-room message retention periods.
 
 A major shortcoming of Matrix has been the inability to specify how long
 events should be stored by the servers and clients which participate in a given
 room.
 This proposal aims to specify a simple yet flexible set of rules which allow
 users, room admins and server admins to determine how long data should be
 stored for a room, from the perspective of respecting the privacy requirements
-of that room (which may range from "burn after reading" ephemeral messages,
+of that room (which may range from a "burn after reading" ephemeral conversation,
 through to FOIA-style public record keeping requirements).
 
 As well as enforcing privacy requirements, these rules provide a way for server
 administrators to better manage disk space (e.g. to enforce rules such as "don't
 store remote events for public rooms for more than a month").
 
+This proposal originally tried to also define semantics for per-message retention
+as well as per-room; this has been split out into MSCxxxx in order to get the
+easier per-room semantics landed.
+
 ## Problem:
 
 Matrix is inherently a protocol for storing and synchronising conversation
 
 At the least, it should be possible for people participating in a conversation
-to know the expected lifetime of the other messages in the conversation **at the
-time they are sent** in order to know how best to interact with them (i.e.
-whether they are knowingly participating in a future one-sided conversation or
-not).
+to know the expected lifetime of the other messages in the conversation **at
+the time they are sent** in order to know how best to interact with them (i.e.
+whether they are knowingly participating in an ephemeral conversation or not).
 
-We would also like to discourage users from setting low message retention as a
-matter of course, as it can result in very antisocial conversation patterns to
-the detriment of Matrix as a useful communication mechanism.
+We would also like to set the expectation that rooms typically have a long
+message retention - allowing those who wish to use Matrix to archive their
+conversations to do so, and to allow Matrix to evolve as a repository of
+knowledge... unless participants explicitly request for a conversation history
+to have a limited lifetime.
 
 This proposal does not try to solve the problems of:
  * GDPR erasure (as this involves retrospectively changing the lifetime of
    messages)
  * Bulk redaction (e.g. to remove all messages from an abusive user in a room,
    as again this is retrospectively changing message lifetime)
- * Limiting the number (rather than age) of messages stored per room (as this is
-   more a question of quotaing rather than empowering privacy)
+ * Specifying history retention based on the number of messages (as opposed to
+   their age) in a room. This is descoped because it is effectively a disk space
+   management problem for a given server or client, rather than a policy
+   problem of the room. It can be solved in an implementation-specific manner, or
+   a new MSC can be proposed to standardise letting clients specify disk quotas
+   per room.
+ * Per-message retention (as having a mix of message lifetime within a room
+   complicates implementation considerably - for instance, you cannot just
+   purge arbitrary events from the DB without fracturing the DAG of the room,
+   and so a different approach is required)
 
 ## Proposal
 
-### User-specified per-message retention
+### Room Admin-specified per-room retention
+
+We introduce a `m.room.retention` state event, which room admins can set to
+mandate the history retention behaviour for a given room. It follows the
+default PL semantics for a state event (requiring PL of 50 by default to be
+set).
 
-Users can specify per-message retention by adding the following fields to the
-event within its content. Retention is only considered for non-state events.
+The following fields are defined in the `m.room.retention` contents:
 
 `max_lifetime`:
 	the maximum duration in seconds for which a server must store
 	this event. Must be null or in range [0, 2^31-1]. If absent, or null,
 	should be interpreted as 'forever'.
 
 `min_lifetime`:
 	the minimum duration for which a server should store this event.
 	Must be null or in range [0, 2^31-1]. If absent, or null, should be
 	interpreted as 'forever'.
-
-`self_destruct`:
-	the duration in seconds after which servers (and optionally clients) must
-	remove this event after seeing an explicit read receipt delivered for it.
-	Must be null or in range [0, 2^31-1]. If absent, or null, this
-	behaviour does not take effect.
-
 
 `expire_on_clients`:
 	a boolean for whether clients must expire messages clientside
 	to match the min/max lifetime and/or self_destruct semantics fields. If absent,
 	or null, should be interpreted as false.
 
+Retention is only considered for non-state events.
+
+If set, these fields SHOULD replace other retention behaviour configured by
+the user or server admin - even if it means forcing laxer privacy requirements
+on that user. This is a conscious privacy tradeoff to allow admins to specify
+explicit privacy requirements for a room. For instance, a room may explicitly
+require all messages in the room be stored forever with `min_lifetime: null`.
+
+In the instance of `min_lifetime` or `max_lifetime` being overridden, the
+invariant that `max_lifetime >= min_lifetime` must be maintained by clamping
+max_lifetime to be equal to `min_lifetime`.
+
+If the user's retention settings conflict with those in the room, then the
+user's clients MUST warn the user when participating in the room. A conflict
+exists if the user has configured their client to create rooms with retention
+settings which differ from the values on the `m.room.retention` state
+event. This is particularly important to warn the user if the room's
+retention is longer than their default requested retention period.
+
 For instance:
 
 ```json
 {
 	"max_lifetime": 86400,
 }
 ```
 
-The above example means that servers receiving this message should store the
-event for a only 86400 seconds (1 day), as measured from that event's
-origin_server_ts, after which they MUST purge all references to that event ID
-(e.g. from their db and any in-memory queues).
+The above example means that servers receiving messages in this room should
+store the event for only 86400 seconds (1 day), as measured from that
+event's origin_server_ts, after which they MUST purge all references to that
+event ID (e.g. from their db and any in-memory queues).
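+
+As a non-normative sketch of the arithmetic involved (assuming, as above, that
+lifetimes are expressed in seconds while `origin_server_ts` is in milliseconds,
+and modelling absent/null fields as `None`):
+
+```python
+# Non-normative sketch: resolve the effective lifetimes from an
+# m.room.retention content dict and compute an event's purge deadline.
+def effective_lifetimes(retention):
+    min_lifetime = retention.get("min_lifetime")  # None => forever
+    max_lifetime = retention.get("max_lifetime")  # None => forever
+    # Preserve the max_lifetime >= min_lifetime invariant by clamping.
+    if min_lifetime is not None and max_lifetime is not None:
+        max_lifetime = max(max_lifetime, min_lifetime)
+    return min_lifetime, max_lifetime
+
+def purge_deadline_ms(event, max_lifetime):
+    if max_lifetime is None:
+        return None  # store forever
+    # origin_server_ts is in milliseconds; lifetimes here are in seconds.
+    return event["origin_server_ts"] + max_lifetime * 1000
+```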
+The above example means that servers receiving messages in this room should +store the event for only 86400 seconds (1 day), as measured from that +event's origin_server_ts, after which they MUST purge all references to that +event ID (e.g. from their db and any in-memory queues). -We consciously do not redact the event, as we are trying to eliminate -metadata here at the cost of deliberately fracturing the DAG, which will -fragment into disparate chunks. (See "Issues" below in terms of whether this -is actually valid) +We consciously do not redact the event, as we are trying to eliminate metadata +and save disk space at the cost of deliberately discarding older messages from +the DAG. ```json { @@ -120,67 +148,6 @@ The above example means that servers receiving this message SHOULD store the event forever, but MAY choose to purge their copy after 28 days (or longer) in order to reclaim diskspace. -```json -{ - "self_destruct": 5, - "expire_on_clients": true, -} -``` - -The above example describes 'self-destructing message' semantics where both server -and clients MUST purge/delete the event and associated data, 5 seconds after a read -receipt for that message is received from the recipient. In other words, the -recipient(s) have 5 seconds to view the message after receiving it. - -Clients and servers MUST send explicit read receipts per-message for -self-destructing messages (rather than for the most recently read message, -as is the normal operation), so that messages can be destructed as requested. - -XXX: this means that self-destruct only really makes sense for 1:1 rooms. is this -adequate? should self-destruct messages be removed from this MSC entirety to -simplify landing it? - -These retention fields are preserved during redaction, so that even if the event -is redacted, the original copy can be subsequently purged appropriately from the -DB. - -XXX: This may change if we end up redacting rather than purging events (see -Issues below) - -TODO: do we want to pass these in as querystring params when sending, instead of -putting them inside event.content? - -### User-advertised per-message retention - -If we had extensible profiles, users could advertise their intended per-message -retention in their profile (in global profile or per-room profile) as a useful -social cue. However, this would be purely informational. - -### Room Admin-specified per-room retention - -We introduce a `m.room.retention` state event, which room admins can set to -override the retention behaviour for a given room. This takes the same fields -described above. It follows the default PL semantics for a state event (requiring -PL of 50 by default to be set) - -If set, these fields replace any per-message retention behaviour -specified by the user - even if it means forcing laxer privacy requirements on -that user. This is a conscious privacy tradeoff to allow admins to specify -explicit privacy requirements for a room. For instance, a room may explicitly -disable self-destructing messages by setting `self_destruct: null`, or may -require all messages in the room be stored forever with `min_lifetime: null`. - -In the instance of `min_lifetime` or `max_lifetime` being overridden, the -invariant that `max_lifetime >= min_lifetime` must be maintained by clamping -max_lifetime to be equal to `min_lifetime`. - -If the user's retention settings conflicts with those in the room, then the -user's clients MUST warn the user when participating in the room. 
A conflict
-exists if the user sets retention fields on their messages which are specified
-with differing values on the `m.room.retention` state event. This is particularly
-important to warn the user if the room's retention is longer than their requested
-retention period.
 
 ### Server Admin-specified per-room retention
 
 Server admins have two ways of influencing message retention on their server:
 
 1) Specifying a default `m.room.retention` for rooms created on the server, as
 defined as a per-server implementation configuration option which inserts the
 state events after creating the room (effectively augmenting the presets used
 when creating a room). If a server admin is trying to conserve diskspace, they
 may do so by specifying and enforcing a relatively low min_lifetime (e.g. 1
 month), but not specify a max_lifetime, in the hope that other servers will
 retain the data for longer.
 
-XXX: is this the correct approach to take? It's how we force E2E encryption on,
-but it feels very fragmentory to have magical presets which do different things
-depending on which server you're on.
+  XXX: is this the correct approach to take? It's how we force E2E encryption
+  on, but it feels very fragmentary to have magical presets which do different
+  things depending on which server you're on. The alternative would be some
+  kind of federation-aware negotiation where a server refuses to participate in
+  a room unless it gets its way on retention settings, however this feels
+  unnecessarily draconian.
 
-2) By adjusting how aggressively their server enforces the the `min_lifetime`
-value for message retention. For instance, a server admin could configure their
-server to attempt to automatically remote purge messages in public rooms which
-are older than three months (unless min_lifetime for those messages was set
-higher).
+2) By adjusting how aggressively their server enforces the `min_lifetime`
+value for message retention within a room. For instance, a server admin could
+configure their server to attempt to automatically purge remote messages in
+public rooms which are older than three months (unless min_lifetime for those
+messages was set higher).
 
-A possible configuration here could be something like:
+A possible implementation-specific server configuration here could be
+something like:
  * target_lifetime_public_remote_events: 3 months
  * target_lifetime_public_local_events: null # forever
  * target_lifetime_private_remote_events: null # forever
 * target_lifetime_private_local_events: null # forever
 
 ...which would try to automatically purge remote events from public rooms after
 3 months (assuming their individual min_lifetime is not higher), but leave
 others alone.
 
-These config values would interact with the min_lifetime and max_lifetime values
-of a message (either per-message or per-room) in the different classes of room
-by decreasing the effective max_lifetime to the proposed value (whilst
-preserving the `max_lifetime >= min_lifetime` invariant). However, the precise
-behaviour would be up to the server implementer.
+These config values would interact with the min_lifetime and max_lifetime
+values in the different classes of room by decreasing the effective
+max_lifetime to the proposed value (whilst preserving the `max_lifetime >=
+min_lifetime` invariant). However, the precise behaviour would be up to the
+server implementation.
 
-XXX: should this configuration be specced or left as an implementation-specific
-config option?
-
 Server admins could also override the requested retention limits (e.g. if
 resource constrained), but this isn't recommended given it may result in
 history being irrevocably lost against the senders' wishes.
 
-## Client-side behaviour
-
-Clients which persist conversation history must calculate the retention of a message
-based on the event fields and the room state.
If a message has a finite lifespan
-that fact MUST be indicated clearly in the timeline
-to allow users to interact with the message in an informed manner. (The details of the
-lifespan can be shown on demand, however).
-
-If `expire_on_clients` is true, then clients should also calculate expiration for
-said events and delete them from their local stores as required.
+Server admins could also override the requested retention limits (e.g. if
+resource constrained), but this isn't recommended given it may result in
+history being irrevocably lost against the senders' wishes.

## Pruning algorithm

@@ -244,17 +201,14 @@ follows:

If we're a client, apply the algorithm if:
  * if specified, the `expire_on_clients` field in the `m.room.retention` event
    for the room is true.
-  * otherwise, if specified, the message's `expire_on_clients` field is true.
  * otherwise, don't apply the algorithm.

The maximum lifetime of an event is calculated as:
  * if specified, the `max_lifetime` field in the `m.room.retention` event for the room.
-  * otherwise, if specified, the message's `max_lifetime` field.
  * otherwise, the message's maximum lifetime is considered 'forever'.

The minimum lifetime of an event is calculated as:
  * if specified, the `min_lifetime` field in the `m.room.retention` event for the room.
-  * otherwise, if specified, the message's `min_lifetime` field.
  * otherwise, the message's minimum lifetime is considered 'forever'.
  * for clients, `min_lifetime` should be considered to be 0 (as there is no
    requirement for clients to persist events).
@@ -270,22 +224,29 @@ calculated minimum lifetime, but may be less in case of constrained resources,
in which case the server should prioritise retaining locally generated events
over remote generated events.

-Server/clients then set a maintenance task to remove ("purge") the event and
-references to its event ID from their DB and in-memory queues after the lifetime
+Server/clients then set a maintenance task to remove ("purge") old events and
+references to their IDs from their DB and in-memory queues after the lifetime
has expired (starting timing from the absolute origin_server_ts on the event).

-As a special case, servers and clients should purge the event N seconds after observing
-a read receipt for that specific event ID, if:
-  * if specified, the `self_destruct` field in the `m.room.retention` event for
-    the room is set to N where N is not null.
-  * otherwise, if specified, the message's `self_destruct` field is true.
-
-The device emitting the read receipt for a self-destructing message must give the
-user sufficient time to view the message after op
-
If possible, servers/clients should remove downstream notifications of a message
once it has expired (e.g. by cancelling push notifications).

+If a user tries to re-backfill history which has already been purged, it's
+up to the server implementation's configuration whether to allow it or not,
+and, if allowed, for how long the backfill should persist before being
+purged again.
+
+Media uploads must also be expired in line with the retention policy of the
+room. For unencrypted rooms this is easy; when the event that references a
+piece of content is expired, the content must be expired too - assuming the
+content was first uploaded in that room. (This allows for content reuse in
+retention-limited rooms for things like stickers).
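+To illustrate the media-expiry rule above, a server might tie content expiry
+to event expiry roughly as in the following sketch. This is purely
+illustrative: the `store` helpers are hypothetical, not part of any real
+server implementation.
+
+```python
+# Hypothetical sketch only: ties media expiry to event expiry as described
+# above. The store helpers are illustrative, not a real server API.
+
+def purge_event_and_media(event: dict, room_id: str, store) -> None:
+    """Purge an expired event; also expire any media it references, provided
+    the content was first uploaded in this room (so sticker-style reuse in
+    other rooms keeps the content alive)."""
+    mxc_url = event.get("content", {}).get("url")
+    if mxc_url is not None and store.media_first_uploaded_in(mxc_url) == room_id:
+        store.delete_media(mxc_url)
+    store.purge_event(event["event_id"])
+```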
+
+For encrypted rooms, there is (currently) no alternative but to have the client
+manually delete media content from the server as it expires its own local
+copies of messages. (This requires us to have actually implemented a media
+deletion API at last.)
+
## Tradeoffs

This proposal deliberately doesn't address GDPR erasure or mega-redaction scenarios,
@@ -294,21 +255,14 @@ privacy requirements *at the point they send messages*. Meanwhile GDPR erasure is
handled elsewhere (and involves hiding rather than purging messages, in order to
avoid annihilating conversation history), and mega-redaction is yet to be defined.

-## Issues
+It also doesn't solve specifying storage quotas per room (i.e. "store the last
+500 messages in this room"), to avoid scope creep. This can be handled by an
+MSC for configuring resource quotas per room (or per user) in general.

-It's debatable as to whether we're better off applying the redaction algorithm
-to expired events (and thus keep the integrity of the DAG intact, at the expense
-of leaking metadata), or whether to purge instead (as per the current proposal),
-which will punch holes in the DAG and potentially break the ability to backpaginate
-the room.
+It also doesn't solve per-message retention behaviour - this has been split out
+into a separate MSC.

-How do we handle scenarios where users try to re-backfill in history which has
-already been purged? This should presumably be a server admin option on whether
-to allow it or not, and if allowed, configure how long the backfill should persist
-for before being purged again?
-
-How do we handle retention of media uploads (especially for E2E rooms)? It feels
-the upload itself might warrant retention values applied to it.
+## Issues

Should room retention be announced in a room per-server? The advantage is full
flexibility in terms of servers announcing their different policies for a room
@@ -319,12 +273,9 @@ this room have overridden history retention to conflict with your preferences" etc.

## Security considerations

-There's scope for abuse where users can send abusive messages into a room with a
-short max_lifetime and/or self_destruct set true which promptly self-destruct.
-
-One solution for this could be for server implementations to implement a quarantine
-mode which initially marks purged events as quarantined for N days before deleting
-them entirely, allowing server admins to address abuse concerns.
+It's always a gentlemen's agreement for servers and clients alike to actually
+uphold the requested retention behaviour; users should never rely on deletion
+actually having happened.

## Conclusion

@@ -335,5 +286,4 @@ https://github.com/matrix-org/matrix-doc/issues/440 and https://github.com/matri
for the history.

This proposal attempts to simplify things to strictly considering the question of
-how long servers should persist events for (with the extension of self-destructing
-messages added more to validate that the design is able to support such a feature).
+how long servers (and clients) should persist events for.
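The lifetime selection in the pruning algorithm above can be summarised with a
short sketch. This is non-normative and purely illustrative: the names are
hypothetical, lifetimes use whatever unit the retention fields define, and
`None` stands in for 'forever'.

```python
# Illustrative sketch of the lifetime selection in the pruning algorithm
# above; not a normative implementation. `retention` is the content of the
# room's m.room.retention state event (empty dict if none); `preferred` is
# whatever lifetime the implementation would otherwise choose.

def select_lifetime(retention: dict, preferred=None):
    max_lifetime = retention.get("max_lifetime")
    min_lifetime = retention.get("min_lifetime")

    # If the calculated max_lifetime is less than the min_lifetime, clamp
    # max_lifetime up to min_lifetime to preserve the invariant.
    if (max_lifetime is not None and min_lifetime is not None
            and max_lifetime < min_lifetime):
        max_lifetime = min_lifetime

    selected = preferred if preferred is not None else max_lifetime
    # The selected lifetime MUST NOT exceed the calculated maximum...
    if max_lifetime is not None and (selected is None or selected > max_lifetime):
        selected = max_lifetime
    # ...and SHOULD NOT be less than the calculated minimum (though a
    # resource-constrained server may go lower, preferring local events).
    if (min_lifetime is not None and selected is not None
            and selected < min_lifetime):
        selected = min_lifetime
    return selected  # None means the event is kept forever
```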
From a30a8536ca0ff56cd7101cb4e60a80cd714df87d Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sun, 11 Aug 2019 01:09:10 +0200
Subject: [PATCH 18/36] spell out that events will disappear from event streams when purged

---
 proposals/1763-configurable-retention-periods.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index be7093e829c..7373c56f3b3 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -227,6 +227,9 @@ over remote generated events.
 Server/clients then set a maintenance task to remove ("purge") old events and
 references to their IDs from their DB and in-memory queues after the lifetime
 has expired (starting timing from the absolute origin_server_ts on the event).
+It's worth noting that this means events may sometimes disappear from event
+streams; calling the same `/sync` or `/messages` API twice may give different
+results if some of the events have disappeared in the interim.
 
 If possible, servers/clients should remove downstream notifications of a message
 once it has expired (e.g. by cancelling push notifications).

From c281420741c781da8b01df95bead06a09c79208d Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sun, 11 Aug 2019 01:14:33 +0200
Subject: [PATCH 19/36] add the 'why not nego?' tradeoff

---
 proposals/1763-configurable-retention-periods.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 7373c56f3b3..a4263be7cf1 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -252,6 +252,12 @@ deletion API at last.)
 
 ## Tradeoffs
 
+This proposal tries to keep it simple by letting the room admin mandate the
+retention behaviour for a room. However, we could alternatively have a negotiation
+between the client and its server to determine the viable retention for a room.
+Or we could have the servers negotiate together to decide the retention for a room.
+Both seem overengineered, however.
+
 This proposal deliberately doesn't address GDPR erasure or mega-redaction scenarios,
 as it attempts to build a coherent UX around the use case of users knowing their
 privacy requirements *at the point they send messages*. Meanwhile GDPR erasure is

From ef215dd5f3c7a74b313dc99b6465b6ff572cbfee Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sun, 11 Aug 2019 01:17:38 +0200
Subject: [PATCH 20/36] clarify the intention to not default to finite message retention

---
 proposals/1763-configurable-retention-periods.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index a4263be7cf1..cc44ea67fd4 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -55,10 +55,10 @@ the time they are sent** in order to know how best to interact with them (i.e.
 whether they are knowingly participating in an ephemeral conversation or not).
 
 We would also like to set the expectation that rooms typically have a long
-message retention - allowing those who wish to use Matrix to archive their
-conversations to do so, and to allow Matrix to evolve as repository of
-knowledge... unless participants explicitly request for a conversation history
-to have limited lifetime.
+message retention - allowing those who wish to use Matrix to act as an archive
+of their conversations to do so. If everyone starts defaulting their rooms to
+finite retention periods, then the value of Matrix as a knowledge repository is
+broken.
 
 This proposal does not try to solve the problems of:
  * GDPR erasure (as this involves retrospectively changing the lifetime of

From 0b6a2097244145108752e15066f0641fc2980054 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sun, 11 Aug 2019 01:22:51 +0200
Subject: [PATCH 21/36] spell out not to default to a max_lifetime

---
 proposals/1763-configurable-retention-periods.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index cc44ea67fd4..9f4dd1e0b69 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -250,6 +250,11 @@ manually delete media content from the server as it expires its own local
 copies of messages. (This requires us to have actually implemented a media
 deletion API at last.)
 
+Clients and Servers should not default to setting a `max_lifetime` when
+creating rooms; instead users should only specify a `max_lifetime` when they
+need it for a specific conversation. This avoids unintentionally stopping
+users from using Matrix as a way to archive their conversations if they want.
+
 ## Tradeoffs
 
 This proposal tries to keep it simple by letting the room admin mandate the

From 5c2977935d7fed889ee7a2c1d98ca65d01713872 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sun, 11 Aug 2019 13:44:18 +0100
Subject: [PATCH 22/36] incorporate review

---
 .../1763-configurable-retention-periods.md    | 54 ++++++++++++-------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 9f4dd1e0b69..a1183d16dd9 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -88,19 +88,22 @@ set). The following fields are defined in the `m.room.retention` contents:
 
 `max_lifetime`:
-  the maximum duration in seconds for which a server must store
-  this event. Must be null or in range [0, 2<sup>31</sup>-1]. If absent, or null,
-  should be interpreted as 'forever'.
+  the maximum duration in seconds for which a server must store this event.
+  Must be null or an integer in range [0, 2<sup>31</sup>-1]. If absent, or
+  null, should be interpreted as 'forever'.
 
 `min_lifetime`:
-  the minimum duration for which a server should store this event.
-  Must be null or in range [0, 2<sup>31</sup>-1]. If absent, or null, should be
-  interpreted as 'forever'.
+  the minimum duration for which a server should store this event. Must be
+  null or an integer in range [0, 2<sup>31</sup>-1]. If absent, or null,
+  should be interpreted as 'forever'.
 
 `expire_on_clients`:
-  a boolean for whether clients must expire messages clientside
-  to match the min/max lifetime and/or self_destruct semantics fields. If absent,
-  or null, should be interpreted as false.
+  a boolean for whether clients must expire messages clientside to match the
+  min/max lifetime fields. If absent, or null, should be interpreted as false.
+  The intention of this is to distinguish between rules intended to impose a
+  data retention policy on the server - versus rules intended to provide a
+  degree of privacy by requesting all data is purged from all clients after a
+  given time.
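+For illustration, a client honouring `expire_on_clients` might behave roughly
+as in the sketch below. All helper and attribute names here are hypothetical;
+note that the lifetime fields above are in seconds, while `origin_server_ts`
+is in milliseconds.
+
+```python
+# Hypothetical client-side sketch of expire_on_clients; helper names are
+# illustrative only, not part of any specified API.
+import time
+
+def expire_local_events(room, store) -> None:
+    policy = room.retention_policy or {}  # content of m.room.retention
+    if not policy.get("expire_on_clients"):
+        return  # without this flag, the policy only binds servers
+    max_lifetime = policy.get("max_lifetime")  # seconds, per the field above
+    if max_lifetime is None:
+        return  # no upper bound, so nothing expires clientside
+    now_ms = int(time.time() * 1000)
+    for event in store.events_in_room(room.room_id):
+        # origin_server_ts is in milliseconds; convert the age to seconds.
+        if (now_ms - event["origin_server_ts"]) / 1000 > max_lifetime:
+            store.delete_event(event["event_id"])
+```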
Retention is only considered for non-state events.
 
@@ -115,10 +118,10 @@ invariant that `max_lifetime >= min_lifetime` must be maintained by clamping
 max_lifetime to be equal to `min_lifetime`.
 
 If the user's retention settings conflict with those in the room, then the
-user's clients MUST warn the user when participating in the room. A conflict
-exists if the user has configured their client to create rooms with retention
-settings which differ from the values on the `m.room.retention` state
-event. This is particularly important to warn the user if the room's
+user's clients are expected to warn the user when participating in the room.
+A conflict exists if the user has configured their client to create rooms with
+retention settings which differ from the values on the `m.room.retention`
+state event. This is particularly important to warn the user if the room's
 retention is longer than their default requested retention period.
 
 For instance:
@@ -132,7 +135,7 @@ For instance:
 The above example means that servers receiving messages in this room should
 store the event for only 86400 seconds (1 day), as measured from that
 event's origin_server_ts, after which they MUST purge all references to that
-event ID (e.g. from their db and any in-memory queues).
+event (e.g. from their db and any in-memory queues).
 
 We consciously do not redact the event, as we are trying to eliminate metadata
 and save disk space at the cost of deliberately discarding older messages from
@@ -154,11 +157,12 @@ Server admins have two ways of influencing message retention on their server:
 
 1) Specifying a default `m.room.retention` for rooms created on the server, as
 defined as a per-server implementation configuration option which inserts the
-state events after creating the room (effectively augmenting the presets used
-when creating a room). If a server admin is trying to conserve diskspace, they
-may do so by specifying and enforcing a relatively low min_lifetime (e.g. 1
-month), but not specify a max_lifetime, in the hope that other servers will
-retain the data for longer.
+state events after creating the room, and before `initial_state` is applied on
+`/createRoom` (effectively augmenting the presets used when creating a room).
+If a server admin is trying to conserve diskspace, they may do so by
+specifying and enforcing a relatively low min_lifetime (e.g. 1 month), but not
+specify a max_lifetime, in the hope that other servers will retain the data
+for longer.
 
   XXX: is this the correct approach to take? It's how we force E2E encryption
   on, but it feels very fragmentary to have magical presets which do different
@@ -199,7 +203,7 @@ history being irrevocably lost against the senders' wishes.
 
 To summarise, servers and clients must implement the pruning algorithm as
 follows:
 
-If we're a client, apply the algorithm if:
+If we're a client (including bots and bridges), apply the algorithm:
  * if specified, the `expire_on_clients` field in the `m.room.retention` event for the room is true.
  * otherwise, don't apply the algorithm.
 
@@ -231,6 +235,16 @@ It's worth noting that this means events may sometimes disappear from event
 streams; calling the same `/sync` or `/messages` API twice may give different
 results if some of the events have disappeared in the interim.
 
+In order to retain the integrity of the DAG for the room on the server, events
+which form forward extremities for a room should not be purged but redacted.
+
+  XXX: is this sufficient?
Should we keep a heuristic of the number of + redacted events which hang around, just in case some lost server reappears + from a netsplit and tries referencing older events? Perhaps we can check + the other servers in the room to ensure that we don't purge events their + forward extremities refer to (except this won't work if the other servers + have netsplit) + If possible, servers/clients should remove downstream notifications of a message once it has expired (e.g. by cancelling push notifications). From 032e63b5c33aaa2bb871ced56a6f2d9fee2052a5 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 11 Aug 2019 13:52:59 +0100 Subject: [PATCH 23/36] Apply suggestions from code review Co-Authored-By: Travis Ralston --- proposals/1763-configurable-retention-periods.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md index a1183d16dd9..5c6b7b6da77 100644 --- a/proposals/1763-configurable-retention-periods.md +++ b/proposals/1763-configurable-retention-periods.md @@ -134,7 +134,7 @@ For instance: The above example means that servers receiving messages in this room should store the event for only 86400 seconds (1 day), as measured from that -event's origin_server_ts, after which they MUST purge all references to that +event's `origin_server_ts`, after which they MUST purge all references to that event (e.g. from their db and any in-memory queues). We consciously do not redact the event, as we are trying to eliminate metadata @@ -148,7 +148,7 @@ the DAG. ``` The above example means that servers receiving this message SHOULD store the -event forever, but MAY choose to purge their copy after 28 days (or longer) in +event forever, but can choose to purge their copy after 28 days (or longer) in order to reclaim diskspace. ### Server Admin-specified per-room retention @@ -217,13 +217,13 @@ The minimum lifetime of an event is calculated as: * for clients, `min_lifetime` should be considered to be 0 (as there is no requirement for clients to persist events). -If the calculated max_lifetime is less than the min_lifetime then the max_lifetime -is set to be equal to the min_lifetime. +If the calculated `max_lifetime` is less than the `min_lifetime` then the `max_lifetime` +is set to be equal to the `min_lifetime`. The server/client then selects a lifetime of the event to lie between the calculated values of minimum and maximum lifetime, based on their implementation -and configuration requirements. The selected lifetime MUST not exceed the -calculated maximum lifetime. The selected lifetime SHOULD not be less than the +and configuration requirements. The selected lifetime MUST NOT exceed the +calculated maximum lifetime. The selected lifetime SHOULD NOT be less than the calculated minimum lifetime, but may be less in case of constrained resources, in which case the server should prioritise retaining locally generated events over remote generated events. 
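To make the maintenance task in the pruning algorithm concrete, a server-side
purge pass might look roughly like the following sketch. The `store` interface
is hypothetical, not part of any specified API, and lifetimes here are in
seconds as the proposal currently defines them.

```python
# Illustrative purge pass only; the store interface is hypothetical.
import time

def run_purge_pass(store) -> None:
    now_ms = int(time.time() * 1000)
    for room_id in store.all_room_ids():
        lifetime = store.selected_lifetime_for(room_id)  # seconds, or None
        if lifetime is None:
            continue  # 'forever': nothing to purge in this room
        cutoff_ms = now_ms - lifetime * 1000
        # Expiry is timed from each event's absolute origin_server_ts.
        for event in store.events_older_than(room_id, cutoff_ms):
            store.purge_event(event["event_id"])  # drop DB rows + queue refs
            store.cancel_notifications(event["event_id"])  # e.g. stale pushes
```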
From 1a4101ec8b56cf517942186d7667c081dd57ddb7 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sun, 11 Aug 2019 13:53:18 +0100
Subject: [PATCH 24/36] link #2228

---
 proposals/1763-configurable-retention-periods.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 5c6b7b6da77..6457f352979 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -14,9 +14,10 @@ As well as enforcing privacy requirements, these rules provide a way for server
 administrators to better manage disk space (e.g. to enforce rules such as "don't
 store remote events for public rooms for more than a month").
 
-This proposal originally tried to also define semantics for per-message retention
-as well as per-room; this has been split out into MSCxxxx in order to get the
-easier per-room semantics landed.
+This proposal originally tried to also define semantics for per-message
+retention as well as per-room; this has been split out into
+[MSC2228](https://github.com/matrix-org/matrix-doc/pull/2228) in order to get
+the easier per-room semantics landed.
 
 ## Problem:

From 90b17d68e166fc1d085d437ebe6df07cb45349eb Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sun, 11 Aug 2019 18:14:48 +0100
Subject: [PATCH 25/36] units

---
 proposals/1763-configurable-retention-periods.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 6457f352979..13d6922d565 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -94,9 +94,9 @@ The following fields are defined in the `m.room.retention` contents:
 null, should be interpreted as 'forever'.
 
 `min_lifetime`:
-  the minimum duration for which a server should store this event. Must be
-  null or an integer in range [0, 2<sup>31</sup>-1]. If absent, or null,
-  should be interpreted as 'forever'.
+  the minimum duration in seconds for which a server should store this event.
+  Must be null or an integer in range [0, 2<sup>31</sup>-1].
If absent, or
+  null, should be interpreted as 'forever'.

From 32f21ac469dbfee74cd1475e9bf01ed74a380998 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Fri, 16 Aug 2019 16:38:16 +0100
Subject: [PATCH 26/36] lifetimes in milliseconds

---
 proposals/1763-configurable-retention-periods.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 13d6922d565..24ab5922053 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -89,13 +89,13 @@ set). The following fields are defined in the `m.room.retention` contents:
 
 `max_lifetime`:
-  the maximum duration in seconds for which a server must store this event.
-  Must be null or an integer in range [0, 2<sup>31</sup>-1]. If absent, or
+  the maximum duration in milliseconds for which a server must store this event.
+  Must be null or an integer in range [0, 2<sup>63</sup>-1]. If absent, or
   null, should be interpreted as 'forever'.
 
 `min_lifetime`:
-  the minimum duration in seconds for which a server should store this event.
-  Must be null or an integer in range [0, 2<sup>31</sup>-1]. If absent, or
+  the minimum duration in milliseconds for which a server should store this event.
+  Must be null or an integer in range [0, 2<sup>63</sup>-1]. If absent, or
   null, should be interpreted as 'forever'.
 
@@ -129,7 +129,7 @@ For instance:
 
 ```json
 {
-  "max_lifetime": 86400,
+  "max_lifetime": 86400000,
 }
 ```
@@ -144,7 +144,7 @@ the DAG.
 
 ```json
 {
-  "min_lifetime": 2419200,
+  "min_lifetime": 2419200000,
 }
 ```

From a1b8726322f842f8e4ad00bff140feb9225c4730 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Sat, 17 Aug 2019 01:29:10 +0100
Subject: [PATCH 27/36] fix json number ranges

---
 proposals/1763-configurable-retention-periods.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 24ab5922053..84372d60ab4 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -90,12 +90,12 @@ The following fields are defined in the `m.room.retention` contents:
 
 `max_lifetime`:
   the maximum duration in milliseconds for which a server must store this event.
-  Must be null or an integer in range [0, 2<sup>63</sup>-1]. If absent, or
+  Must be null or an integer in range [0, 2<sup>53</sup>-1]. If absent, or
   null, should be interpreted as 'forever'.
 
 `min_lifetime`:
   the minimum duration in milliseconds for which a server should store this event.
-  Must be null or an integer in range [0, 2<sup>63</sup>-1]. If absent, or
+  Must be null or an integer in range [0, 2<sup>53</sup>-1]. If absent, or
   null, should be interpreted as 'forever'.

From ee0a7ee6d37065336505f2f3373c44127e515b15 Mon Sep 17 00:00:00 2001
From: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Date: Mon, 19 Aug 2019 12:08:29 +0100
Subject: [PATCH 28/36] Update 1763-configurable-retention-periods.md

fix heading
---
 proposals/1763-configurable-retention-periods.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 84372d60ab4..62c490a922e 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -199,7 +199,7 @@ Server admins could also override the requested retention limits (e.g. if
 resource constrained), but this isn't recommended given it may result in
 history being irrevocably lost against the senders' wishes.
 
-## Pruning algorithm 
+## Pruning algorithm

From cabef485e0cc629b8e016404060c907eb2030737 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Mon, 26 Aug 2019 13:47:27 +0100
Subject: [PATCH 29/36] Apply suggestions from code review

Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
---
 proposals/1763-configurable-retention-periods.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 62c490a922e..e72dfc35543 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -89,12 +89,12 @@ set). The following fields are defined in the `m.room.retention` contents:
 
 `max_lifetime`:
-  the maximum duration in milliseconds for which a server must store this event.
+  the maximum duration in milliseconds for which a server must store events in this room.
Must be null or an integer in range [0, 2<sup>53</sup>-1]. If absent, or
 null, should be interpreted as 'forever'.
 
 `min_lifetime`:
-  the minimum duration in milliseconds for which a server should store this event.
+  the minimum duration in milliseconds for which a server should store events in this room.
   Must be null or an integer in range [0, 2<sup>53</sup>-1]. If absent, or
   null, should be interpreted as 'forever'.

From f5c3729742d3c59aa37523a5a7096beb0a56b61e Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Mon, 26 Aug 2019 13:47:31 +0100
Subject: [PATCH 30/36] incorporate review

---
 .../1763-configurable-retention-periods.md    | 37 ++++++++-----------
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index e72dfc35543..7b7d993e617 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -122,8 +122,8 @@ If the user's retention settings conflict with those in the room, then the
 user's clients are expected to warn the user when participating in the room.
 A conflict exists if the user has configured their client to create rooms with
 retention settings which differ from the values on the `m.room.retention`
-state event. This is particularly important to warn the user if the room's
-retention is longer than their default requested retention period.
+state event. This is particularly important in order to warn the user if the
+room's retention is longer than their default requested retention period.
 
 For instance:
@@ -202,18 +202,18 @@ history being irrevocably lost against the senders' wishes.
 
 ## Pruning algorithm
 
 To summarise, servers and clients must implement the pruning algorithm as
-follows:
+follows. For each event `E` in the room:
 
 If we're a client (including bots and bridges), apply the algorithm:
-  * if specified, the `expire_on_clients` field in the `m.room.retention` event for the room is true.
+  * if specified, the `expire_on_clients` field in the `m.room.retention` event for the room (as of `E`) is true.
  * otherwise, don't apply the algorithm.
 
 The maximum lifetime of an event is calculated as:
-  * if specified, the `max_lifetime` field in the `m.room.retention` event for the room.
+  * if specified, the `max_lifetime` field in the `m.room.retention` event (as of `E`) for the room.
  * otherwise, the message's maximum lifetime is considered 'forever'.
 
 The minimum lifetime of an event is calculated as:
-  * if specified, the `min_lifetime` field in the `m.room.retention` event for the room.
+  * if specified, the `min_lifetime` field in the `m.room.retention` event (as of `E`) for the room.
  * otherwise, the message's minimum lifetime is considered 'forever'.
  * for clients, `min_lifetime` should be considered to be 0 (as there is no
    requirement for clients to persist events).
@@ -262,8 +262,8 @@ retention-limited rooms for things like stickers).
 
 For encrypted rooms, there is (currently) no alternative but to have the client
 manually delete media content from the server as it expires its own local
-copies of messages. (This requires us to have actually implemented a media
-deletion API at last.)
+copies of messages. (This requires us to have actually implemented a [media
+deletion API](https://github.com/matrix-org/matrix-doc/issues/790) at last.)
Clients and Servers should not default to setting a `max_lifetime` when
 creating rooms; instead users should only specify a `max_lifetime` when they
 need it for a specific conversation. This avoids unintentionally stopping
 users from using Matrix as a way to archive their conversations if they want.
 
 ## Tradeoffs
 
 This proposal tries to keep it simple by letting the room admin mandate the
 retention behaviour for a room. However, we could alternatively have a negotiation
 between the client and its server to determine the viable retention for a room.
 Or we could have the servers negotiate together to decide the retention for a room.
 Both seem overengineered, however.
 
-This proposal deliberately doesn't address GDPR erasure or mega-redaction scenarios,
-as it attempts to build a coherent UX around the use case of users knowing their
-privacy requirements *at the point they send messages*. Meanwhile GDPR erasure is
-handled elsewhere (and involves hiding rather than purging messages, in order to
-avoid annihilating conversation history), and mega-redaction is yet to be defined.
-
 It also doesn't solve specifying storage quotas per room (i.e. "store the last
 500 messages in this room"), to avoid scope creep. This can be handled by an
 MSC for configuring resource quotas per room (or per user) in general.
 
 It also doesn't solve per-message retention behaviour - this has been split out
 into a separate MSC.
 
-## Issues
-
-Should room retention be announced in a room per-server? The advantage is full
-flexibility in terms of servers announcing their different policies for a room
-(and possibly letting users know how likely history is to be retained, or conversely
-letting servers know if they need to step up to retain history). The disadvantage
-is that it could make for very complex UX for end-users: "Warning, some servers in
-this room have overridden history retention to conflict with your preferences" etc.
+We don't announce room retention settings within a room per-server. The
+advantage would be full flexibility in terms of servers announcing their
+different policies for a room (and possibly letting users know how likely
+history is to be retained, or conversely letting servers know if they need to
+step up to retain history). The disadvantage is that it could make for very
+complex UX for end-users: "Warning, some servers in this room have overridden
+history retention to conflict with your preferences" etc.
 
 ## Security considerations

From f8ceb9725ab60beb56ce92bab86b9b88f6ec1128 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Mon, 26 Aug 2019 13:49:02 +0100
Subject: [PATCH 31/36] spell out an example UI for warning about retention

---
 proposals/1763-configurable-retention-periods.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 7b7d993e617..b56ba5ed946 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -125,6 +125,9 @@ retention settings which differ from the values on the `m.room.retention`
 state event. This is particularly important in order to warn the user if the
 room's retention is longer than their default requested retention period.
 
+The UI for this could be a warning banner in the room to remind the user that
+the room's retention setting doesn't match their preferred default.
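+A sketch of the conflict check behind such a banner is given below. This is
+purely illustrative: `user_default` stands for whatever retention settings the
+user asked their client to apply when creating rooms, and the names are
+hypothetical.
+
+```python
+# Hypothetical sketch of detecting a retention conflict for the warning
+# banner; names are illustrative only.
+
+def retention_conflicts(user_default: dict, room_policy: dict) -> bool:
+    """True if the room's m.room.retention values differ from the user's
+    preferred defaults - notably when the room retains messages for longer
+    than the user requested."""
+    fields = ("max_lifetime", "min_lifetime", "expire_on_clients")
+    return any(user_default.get(f) != room_policy.get(f) for f in fields)
+```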
For instance:

From 8b1a0c3b58b7731ccd40ced7d75b818176d3803e Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Wed, 28 Aug 2019 12:42:49 +0100
Subject: [PATCH 32/36] clarify care & feeding of DAG

---
 .../1763-configurable-retention-periods.md    | 22 +++++++++++--------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index b56ba5ed946..441e1cf522c 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -239,15 +239,19 @@ It's worth noting that this means events may sometimes disappear from event
 streams; calling the same `/sync` or `/messages` API twice may give different
 results if some of the events have disappeared in the interim.
 
-In order to retain the integrity of the DAG for the room on the server, events
-which form forward extremities for a room should not be purged but redacted.
-
-  XXX: is this sufficient? Should we keep a heuristic of the number of
-  redacted events which hang around, just in case some lost server reappears
-  from a netsplit and tries referencing older events? Perhaps we can check
-  the other servers in the room to ensure that we don't purge events their
-  forward extremities refer to (except this won't work if the other servers
-  have netsplit)
+A room must have at least one forward extremity in order to allow new events
+to be sent within it. Therefore servers must redact rather than purge obsolete
+events which are forward extremities in order to avoid wedging the room.
+
+It's impossible to back-paginate past a hole in the DAG, such as one caused by
+pruning events. This is considered a feature for this MSC, given the point of
+pruning is to discard prior history. However, it's important that
+implementations handle this failure mode gracefully. This is left up to the
+implementation; one approach could be to discard the backwards extremities
+caused by a purge, or otherwise mark them as unpaginatable. There is a
+[separate related bug](https://github.com/matrix-org/matrix-doc/issues/2251)
+that the CS API does not currently provide a well-defined way to say when
+/messages has hit a hole in the DAG and cannot paginate further.
 
 If possible, servers/clients should remove downstream notifications of a message
 once it has expired (e.g. by cancelling push notifications).

From 9357ec68a7a259f74776e958b392a1c811d29ec0 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Wed, 28 Aug 2019 22:54:41 +0100
Subject: [PATCH 33/36] incorporate more @richvdh review

---
 .../1763-configurable-retention-periods.md    | 45 ++++++++++---------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 441e1cf522c..e663afa57d5 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -166,14 +166,8 @@ state events after creating the room, and before `initial_state` is applied on
 If a server admin is trying to conserve diskspace, they may do so by
 specifying and enforcing a relatively low min_lifetime (e.g. 1 month), but not
 specify a max_lifetime, in the hope that other servers will retain the data
-for longer.
-
-  XXX: is this the correct approach to take? It's how we force E2E encryption
-  on, but it feels very fragmentary to have magical presets which do different
-  things depending on which server you're on.
The alternative would be some
-  kind of federation-aware negotiation where a server refuses to participate in
-  a room unless it gets its way on retention settings, however this feels
-  unnecessarily draconian.
+for longer. This is not recommended however, as it harms users who want to
+use Matrix like e-mail, as a permanent archive of their conversations.
 
 2) By adjusting how aggressively their server enforces the `min_lifetime`
 value for message retention within a room. For instance, a server admin could
@@ -243,15 +237,16 @@ A room must have at least one forward extremity in order to allow new events
 to be sent within it. Therefore servers must redact rather than purge obsolete
 events which are forward extremities in order to avoid wedging the room.
 
-It's impossible to back-paginate past a hole in the DAG, such as one caused by
-pruning events. This is considered a feature for this MSC, given the point of
-pruning is to discard prior history. However, it's important that
-implementations handle this failure mode gracefully. This is left up to the
-implementation; one approach could be to discard the backwards extremities
+Server implementations must ensure that clients cannot back-paginate into a
+region of the event graph which has been purged (bearing in mind that other
+servers may or may not give a successful response to requests to backfill such
+events). One approach to this could be to discard the backwards extremities
 caused by a purge, or otherwise mark them as unpaginatable. There is a
-[separate related bug](https://github.com/matrix-org/matrix-doc/issues/2251)
-that the CS API does not currently provide a well-defined way to say when
-/messages has hit a hole in the DAG and cannot paginate further.
+separate related [spec
+bug](https://github.com/matrix-org/matrix-doc/issues/2251) and [impl
+bug](https://github.com/matrix-org/synapse/issues/1623) that the CS API does
+not currently provide a well-defined way to say when /messages has hit a hole
+in the DAG or the start of the room and cannot paginate further.
 
 If possible, servers/clients should remove downstream notifications of a message
 once it has expired (e.g. by cancelling push notifications).
@@ -272,10 +267,11 @@ manually delete media content from the server as it expires its own local
 copies of messages. (This requires us to have actually implemented a [media
 deletion API](https://github.com/matrix-org/matrix-doc/issues/790) at last.)
 
-Clients and Servers should not default to setting a `max_lifetime`
-when creating rooms; instead users should only specify a `max_lifetime` when
-they need it for a specific conversation. This avoids unintentionally
-stopping users from using Matrix as a way to archive their conversations if
-they want.
+Clients and Servers are recommended to not default to setting a `max_lifetime`
+when creating rooms; instead users should only specify a `max_lifetime` when
+they need it for a specific conversation. This avoids unintentionally
+stopping users from using Matrix as a way to archive their conversations if
+they so desire.
 
 ## Tradeoffs
 
 We don't announce room retention settings within a room per-server. The
 advantage would be full flexibility in terms of servers announcing their
 different policies for a room (and possibly letting users know how likely
 history is to be retained, or conversely letting servers know if they need to
 step up to retain history). The disadvantage is that it could make for very
 complex UX for end-users: "Warning, some servers in this room have overridden
 history retention to conflict with your preferences" etc.
 
+We let servers specify a default `m.room.retention` for rooms created on their
+servers as a coarse way to encourage users to not suck up disk space (although
+it's not recommended).
This is also how we force E2E encryption on, but it
+feels quite fragmentary to have magical presets which do different things
+depending on which server you're on. The alternative would be some kind of
+federation-aware negotiation where a server refuses to participate in a room
+unless it gets its way on retention settings, however this feels unnecessarily
+draconian and complex.
 
 ## Security considerations
 
 It's always a gentlemen's agreement for servers and clients alike to actually

From ac2f87e30d83820bd9e5d5bdd723a605654adda0 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Tue, 3 Sep 2019 14:39:32 +0100
Subject: [PATCH 34/36] Apply suggestions from code review

Co-Authored-By: Travis Ralston
---
 proposals/1763-configurable-retention-periods.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index e663afa57d5..66a5349009e 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -228,7 +228,7 @@ over remote generated events.
 
 Server/clients then set a maintenance task to remove ("purge") old events and
 references to their IDs from their DB and in-memory queues after the lifetime
-has expired (starting timing from the absolute origin_server_ts on the event).
+has expired (starting timing from the absolute `origin_server_ts` on the event).
 It's worth noting that this means events may sometimes disappear from event
 streams; calling the same `/sync` or `/messages` API twice may give different
 results if some of the events have disappeared in the interim.
@@ -245,7 +245,7 @@ caused by a purge, or otherwise mark them as unpaginatable. There is a
 separate related [spec
 bug](https://github.com/matrix-org/matrix-doc/issues/2251) and [impl
 bug](https://github.com/matrix-org/synapse/issues/1623) that the CS API does
-not currently provide a well-defined way to say when /messages has hit a hole
+not currently provide a well-defined way to say when `/messages` has hit a hole
 in the DAG or the start of the room and cannot paginate further.

From 116c5b91821b1327c0ed2c1550ea34a4ddf83204 Mon Sep 17 00:00:00 2001
From: Matthew Hodgson
Date: Tue, 3 Sep 2019 19:24:37 +0100
Subject: [PATCH 35/36] split out media attachment clean-up to #2278

---
 proposals/1763-configurable-retention-periods.md | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md
index 66a5349009e..5b7014078b6 100644
--- a/proposals/1763-configurable-retention-periods.md
+++ b/proposals/1763-configurable-retention-periods.md
@@ -256,16 +256,8 @@ up to the server implementation's configuration whether to allow it or not,
 and, if allowed, for how long the backfill should persist before being
 purged again.
 
-Media uploads must also be expired in line with the retention policy of the
-room. For unencrypted rooms this is easy; when the event that references a
-piece of content is expired, the content must be expired too - assuming the
-content was first uploaded in that room. (This allows for content reuse in
-retention-limited rooms for things like stickers).
-
-For encrypted rooms, there is (currently) no alternative but to have the client
-manually delete media content from the server as it expires its own local
-copies of messages.
(This requires us to have actually implemented a [media -deletion API](https://github.com/matrix-org/matrix-doc/issues/790) at last.) +Cleaning up the media attachments of expired or redacted events has been +split out into https://github.com/matrix-org/matrix-doc/issues/2278. Clients and Servers are recommended to not default to setting a `max_lifetime` when creating rooms; instead users should only specify a `max_lifetime` when From f8090875bbb8f6392ec9e6aced22125c9abf1209 Mon Sep 17 00:00:00 2001 From: Brendan Abolivier Date: Tue, 11 Oct 2022 18:53:47 +0100 Subject: [PATCH 36/36] Massively rewrite the proposal In order to better match the existing implementation, and to specify missing components that are needed for a working end-to-end implementation. --- .../1763-configurable-retention-periods.md | 537 ++++++++++-------- 1 file changed, 305 insertions(+), 232 deletions(-) diff --git a/proposals/1763-configurable-retention-periods.md b/proposals/1763-configurable-retention-periods.md index 5b7014078b6..804e25f556d 100644 --- a/proposals/1763-configurable-retention-periods.md +++ b/proposals/1763-configurable-retention-periods.md @@ -1,13 +1,12 @@ # Proposal for specifying configurable per-room message retention periods. -A major shortcoming of Matrix has been the inability to specify how long -events should stored by the servers and clients which participate in a given -room. +A major shortcoming of Matrix has been the inability to specify how long events +should stored by the servers and clients which participate in a given room. This proposal aims to specify a simple yet flexible set of rules which allow -users, room admins and server admins to determine how long data should be -stored for a room, from the perspective of respecting the privacy requirements -of that room (which may range from a "burn after reading" ephemeral conversation, +users, room admins and server admins to determine how long data should be stored +for a room, from the perspective of respecting the privacy requirements of that +room (which may range from a "burn after reading" ephemeral conversation, through to FOIA-style public record keeping requirements). As well as enforcing privacy requirements, these rules provide a way for server @@ -19,41 +18,28 @@ retention as well as per-room; this has been split out into [MSC2228](https://github.com/matrix-org/matrix-doc/pull/2228) in order to get the easier per-room semantics landed. -## Problem: + +## Problem Matrix is inherently a protocol for storing and synchronising conversation history, and various parties may wish to control how long that history is stored for. - * Users may wish to specify a maximum age for their messages for privacy - purposes, for instance: - * to avoid their messages (or message metadata) being profiled by - unscrupulous or compromised homeservers - * to avoid their messages in public rooms staying indefinitely on the public - record - * because of legal/corporate requirements to store message history for a - limited period of time - * because of legal/corporate requirements to store messages forever - (e.g. FOIA) - * to provide "ephemeral messaging" semantics where messages are best-effort - deleted after being read. - * Room admins may wish to specify a retention policy for all messages in a - room. - * A room admin may wish to enforce a lower or upper bound on message - retention on behalf of its users, overriding their preferences. - * A bridged room should be able to enforce the data retention policies of the - remote rooms. 
- * Server admins may wish to specify a retention policy for their copy of given - rooms, in order to manage disk space. - -Additionally, we would like to provide this behaviour whilst also ensuring that -users generally see a consistent view of message history, without lots of gaps -and one-sided conversations where messages have been automatically removed. - -At the least, it should be possible for people participating in a conversation -to know the expected lifetime of the other messages in the conversation **at -the time they are sent** in order to know how best to interact with them (i.e. -whether they are knowingly participating in a ephemeral conversation or not). +Room administrators, for instance, may wish to control how long a message can be +stored (e.g. to comply with corporate/legal requirements to store message +history for at least a specific amount of time), or how early a message can be +deleted (e.g. to address privacy concerns of the room's members, to avoid +messages staying in the public record forever, or to comply with corporate/legal +requirements to only store specific kinds of information for a limited amount of +time). + +Additionally, server administrators may also wish to control how long message +history is kept in order to better manage their server's disk space, or to +enforce corporate/legal requirements for the organisation managing the server. + +We would like to provide this behaviour whilst also ensuring that users +generally see a consistent view of message history, without lots of gaps and +one-sided conversations where messages have been automatically removed. We would also like to set the expectation that rooms typically have a long message retention - allowing those who wish to use Matrix to act as an archive @@ -77,67 +63,42 @@ This proposal does not try to solve the problems of: purge arbitrary events from the DB without fracturing the DAG of the room, and so a different approach is required) + ## Proposal -### Room Admin-specified per-room retention +### Per-room retention -We introduce a `m.room.retention` state event, which room admins can set to -mandate the history retention behaviour for a given room. It follows the -default PL semantics for a state event (requiring PL of 50 by default to be -set). +We introduce a `m.room.retention` state event, which room admins or moderators +can set to mandate the history retention behaviour for a given room. It follows +the default PL semantics for a state event (requiring PL of 50 by default to be +set). Its state key is an empty string (`""`). The following fields are defined in the `m.room.retention` contents: -`max_lifetime`: - the maximum duration in milliseconds for which a server must store events in this room. - Must be null or an integer in range [0, 253-1]. If absent, or - null, should be interpreted as 'forever'. - -`min_lifetime`: - the minimum duration in milliseconds for which a server should store events in this room. - Must be null or an integer in range [0, 253-1]. If absent, or - null, should be interpreted as 'forever'. - -`expire_on_clients`: - a boolean for whether clients must expire messages clientside to match the - min/max lifetime fields. If absent, or null, should be interpreted as false. - The intention of this is to distinguish between rules intended to impose a - data retention policy on the server - versus rules intended to provide a - degree of privacy by requesting all data is purged from all clients after a - given time. 
- -Retention is only considered for non-state events. - -If set, these fields SHOULD replace other retention behaviour configured by -the user or server admin - even if it means forcing laxer privacy requirements -on that user. This is a conscious privacy tradeoff to allow admins to specify -explicit privacy requirements for a room. For instance, a room may explicitly -require all messages in the room be stored forever with `min_lifetime: null`. - -In the instance of `min_lifetime` or `max_lifetime` being overridden, the -invariant that `max_lifetime >= min_lifetime` must be maintained by clamping -max_lifetime to be equal to `min_lifetime`. - -If the user's retention settings conflicts with those in the room, then the -user's clients are expected to warn the user when participating in the room. -A conflict exists if the user has configured their client to create rooms with -retention settings which differing from the values on the `m.room.retention` -state event. This is particularly important in order to warn the user if the -room's retention is longer than their default requested retention period. - -The UI for this could be a warning banner in the room to remind the user that -that room's retention setting doesn't match their preferred default. +* `max_lifetime`: the maximum duration in milliseconds for which a server must + store events in this room. Must be null or an integer in range [0, + 253-1]. If absent or null, should be interpreted as not setting an + upper bound to the room's retention policy. + +* `min_lifetime`: the minimum duration in milliseconds for which a server should + store events in this room. Must be null or an integer in range [0, + 253-1]. If absent or null, should be interpreted as not setting a + lower bound to the room's retention policy. + +In the instance of both `max_lifetime` and `min_lifetime` being provided, +`max_lifetime` must always be higher or equal to `min_lifetime`. + For instance: ```json { - "max_lifetime": 86400000, + "max_lifetime": 86400000 } ``` The above example means that servers receiving messages in this room should -store the event for only 86400 seconds (1 day), as measured from that +store the event for only 86400000 milliseconds (1 day), as measured from that event's `origin_server_ts`, after which they MUST purge all references to that event (e.g. from their db and any in-memory queues). @@ -147,7 +108,7 @@ the DAG. ```json { - "min_lifetime": 2419200000, + "min_lifetime": 2419200000 } ``` @@ -155,161 +116,273 @@ The above example means that servers receiving this message SHOULD store the event forever, but can choose to purge their copy after 28 days (or longer) in order to reclaim diskspace. -### Server Admin-specified per-room retention - -Server admins have two ways of influencing message retention on their server: - -1) Specifying a default `m.room.retention` for rooms created on the server, as -defined as a per-server implementation configuration option which inserts the -state events after creating the room, and before `initial_state` is applied on -`/createRoom` (effectively augmenting the presets used when creating a room). -If a server admin is trying to conserve diskspace, they may do so by -specifying and enforcing a relatively low min_lifetime (e.g. 1 month), but not -specify a max_lifetime, in the hope that other servers will retain the data -for longer. This is not recommended however, as it harms users who want to -use Matrix like e-mail, as a permenant archive of their conversations. 
- -2) By adjusting how aggressively their server enforces the the `min_lifetime` -value for message retention within a room. For instance, a server admin could -configure their server to attempt to automatically purge remote messages in -public rooms which are older than three months (unless min_lifetime for those -messages was set higher). - -A possible implementation-specific server configuration here could be -something like: - * target_lifetime_public_remote_events: 3 months - * target_lifetime_public_local_events: null # forever - * target_lifetime_private_remote_events: null # forever - * target_lifetime_private_local_events: null # forever - -...which would try to automatically purge remote events from public rooms after -3 months (assuming their individual min_lifetime is not higher), but leave -others alone. - -These config values would interact with the min_lifetime and max_lifetime -values in the different classes of room by decreasing the effective -max_lifetime to the proposed value (whilst preserving the `max_lifetime >= -min_lifetime` invariant). However, the precise behaviour would be up to the -server implementation. - -Server admins could also override the requested retention limits (e.g. if -resource constrained), but this isn't recommended given it may result in -history being irrevocably lost against the senders' wishes. - -## Pruning algorithm - -To summarise, servers and clients must implement the pruning algorithm as -follows. For each event `E` in the room: - -If we're a client (including bots and bridges), apply the algorithm: - * if specified, the `expire_on_clients` field in the `m.room.retention` event for the room (as of `E`) is true. - * otherwise, don't apply the algorithm. - -The maximum lifetime of an event is calculated as: - * if specified, the `max_lifetime` field in the `m.room.retention` event (as of `E`) for the room. - * otherwise, the message's maximum lifetime is considered 'forever'. - -The minimum lifetime of an event is calculated as: - * if specified, the `min_lifetime` field in the `m.room.retention` event (as of `E`) for the room. - * otherwise, the message's minimum lifetime is considered 'forever'. - * for clients, `min_lifetime` should be considered to be 0 (as there is no - requirement for clients to persist events). - -If the calculated `max_lifetime` is less than the `min_lifetime` then the `max_lifetime` -is set to be equal to the `min_lifetime`. - -The server/client then selects a lifetime of the event to lie between the -calculated values of minimum and maximum lifetime, based on their implementation -and configuration requirements. The selected lifetime MUST NOT exceed the -calculated maximum lifetime. The selected lifetime SHOULD NOT be less than the -calculated minimum lifetime, but may be less in case of constrained resources, -in which case the server should prioritise retaining locally generated events -over remote generated events. - -Server/clients then set a maintenance task to remove ("purge") old events and -references to their IDs from their DB and in-memory queues after the lifetime -has expired (starting timing from the absolute `origin_server_ts` on the event). -It's worth noting that this means events may sometimes disappear from event -streams; calling the same `/sync` or `/messages` API twice may give different -results if some of the events have disappeared in the interim. - -A room must have at least one forward extremity in order to allow new events -to be sent within it. 
-events which are forward extremities in order to avoid wedging the room.
-
-Server implementations must ensure that clients cannot back-paginate into a
-region of the event graph which has been purged (bearing in mind that other
-servers may or may not give a successful response to requests to backfill such
-events). One approach to this could be to discard the backwards extremities
-caused by a purge, or otherwise mark them as unpaginatable. There is a
-separate related [spec
-bug](https://github.com/matrix-org/matrix-doc/issues/2251) and [impl
-bug](https://github.com/matrix-org/synapse/issues/1623) that the CS API does
-not currently provide a well-defined way to say when `/messages` has hit a hole
-in the DAG or the start of the room and cannot paginate further.
-
-If possible, servers/clients should remove downstream notifications of a message
-once it has expired (e.g. by cancelling push notifications).
-
-If a user tries to re-backfill in history which has already been purged, it's
-up to the server implementation's configuration on whether to allow it or not,
-and if allowed, configure how long the backfill should persist before being
-purged again.
-
-Cleaning up the media attachments of expired or redacted events has been
-split out into https://github.com/matrix-org/matrix-doc/issues/2278.
-
-Clients and Servers are recommended to not default to setting a `max_lifetime`
-when creating rooms; instead users should only specify a `max_lifetime` when
-they need it for a specific conversation. This avoids unintentionally
-stopping users from using Matrix as a way to archive their conversations if
-they so desire.
+```json
+{
+    "min_lifetime": 2419200000,
+    "max_lifetime": 15778800000
+}
+```
+
+The above example means that servers SHOULD store their copy of the event for
+at least 28 days after it has been sent, and MUST delete it at the latest
+after 6 months.
+
+## Server-defined retention
+
+Server administrators can benefit from a few capabilities to control how long
+history is stored:
+
+* the ability to set a default retention policy for rooms that don't have a
+  retention policy defined in their state
+* the ability to override the retention policy for a room
+* the ability to cap the effective `max_lifetime` and `min_lifetime` of the
+  rooms the server is in
+
+How a server provides these capabilities is left as an implementation detail.
+
+We introduce the following authenticated endpoint to allow clients to enquire
+about how the server implements this policy:
+
+```
+GET /_matrix/client/v3/retention/configuration
+```
+
+200 response properties:
+
+* `policies` (required): An object mapping room IDs to a retention policy. If
+  the room ID is `*`, the associated policy is the default policy. Each policy
+  follows the format for the content of an `m.room.retention` state event.
+* `limits` (required): An object defining the limits to apply to policies
+  defined by `m.room.retention` state events. This object has two optional
+  properties, `min_lifetime` and `max_lifetime`, which each define a limit to
+  the equivalent property of the state events' content. Each limit defines an
+  optional `min` (the minimum value, in milliseconds) and an optional `max`
+  (the maximum value, in milliseconds).
+
+If both `policies` and `limits` are included in the response, the policies
+specified in `policies` MUST comply with the limits defined in `limits`.
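+
+As an illustration, a client could query this endpoint along the lines of the
+following non-normative sketch (example responses follow below); the
+homeserver URL and access token are placeholders, and any HTTP library would
+do:
+
+```python
+# Non-normative sketch: fetch a server's retention configuration.
+import requests
+
+HOMESERVER = "https://example.com"  # placeholder
+ACCESS_TOKEN = "<access token>"     # placeholder
+
+resp = requests.get(
+    f"{HOMESERVER}/_matrix/client/v3/retention/configuration",
+    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
+)
+resp.raise_for_status()
+config = resp.json()
+default_policy = config["policies"].get("*")  # None if there is no default
+limits = config["limits"]
+```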
+
+Example response:
+
+```json
+{
+  "policies": {
+    "*": {
+      "max_lifetime": 15778800000
+    },
+    "!someroom:test": {
+      "min_lifetime": 2419200000,
+      "max_lifetime": 15778800000
+    }
+  },
+  "limits": {
+    "min_lifetime": {
+      "min": 86400000,
+      "max": 172800000
+    },
+    "max_lifetime": {
+      "min": 7889400000,
+      "max": 15778800000
+    }
+  }
+}
+```
+
+In this example, the server is configured with:
+
+* a default policy with a `max_lifetime` of 6 months and no `min_lifetime`
+  (i.e. messages can only be kept up to 6 months after they have been sent)
+* an override for the retention policy in room `!someroom:test`
+* limits that constrain `min_lifetime` to between a day (`86400000`) and two
+  days (`172800000`), and `max_lifetime` to between roughly 3 months
+  (`7889400000`) and 6 months (`15778800000`)
+
+Example response with no policy or limit set:
+
+```json
+{
+  "policies": {},
+  "limits": {}
+}
+```
+
+Example response with only a default policy and an upper limit on `max_lifetime`:
+
+```json
+{
+  "policies": {
+    "*": {
+      "min_lifetime": 86400000,
+      "max_lifetime": 15778800000
+    }
+  },
+  "limits": {
+    "max_lifetime": {
+      "max": 15778800000
+    }
+  }
+}
+```
+
+### Defining the effective retention policy of a room
+
+In this section, as well as in the rest of this document, we define the
+"effective retention policy" of a room as the retention policy that is used to
+determine whether an event should be deleted or not. This may be the policy
+determined by the `m.room.retention` event in the state of the room, but it
+might not be, depending on limits set by the homeserver.
+
+The algorithm an implementation must follow to determine the effective
+retention policy of a room is as follows (a non-normative sketch of the
+algorithm follows the list):
+
+* if the homeserver defines a specific retention policy for this room, then use
+  this policy as the effective retention policy of the room.
+* otherwise, if the state of the room does not include an `m.room.retention`
+  event with an empty state key:
+  * if the homeserver defines a default retention policy, then use this policy
+    as the effective retention policy of the room.
+  * if the homeserver does not define a default retention policy, then don't
+    apply a retention policy in this room.
+* otherwise, if the state of the room includes an `m.room.retention` event with
+  an empty state key:
+  * if no limit is set by the homeserver, use the policy in the state of the
+    room as the effective retention policy of the room.
+  * otherwise, for each of `min_lifetime` and `max_lifetime`:
+    * if there is no limit for the property, use the value specified in the
+      room's state for the effective retention policy of the room (if any).
+    * if there is a limit for the property:
+      * if the value specified in the room's state complies with the limit,
+        use this value for the effective retention policy of the room.
+      * if the value specified in the room's state is lower than the limit's
+        `min` value, use the `min` value for the effective retention policy
+        of the room.
+      * if the value specified in the room's state is greater than the
+        limit's `max` value, use the `max` value for the effective retention
+        policy of the room.
+      * if there is no value specified in the room's state, use the limit's
+        `min` value (which may itself be null or absent) for the effective
+        retention policy of the room.
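+
+The following non-normative sketch summarises this algorithm; the function and
+argument names are illustrative, with `server_policies` and `limits` mirroring
+the `policies` and `limits` objects returned by the configuration endpoint:
+
+```python
+# Non-normative sketch of the effective retention policy algorithm.
+
+def clamp(value, limit):
+    """Apply an optional {"min", "max"} limit to an optional lifetime."""
+    if limit is None:
+        return value  # no limit: keep the value from the room's state, if any
+    if value is None:
+        return limit.get("min")  # no value in the room's state: use the min
+    if limit.get("min") is not None and value < limit["min"]:
+        return limit["min"]
+    if limit.get("max") is not None and value > limit["max"]:
+        return limit["max"]
+    return value
+
+def effective_policy(room_id, state_policy, server_policies, limits):
+    # state_policy is the content of the room's m.room.retention state
+    # event (with an empty state key), or None if the room has none.
+    if room_id in server_policies:       # server-defined override
+        return server_policies[room_id]
+    if state_policy is None:             # no m.room.retention state event
+        return server_policies.get("*")  # default policy, or None
+    return {
+        field: clamp(state_policy.get(field), limits.get(field))
+        for field in ("min_lifetime", "max_lifetime")
+    }
+```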
+
+So, for example, if a homeserver defines a lower limit on `max_lifetime` of
+`86400000` (a day) and no limit on `min_lifetime`, and a room's retention policy
+is the following:
+
+```json
+{
+  "max_lifetime": 43200000,
+  "min_lifetime": 21600000
+}
+```
+
+Then the effective retention policy of the room is:
+
+```json
+{
+  "max_lifetime": 86400000,
+  "min_lifetime": 21600000
+}
+```
+
+## Enforcing a retention policy
+
+Retention is only considered for non-state events. Retention is also not
+considered for the most recent event in a room, in order to allow a new event
+sent to that room to reference it in its `prev_events`.
+
+When purging events in a room, only the latest retention policy state event in
+that room is considered. This means that in a room where the history looks like
+the following (oldest event first):
+
+1. Retention policy A
+2. Event 1
+3. Event 2
+4. Retention policy B
+
+Then retention policy B is used to determine whether events 1 and 2 should be
+purged, even though they were sent while retention policy A was in effect. This
+is to avoid creating holes in the room's DAG caused by events in the middle of
+the timeline being subject to a lower `max_lifetime` than other events sent
+before and after them. Such holes would make it more difficult for homeservers
+to calculate room timelines when showing them to clients. They would also force
+clients to display potentially incomplete or one-sided conversations without
+being able to easily tell which parts of the conversation are missing.
+
+Servers decide whether an event should or should not be purged by calculating
+how much time has passed since the event's `origin_server_ts` property, and
+comparing this duration with the room's effective retention policy.
+
+Note that, for performance reasons, a server might decide to not purge an event
+the second it hits the end of its lifetime (e.g. so it can batch several events
+together). In this case, the server must make sure to omit the expired events
+from responses to client requests. Similarly, if the server is sent an expired
+event over federation, it must omit it from responses to client requests (and
+ensure it is eventually purged).
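+
+To illustrate, a server's expiry check might look like the following
+non-normative sketch; the function name and input shapes are illustrative:
+
+```python
+# Non-normative sketch: decide whether an event has outlived the room's
+# effective retention policy. `event` is the event's JSON, `policy` the
+# room's effective retention policy (or None), and `now_ms` the current
+# time in milliseconds since the Unix epoch. Callers must additionally
+# skip the most recent event in the room, as described above.
+
+def is_expired(event, policy, now_ms):
+    if policy is None or policy.get("max_lifetime") is None:
+        return False  # no upper bound: the event never expires
+    if event.get("state_key") is not None:
+        return False  # retention only applies to non-state events
+    return now_ms - event["origin_server_ts"] > policy["max_lifetime"]
+
+# Whether or not expired events have been purged yet, they must be omitted
+# from responses, e.g.:
+#     visible = [e for e in events if not is_expired(e, policy, now_ms)]
+```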
 
 ## Tradeoffs
 
-This proposal tries to keep it simple by letting the room admin mandate the
-retention behaviour for a room. However, we could alternatively have a negotiation
-between the client and its server to determine the viable retention for a room.
-Or we could have the servers negotiate together to decide the retention for a room.
-Both seem overengineered, however.
-
-It also doesn't solve specifying storage quotas per room (i.e. "store the last
-500 messages in this room"), to avoid scope creep. This can be handled by an
-MSC for configuring resource quotas per room (or per user) in general.
-
-It also doesn't solve per-message retention behaviour - this has been split out
-into a seperate MSC.
-
-We don't announce room retention settings within a room per-server. The
-advantage would be full flexibility in terms of servers announcing their
-different policies for a room (and possibly letting users know how likely
-history is to be retained, or conversely letting servers know if they need to
-step up to retain history). The disadvantage is that it could make for very
-complex UX for end-users: "Warning, some servers in this room have overridden
-history retention to conflict with your preferences" etc.
-
-We let servers specify a default `m.room.retention` for rooms created on their
-servers as a coarse way to encourage users to not suck up disk space (although
-it's not recommended). This is also how we force E2E encryption on, but it
-feels quite fragmentory to have magical presets which do different things
-depending on which server you're on. The alternative would be some kind of
-federation-aware negotiation where a server refuses to participate in a room
-unless it gets its way on retention settings, however this feels unnecessarily
-draconian and complex.
+This proposal specifies that the lifetime of an event is defined by the latest
+retention policy in the room, rather than the one in effect when the event was
+sent. This might be controversial as, in Matrix, the state that an event is
+subject to is usually the state of the room at the time it was sent. However,
+there are a few issues with using the retention policy that was in effect at
+the time the event was sent:
+
+* it would create holes in the DAG of a room, which would complicate the
+  server-side handling of the room's history
+* malicious servers could potentially make an event evade retention policies by
+  selecting their event's `prev_events` and `auth_events` so that the event is
+  on a portion of the DAG where the policy does not exist
+* it would be difficult to translate the configuration of retention policies
+  into a clear and easy-to-use UX (especially considering server-side
+  configuration applies to the whole history of the room)
+* it would not allow room administrators to retroactively update the lifetime
+  of events that have already been sent (e.g. in a room administered by an
+  organisation whose data retention requirements change over time)
+
+This proposal does not cover per-message retention (i.e. the ability to set
+different lifetimes for different messages). This has been split out into
+[MSC2228](https://github.com/matrix-org/matrix-spec-proposals/pull/2228) to
+simplify this proposal.
+
+This proposal also does not cover the case where a room's administrator wishes
+to only restrict the lifetime of a specific section of the room's history. This
+is left to be covered by a separate MSC, possibly built on top of MSC2228.
 
 ## Security considerations
 
-It's always a gentlemen's agreement for servers and clients alike to actually
-uphold the requested retention behaviour; users should never rely on deletion
-actually having happened.
+In a context of open federation, it is worth keeping in mind the possibility
+that not all servers in a room will enforce its retention policy. Similarly,
+different servers will likely enforce different server-side configurations, and
+as a result calculate different lifetimes for a given event. This proposal
+therefore aims for a compromise between reaching an absolute consensus on an
+event's lifetime and working within the data retention constraints of each
+server's operator.
+
+Somewhat in tension with the previous paragraph, a server may keep an expired
+event in its database for some time after its expiration, while not sharing it
+with clients and federated servers. This is to prevent abusers from using low
+lifetime values in a room's retention policy to erase any proof of such abuse
+and avoid being investigated.
+
+Basing the expiration time of an event on its `origin_server_ts` is not ideal as
+this field can be falsified by the sending server. However, there currently
+isn't a more reliable way to certify the send time of an event.
+
+As mentioned previously in this proposal, servers might store expired events for
+longer than their lifetime allows, either for performance reasons or to mitigate
+abuse. This is considered acceptable as long as:
+
+* an expired event is not kept permanently
+* an expired event is not shared with clients and federated servers
 
-## Conclusion
+## Unstable prefixes
 
-Previous attempts to solve this have got stuck by trying to combine together too many
-disparate problems (e.g. reclaiming diskspace; aiding user data privacy; self-destructing
-messages; mega-redaction; clearing history on specific devices; etc) - see
-https://github.com/matrix-org/matrix-doc/issues/440 and https://github.com/matrix-org/matrix-doc/issues/447
-for the history.
+While this proposal is under review, the `m.room.retention` event type should be
+replaced by the `org.matrix.msc1763.retention` type.
 
-This proposal attempts to simplify things to strictly considering the question of
-how long servers (and clients) should persist events for.
+Similarly, the `/_matrix/client/v3/retention/configuration` path should be
+replaced with the
+`/_matrix/client/unstable/org.matrix.msc1763/retention/configuration` path.