
Suggestions for MLS state machine #303

Merged · 10 commits into nmolnar/mls-readme · Nov 2, 2023

Conversation

richardhuaaa (Contributor) commented on Nov 1, 2023:

I'll try to keep these suggestion PRs small. For this one:

  1. Serialize welcome messages to post_commit_data field on intent rather than separate table
  2. Remove some group sync steps (we can handle conflicts anyway)
  3. EDIT: Drop pending proposals for now
  4. Describe how payloads can be synced in a non-duplicative way

Other things worth considering:

  1. Making some of the steps generic rather than specific to intent type (a lot of it is duplicative/identical)
  2. Whether to make commit processing strictly sequential (commit must be fully done before next one begins) or concurrent. If we make it strictly sequential, we need to think about how to handle concurrent syncs (as the application also needs to sync incoming messages, either via pull or subscription)
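A minimal sketch of suggestion 1: serializing welcome messages into a `post_commit_data` field on the intent row instead of a separate table. All names here (`StoredIntent`, the length-prefixed blob encoding) are illustrative assumptions, not the actual xmtp_mls schema.

```rust
/// One row in a hypothetical `intents` table.
struct StoredIntent {
    id: i64,
    /// Welcome messages serialized as length-prefixed byte blobs, written
    /// when the commit is published and consumed after it is confirmed.
    post_commit_data: Option<Vec<u8>>,
}

/// Pack several welcome payloads into a single blob (u32 length prefixes).
fn serialize_welcomes(welcomes: &[Vec<u8>]) -> Vec<u8> {
    let mut out = Vec::new();
    for w in welcomes {
        out.extend_from_slice(&(w.len() as u32).to_be_bytes());
        out.extend_from_slice(w);
    }
    out
}

/// Unpack the blob back into individual welcome payloads.
fn deserialize_welcomes(blob: &[u8]) -> Vec<Vec<u8>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i + 4 <= blob.len() {
        let len =
            u32::from_be_bytes([blob[i], blob[i + 1], blob[i + 2], blob[i + 3]]) as usize;
        i += 4;
        out.push(blob[i..i + len].to_vec());
        i += len;
    }
    out
}

fn main() {
    let welcomes = vec![b"welcome-alice".to_vec(), b"welcome-bob".to_vec()];
    let intent = StoredIntent {
        id: 1,
        post_commit_data: Some(serialize_welcomes(&welcomes)),
    };
    // Round-trips: the welcomes can be recovered after the commit is confirmed.
    let restored = deserialize_welcomes(intent.post_commit_data.as_ref().unwrap());
    assert_eq!(restored, welcomes);
}
```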

richardhuaaa (Author) commented on Nov 1, 2023:

AFAIK, MLS lets you define the logic for which proposals are accepted or rejected. Another possibility could be to mandate that each client only commits its own proposals. OpenMLS has a `remove_pending_proposal` method that we could use instead of committing pending proposals.
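A sketch of the "only commit your own proposals" idea above. `Group` and its fields are stand-ins, not the real OpenMLS `MlsGroup` API; in OpenMLS the removal would go through `remove_pending_proposal` (exact signature not reproduced here).

```rust
#[derive(Debug, Clone, PartialEq)]
struct PendingProposal {
    proposer: String,
    payload: Vec<u8>,
}

/// Stand-in for an MLS group with a pending-proposal store.
struct Group {
    own_member_id: String,
    pending_proposals: Vec<PendingProposal>,
}

impl Group {
    /// Drop every pending proposal this client did not create, so a later
    /// commit covers only our own proposals.
    fn remove_foreign_proposals(&mut self) {
        let own = self.own_member_id.clone();
        self.pending_proposals.retain(|p| p.proposer == own);
    }
}

fn main() {
    let mut group = Group {
        own_member_id: "alice".to_string(),
        pending_proposals: vec![
            PendingProposal { proposer: "alice".into(), payload: vec![1] },
            PendingProposal { proposer: "bob".into(), payload: vec![2] },
        ],
    };
    group.remove_foreign_proposals();
    // Only alice's own proposal survives to be committed.
    assert_eq!(group.pending_proposals.len(), 1);
    assert_eq!(group.pending_proposals[0].proposer, "alice");
}
```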

Review thread on xmtp_mls/README.md (outdated):
> 1. Sequentially process each payload. For each payload, update the `last_message_timestamp_ns` and apply any corresponding database writes for that payload in a single transaction
> 1. When the sync is complete, release the lock by setting `lock_until_ns` to 0
>
> This flow will be similar regardless of whether the sync happens via a poll-based or subscription-based mechanism. For a subscription-based mechanism, the lock will be obtained at the start of the subscription and extended on a heartbeat interval until the subscription is closed.
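A minimal model of the quoted locking flow, with illustrative field and function names (not the actual xmtp_mls schema): acquire a lease via `lock_until_ns`, advance `last_message_timestamp_ns` per payload, and release by writing 0.

```rust
struct GroupSyncState {
    lock_until_ns: u64,
    last_message_timestamp_ns: u64,
}

struct Payload {
    timestamp_ns: u64,
    data: Vec<u8>,
}

/// Take the sync lock until `now_ns + lease_ns`; fails if another sync
/// still holds an unexpired lock.
fn try_acquire_lock(state: &mut GroupSyncState, now_ns: u64, lease_ns: u64) -> bool {
    if state.lock_until_ns > now_ns {
        return false; // another sync is in progress
    }
    state.lock_until_ns = now_ns + lease_ns;
    true
}

/// Process payloads in order. In the real system, each iteration would be a
/// single DB transaction covering the cursor update and the payload writes.
fn sync(state: &mut GroupSyncState, payloads: &[Payload], now_ns: u64) -> bool {
    if !try_acquire_lock(state, now_ns, 60_000_000_000) {
        return false;
    }
    for p in payloads {
        // ... apply this payload's database writes here ...
        state.last_message_timestamp_ns = p.timestamp_ns;
    }
    state.lock_until_ns = 0; // release
    true
}

fn main() {
    let mut state = GroupSyncState { lock_until_ns: 0, last_message_timestamp_ns: 0 };
    let payloads = vec![
        Payload { timestamp_ns: 100, data: vec![] },
        Payload { timestamp_ns: 200, data: vec![] },
    ];
    assert!(sync(&mut state, &payloads, 1));
    assert_eq!(state.last_message_timestamp_ns, 200);
    assert_eq!(state.lock_until_ns, 0);

    // A concurrent sync attempt while the lock is held fails.
    state.lock_until_ns = 1_000;
    assert!(!sync(&mut state, &payloads, 500));
}
```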
neekolas replied:
In the absence of really robust subscriptions (which I think is going to be quite hard to implement for V1) my recommendation would be to not adjust any locks on the subscription and allow state syncs to happen in parallel.

richardhuaaa (Author) replied:

If subscriptions are not robust, I feel like it causes a lot of downstream problems.

  1. If a commit message is lost, subsequent payloads are unreadable. You can trigger a pull at this point, but it's added complexity.
  2. If we don't lock syncs while subscriptions are in progress, then we have concurrent processing of the same payloads, meaning one side or the other will fail. It becomes unclear whether failures are due to bad payloads/missing epoch messages, or due to parallel processing.
  3. It's not the best dev experience for consumers of the SDK.

I have some suggestions:

  1. We could make subscriptions more robust. We could pass the `last_synced_payload_timestamp` to the server when initiating subscriptions. Alternatively, we could have the server record the previous payload timestamp on each payload, so that the client can detect when it is missing payloads and re-sync.
  2. We could omit subscriptions from the initial MLS milestones and fix them up in later milestones.

neekolas replied on Nov 2, 2023:

> We could omit subscriptions from the initial MLS milestones and fix them up in later milestones.

We can talk to developers and see how much of a deal-breaker that would be. They'd also have to be OK with no push notifications. Together, that sounds like a tough sell, but I could be wrong.

> We could make subscriptions more robust. We could pass the `last_synced_payload_timestamp` to the server when initiating subscriptions. Alternatively, we could have the server record the previous payload timestamp on each payload, so that the client can detect when it is missing payloads and re-sync.

Even if we had the most robust streaming APIs, we would still need to somehow deal with push notifications, which are even less reliable and have less flexibility. The `last_synced_payload_timestamp` isn't really an option for push. Having the server record the previous payload timestamp on each payload would be possible in our `StreamAllMessages` API, which powers push. It would require some changes to our back-end to include it in the message that gets gossiped in Waku. It also requires a lookup on each publish, which is probably an acceptable performance cost.

All options feel like some variation of: "some streamed messages are just a reminder to query the topic and sync from the last timestamp of our contiguous message history". There are degrees of this.

  1. Every subscription message or push notification just tells the client to sync via the query API. We don't even look at the payloads.
  2. Application messages can be processed directly from a stream/push notification if you are in the correct epoch. We inspect the payload, and if it is a Commit/Proposal we abort processing the streamed message (and roll back the database changes that delete the decryption keys) and sync from our last known timestamp via the query API.
  3. Any streamed message or push notification where the `previous_payload_timestamp` lines up with our local state gets processed immediately. If we haven't processed the message with that `previous_payload_timestamp`, we do a sync from the network starting at the last synced payload timestamp.

The third option likely gives us the fewest queries to the network, depending on the percentage of out-of-order messages received via streaming. But it is a bit of a pain to roll out: we'd need to update not just our nodes and Envelope proto schema but also the example-notification-server, and anyone using MLS would have to update their notification-receiving code to handle the new field. Maybe it's worth the cost, but we should take the cost into consideration.
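The third option above can be sketched as a small classifier on incoming streamed payloads. Field names like `previous_payload_timestamp_ns` are illustrative assumptions, not an existing Envelope schema.

```rust
#[derive(Debug, PartialEq)]
enum StreamAction {
    ProcessNow,
    SyncFromLastKnown,
}

struct StreamedPayload {
    timestamp_ns: u64,
    /// Timestamp of the payload immediately before this one on the topic.
    previous_payload_timestamp_ns: u64,
}

/// Process immediately only when the payload's predecessor matches local
/// state; otherwise the client has a gap and must sync via the query API.
fn classify(last_synced_ns: u64, payload: &StreamedPayload) -> StreamAction {
    if payload.previous_payload_timestamp_ns == last_synced_ns {
        StreamAction::ProcessNow
    } else {
        StreamAction::SyncFromLastKnown
    }
}

fn main() {
    // Contiguous: the previous payload is exactly what we last synced.
    let p1 = StreamedPayload { timestamp_ns: 200, previous_payload_timestamp_ns: 100 };
    assert_eq!(classify(100, &p1), StreamAction::ProcessNow);

    // Gap detected: the payload before this one was never seen locally.
    let p2 = StreamedPayload { timestamp_ns: 400, previous_payload_timestamp_ns: 300 };
    assert_eq!(classify(200, &p2), StreamAction::SyncFromLastKnown);
}
```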

Contributor replied:

I do like the third option. I hadn't considered adding the `previous_payload_timestamp` to all our streamed envelopes.

Given that all options look like option 1 in some cases, I wonder if we start with that to unblock everything and give us streaming methods quite easily. Then we can layer on option 2 or 3 depending on how we feel about the scope of changes.

richardhuaaa (Author) replied:

That's a really good framing, thanks for writing it up. Okay, this makes sense to me!

Review thread on xmtp_mls/README.md (outdated, resolved).
neekolas commented on Nov 2, 2023:

> OpenMLS has a `remove_pending_proposal` method that we could use instead of committing pending proposals.

That would allow us to check the box of "supporting proposals" in the short term, without actually having to support them. We'd have to think through what it would look like to change this later. Maybe having a single member of the group on a newer version that didn't remove the proposals, but instead created a commit, would be fine. Would just require a bit of investigation.

neekolas commented on Nov 2, 2023:

> Whether to make commit processing strictly sequential (commit must be fully done before next one begins) or concurrent. If we make it strictly sequential, we need to think about how to handle concurrent syncs (as the application also needs to sync incoming messages, either via pull or subscription)

The simplest option is to always make processing sequential; it gives us the same guarantees no matter where the message came from. Push notifications and subscriptions that contain a commit message would just trigger a sync using the Query API, which would ensure all previous commits were applied first. Application messages received via push or subscription are probably safe to attempt processing immediately (although they may fail if you had somehow missed a prior commit). Failed processing could be remedied with a sync-by-query.

It feels like anything else breaks the total ordering assumption baked into MLS.

We are going to need to design a solution to the total ordering problem for decentralization, but we've already decided to table that for a later iteration of the protocol.
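The sequential rule above can be sketched as a small dispatcher: commits always fall back to a sync-by-query, application messages may be tried immediately with a sync as the failure remedy. The types and outcomes are illustrative stand-ins, not the xmtp_mls API.

```rust
enum Payload {
    Commit(Vec<u8>),
    Application(Vec<u8>),
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Processed,
    SyncTriggered,
}

/// Stand-in for MLS decryption, which can fail if a prior commit was missed.
fn try_process_application(_msg: &[u8], in_current_epoch: bool) -> Result<(), ()> {
    if in_current_epoch { Ok(()) } else { Err(()) }
}

fn handle_streamed(payload: &Payload, in_current_epoch: bool) -> Outcome {
    match payload {
        // Commits are never applied out of band: sync-by-query guarantees
        // all earlier commits are applied first, preserving total ordering.
        Payload::Commit(_) => Outcome::SyncTriggered,
        Payload::Application(msg) => match try_process_application(msg, in_current_epoch) {
            Ok(()) => Outcome::Processed,
            // Failed decryption (e.g. a missed commit) is remedied by a sync.
            Err(()) => Outcome::SyncTriggered,
        },
    }
}

fn main() {
    assert_eq!(handle_streamed(&Payload::Commit(vec![]), true), Outcome::SyncTriggered);
    assert_eq!(handle_streamed(&Payload::Application(vec![1]), true), Outcome::Processed);
    assert_eq!(handle_streamed(&Payload::Application(vec![1]), false), Outcome::SyncTriggered);
}
```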

richardhuaaa commented on Nov 2, 2023:

> The simplest option is to always make processing sequential; it gives us the same guarantees no matter where the message came from. Push notifications and subscriptions that contain a commit message would just trigger a sync using the Query API, which would ensure all previous commits were applied first.

I think I understand: we can't process the same payload twice, so is the suggestion here to have an application sync trigger commit processing first? But what happens if an application sync is in progress at the time you make a commit, or halfway through the commit process?

neekolas commented on Nov 2, 2023:

> But what happens if an application sync is in progress at the time you make a commit, or halfway through the commit process?

We could have some sort of locking mechanism that prevents this; these updates should be relatively fast, and simultaneous syncs should be infrequent. But I don't think we need that. If a message fails to decrypt because it's already been processed and the keys have been discarded, we just move on to the next message. As long as both processes are going through messages sequentially and talking to the same DB, everything should just work, right?

The system is going to need to be very resilient to decryption failures since anyone can publish junk to the group topic, plus the fact that your own messages are unreadable.
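The skip-and-continue resilience described above, as a minimal sketch: a payload that fails to decrypt (already processed, junk published to the topic, or our own outbound message) is skipped rather than aborting the whole sync. The decryptor here is a stand-in for MLS decryption.

```rust
/// Process payloads sequentially, tolerating individual decryption failures.
fn process_all(
    payloads: &[Vec<u8>],
    decrypt: impl Fn(&[u8]) -> Result<String, ()>,
) -> Vec<String> {
    let mut messages = Vec::new();
    for p in payloads {
        match decrypt(p) {
            Ok(msg) => messages.push(msg),
            // Keys already discarded, junk payload, or our own message:
            // skip it and continue with the next payload.
            Err(()) => continue,
        }
    }
    messages
}

fn main() {
    // Toy decryptor: payloads starting with 0 are "undecryptable".
    let decrypt = |p: &[u8]| {
        if p.first() == Some(&0) { Err(()) } else { Ok(format!("msg:{}", p.len())) }
    };
    let payloads = vec![vec![1, 2], vec![0, 9], vec![3]];
    let msgs = process_all(&payloads, decrypt);
    // The bad payload in the middle is skipped; the rest are processed.
    assert_eq!(msgs, vec!["msg:2".to_string(), "msg:1".to_string()]);
}
```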

richardhuaaa commented:

This was a really helpful discussion. I'll try to succinctly describe the various edge cases we discussed, at least so that I understand how to implement the thing, but with that in mind I think I'm happy with this!

richardhuaaa merged commit 79c5f3e into nmolnar/mls-readme on Nov 2, 2023 (16 checks passed), and deleted the rich/state-machine-suggestions branch on November 2, 2023 at 23:20.