Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Backfill causes wrong membership state to be sent to clients #11874

Closed
deepbluev7 opened this issue Feb 1, 2022 · 14 comments
Closed

Backfill causes wrong membership state to be sent to clients #11874

deepbluev7 opened this issue Feb 1, 2022 · 14 comments
Assignees
Labels
A-Sync defects related to /sync T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@deepbluev7
Copy link
Contributor

Description

grafik

Translation for non german speakers:

  • verlassen: left
  • gekickt: kicked
  • eingeladen: invited

The above state is what I got via /sync. Basically Cat sending messages caused Tomm's events to backfill. Synapse correctly calculated, that the user is still in the room and the leave event was incorrect backfill. But it still sent it to my client. My client has no way of knowing, that this is not the latest state, so it assumed the user was not in the room anymore. But I could not invite the user again. So I kicked them and then I could invite them.

This is bad for multiple reasons. It makes tracking membership for E2EE messy and some users will be unable to decrypt messages because of this. And it also in general makes the membership list of a room unreliable.

I'm not sure what the proper solution for this would be from the spec side, but I would assume you should always send the most current state event to a client instead of outdated ones, that are not in the current room state. Maybe just in state to not mess with the timeline.

Steps to reproduce

  • You need 3 servers, A can't federate with B and an independent C
  • Have A leave the room, but those events never make it to B.
  • B kicks A
  • A restarts the server
  • B invites A
  • A accepts the invite
  • A and B think A is in the room
  • C sends a message, which causes B to fetch the backfill, which includes the leave event and send it to B's clients.

Those steps are not exact, because I can't actually repro it. Network issues are hard. But something like that must have happened.

Version information

  • Homeserver: neko.dev, arcticfoxes.net, feline.support

If not matrix.org:

  • Version: 1.51

  • Install method: ebuild, debian package, docker

  • Platform: various
@richvdh
Copy link
Member

richvdh commented Feb 1, 2022

this sounds like it's basically the same as matrix-org/matrix-spec#1209, possibly with a side-helping of "pick any two: consistency, availability, partition tolerance"

It would be good to get an idea of the actual event DAG in this room, if you can dig it out of the database.

@deepbluev7
Copy link
Contributor Author

deepbluev7 commented Feb 1, 2022

Yes, this is probably matrix-org/matrix-spec#1209, but I didn't find that again. If you can tell me what queries you need, I can run them for you.

Actually, matrix-org/matrix-spec#1209 is more about state res changes not being propagated to clients. This is about events being propagated to clients, that should not affect the state due to state res.

Propagating state res changes via state in /sync is fairly trivial, no idea how to fix this issue without just straight up not sending the event or needing 2 syncs...

@richvdh
Copy link
Member

richvdh commented Feb 1, 2022

you should be able to figure out the DAG by looking at the event_edges table.

@deepbluev7
Copy link
Contributor Author

leave: $WK6jEn7ig8KrO5iRwyAofQeUTU81QWX-B_kOKlSUl68

kick: $V-38lZCzKnPpgtoKfZsFzJ8C1XpG4uHVfT0kdK8gN-Q

invite: $Y_MQVtt5-65xLmdYVWkUnaxUImPh5em8SdMmAKFdj-k

synapse=# select * from event_edges where event_id in ('$WK6jEn7ig8KrO5iRwyAofQeUTU81QWX-B_kOKlSUl68', '$V-38lZCzKnPpgtoKfZsFzJ8C1XpG4uHVfT0kdK8gN-Q', '$Y_MQVtt5-65xLmdYVWkUnaxUImPh5em8SdMmAKFdj-k');
                   event_id                   |                prev_event_id                 |              room_id               | is_state 
----------------------------------------------+----------------------------------------------+------------------------------------+----------
 $V-38lZCzKnPpgtoKfZsFzJ8C1XpG4uHVfT0kdK8gN-Q | $jvgwn1N5eWJBS0SxKw858_uXu1XvSRVL7cExPVIcpzQ | !etDMqntYVqugEClPcE:artemislena.eu | f
 $WK6jEn7ig8KrO5iRwyAofQeUTU81QWX-B_kOKlSUl68 | $OWFYdjNFSIkk7AgepXn1tGhLinhi0FxgtLKbbfSjZVE | !etDMqntYVqugEClPcE:artemislena.eu | f
 $Y_MQVtt5-65xLmdYVWkUnaxUImPh5em8SdMmAKFdj-k | $V-38lZCzKnPpgtoKfZsFzJ8C1XpG4uHVfT0kdK8gN-Q | !etDMqntYVqugEClPcE:artemislena.eu | f
(3 rows)

synapse=# select * from event_edges where prev_event_id in ('$WK6jEn7ig8KrO5iRwyAofQeUTU81QWX-B_kOKlSUl68', '$V-38lZCzKnPpgtoKfZsFzJ8C1XpG4uHVfT0kdK8gN-Q', '$Y_MQVtt5-65xLmdYVWkUnaxUImPh5em8SdMmAKFdj-k');
                   event_id                   |                prev_event_id                 |              room_id               | is_state 
----------------------------------------------+----------------------------------------------+------------------------------------+----------
 $Y_MQVtt5-65xLmdYVWkUnaxUImPh5em8SdMmAKFdj-k | $V-38lZCzKnPpgtoKfZsFzJ8C1XpG4uHVfT0kdK8gN-Q | !etDMqntYVqugEClPcE:artemislena.eu | f
 $d3_9RIXPL0Ki_Yns9SnRD3V-alXNVXv9mS7UeGcTPwM | $WK6jEn7ig8KrO5iRwyAofQeUTU81QWX-B_kOKlSUl68 | !etDMqntYVqugEClPcE:artemislena.eu | f
 $lc-P9o9z2yfdnDWR8dKSXsw7UJZOM5wsLaO96AyKpkY | $Y_MQVtt5-65xLmdYVWkUnaxUImPh5em8SdMmAKFdj-k | !etDMqntYVqugEClPcE:artemislena.eu | f
(3 rows)

Do you need any more edges?

@deepbluev7
Copy link
Contributor Author

Here is a more complete log, which starts at the top of the screenshot and goes until the invite:

synapse=# select * from event_edges where event_id in ('$xlPVArkI464uPid7qEH3adAUM6pv4KRTmq_xAWuZXTs', '$OWFYdjNFSIkk7AgepXn1tGhLinhi0FxgtLKbbfSjZVE', '$WK6jEn7ig8KrO5iRwyAofQeUTU81QWX-B_kOKlSUl68', '$d3_9RIXPL0Ki_Yns9SnRD3V-alXNVXv9mS7UeGcTPwM', '$9h41fnhWKW8PLcui76pQLu2B6dl3KW2ncaqqmZLcEYI', '$vjmPmmcDmtB1KqBgBDMX1uwSks6XQ0wEoLx2RcfOKOs', '$mV0WAiaw24fcvPZkk2UqUg1c_Ua_sz22a6uMwq0Xu4s', '$ZeH1hSQHifcaeoGANwxQausg9lTv6hMWVPyMCwBU8TU', '$0LBz2vlEVw7Bp7CjI4AaKGmZYJjpBD7tc5tiS5AaCKE', '$fNCdTdUfu98Rd625bSTWVkxiJrqvWmgxYq3T3AF3bYY', '$jvgwn1N5eWJBS0SxKw858_uXu1XvSRVL7cExPVIcpzQ', '$V-38lZCzKnPpgtoKfZsFzJ8C1XpG4uHVfT0kdK8gN-Q', '$Y_MQVtt5-65xLmdYVWkUnaxUImPh5em8SdMmAKFdj-k');
                   event_id                   |                prev_event_id                 |              room_id               | is_state 
----------------------------------------------+----------------------------------------------+------------------------------------+----------
 $0LBz2vlEVw7Bp7CjI4AaKGmZYJjpBD7tc5tiS5AaCKE | $ZeH1hSQHifcaeoGANwxQausg9lTv6hMWVPyMCwBU8TU | !etDMqntYVqugEClPcE:artemislena.eu | f
 $9h41fnhWKW8PLcui76pQLu2B6dl3KW2ncaqqmZLcEYI | $d3_9RIXPL0Ki_Yns9SnRD3V-alXNVXv9mS7UeGcTPwM | !etDMqntYVqugEClPcE:artemislena.eu | f
 $OWFYdjNFSIkk7AgepXn1tGhLinhi0FxgtLKbbfSjZVE | $yYbMjso8A5U6piG4-39LzZq600yf0Q2HrN0FostZN0g | !etDMqntYVqugEClPcE:artemislena.eu | f
 $V-38lZCzKnPpgtoKfZsFzJ8C1XpG4uHVfT0kdK8gN-Q | $jvgwn1N5eWJBS0SxKw858_uXu1XvSRVL7cExPVIcpzQ | !etDMqntYVqugEClPcE:artemislena.eu | f
 $WK6jEn7ig8KrO5iRwyAofQeUTU81QWX-B_kOKlSUl68 | $OWFYdjNFSIkk7AgepXn1tGhLinhi0FxgtLKbbfSjZVE | !etDMqntYVqugEClPcE:artemislena.eu | f
 $Y_MQVtt5-65xLmdYVWkUnaxUImPh5em8SdMmAKFdj-k | $V-38lZCzKnPpgtoKfZsFzJ8C1XpG4uHVfT0kdK8gN-Q | !etDMqntYVqugEClPcE:artemislena.eu | f
 $ZeH1hSQHifcaeoGANwxQausg9lTv6hMWVPyMCwBU8TU | $mV0WAiaw24fcvPZkk2UqUg1c_Ua_sz22a6uMwq0Xu4s | !etDMqntYVqugEClPcE:artemislena.eu | f
 $d3_9RIXPL0Ki_Yns9SnRD3V-alXNVXv9mS7UeGcTPwM | $WK6jEn7ig8KrO5iRwyAofQeUTU81QWX-B_kOKlSUl68 | !etDMqntYVqugEClPcE:artemislena.eu | f
 $d3_9RIXPL0Ki_Yns9SnRD3V-alXNVXv9mS7UeGcTPwM | $xlPVArkI464uPid7qEH3adAUM6pv4KRTmq_xAWuZXTs | !etDMqntYVqugEClPcE:artemislena.eu | f
 $fNCdTdUfu98Rd625bSTWVkxiJrqvWmgxYq3T3AF3bYY | $0LBz2vlEVw7Bp7CjI4AaKGmZYJjpBD7tc5tiS5AaCKE | !etDMqntYVqugEClPcE:artemislena.eu | f
 $jvgwn1N5eWJBS0SxKw858_uXu1XvSRVL7cExPVIcpzQ | $fNCdTdUfu98Rd625bSTWVkxiJrqvWmgxYq3T3AF3bYY | !etDMqntYVqugEClPcE:artemislena.eu | f
 $mV0WAiaw24fcvPZkk2UqUg1c_Ua_sz22a6uMwq0Xu4s | $vjmPmmcDmtB1KqBgBDMX1uwSks6XQ0wEoLx2RcfOKOs | !etDMqntYVqugEClPcE:artemislena.eu | f
 $vjmPmmcDmtB1KqBgBDMX1uwSks6XQ0wEoLx2RcfOKOs | $9h41fnhWKW8PLcui76pQLu2B6dl3KW2ncaqqmZLcEYI | !etDMqntYVqugEClPcE:artemislena.eu | f
 $xlPVArkI464uPid7qEH3adAUM6pv4KRTmq_xAWuZXTs | $XNt1QNgLF50fQ7lkE1-wq2i4vBu4YS-F9KNY2alIPoQ | !etDMqntYVqugEClPcE:artemislena.eu | f
(14 rows)

synapse=# select * from event_edges where prev_event_id in ('$xlPVArkI464uPid7qEH3adAUM6pv4KRTmq_xAWuZXTs', '$OWFYdjNFSIkk7AgepXn1tGhLinhi0FxgtLKbbfSjZVE', '$WK6jEn7ig8KrO5iRwyAofQeUTU81QWX-B_kOKlSUl68', '$d3_9RIXPL0Ki_Yns9SnRD3V-alXNVXv9mS7UeGcTPwM', '$9h41fnhWKW8PLcui76pQLu2B6dl3KW2ncaqqmZLcEYI', '$vjmPmmcDmtB1KqBgBDMX1uwSks6XQ0wEoLx2RcfOKOs', '$mV0WAiaw24fcvPZkk2UqUg1c_Ua_sz22a6uMwq0Xu4s', '$ZeH1hSQHifcaeoGANwxQausg9lTv6hMWVPyMCwBU8TU', '$0LBz2vlEVw7Bp7CjI4AaKGmZYJjpBD7tc5tiS5AaCKE', '$fNCdTdUfu98Rd625bSTWVkxiJrqvWmgxYq3T3AF3bYY', '$jvgwn1N5eWJBS0SxKw858_uXu1XvSRVL7cExPVIcpzQ', '$V-38lZCzKnPpgtoKfZsFzJ8C1XpG4uHVfT0kdK8gN-Q', '$Y_MQVtt5-65xLmdYVWkUnaxUImPh5em8SdMmAKFdj-k');
                   event_id                   |                prev_event_id                 |              room_id               | is_state 
----------------------------------------------+----------------------------------------------+------------------------------------+----------
 $fNCdTdUfu98Rd625bSTWVkxiJrqvWmgxYq3T3AF3bYY | $0LBz2vlEVw7Bp7CjI4AaKGmZYJjpBD7tc5tiS5AaCKE | !etDMqntYVqugEClPcE:artemislena.eu | f
 $vjmPmmcDmtB1KqBgBDMX1uwSks6XQ0wEoLx2RcfOKOs | $9h41fnhWKW8PLcui76pQLu2B6dl3KW2ncaqqmZLcEYI | !etDMqntYVqugEClPcE:artemislena.eu | f
 $WK6jEn7ig8KrO5iRwyAofQeUTU81QWX-B_kOKlSUl68 | $OWFYdjNFSIkk7AgepXn1tGhLinhi0FxgtLKbbfSjZVE | !etDMqntYVqugEClPcE:artemislena.eu | f
 $Y_MQVtt5-65xLmdYVWkUnaxUImPh5em8SdMmAKFdj-k | $V-38lZCzKnPpgtoKfZsFzJ8C1XpG4uHVfT0kdK8gN-Q | !etDMqntYVqugEClPcE:artemislena.eu | f
 $d3_9RIXPL0Ki_Yns9SnRD3V-alXNVXv9mS7UeGcTPwM | $WK6jEn7ig8KrO5iRwyAofQeUTU81QWX-B_kOKlSUl68 | !etDMqntYVqugEClPcE:artemislena.eu | f
 $lc-P9o9z2yfdnDWR8dKSXsw7UJZOM5wsLaO96AyKpkY | $Y_MQVtt5-65xLmdYVWkUnaxUImPh5em8SdMmAKFdj-k | !etDMqntYVqugEClPcE:artemislena.eu | f
 $0LBz2vlEVw7Bp7CjI4AaKGmZYJjpBD7tc5tiS5AaCKE | $ZeH1hSQHifcaeoGANwxQausg9lTv6hMWVPyMCwBU8TU | !etDMqntYVqugEClPcE:artemislena.eu | f
 $9h41fnhWKW8PLcui76pQLu2B6dl3KW2ncaqqmZLcEYI | $d3_9RIXPL0Ki_Yns9SnRD3V-alXNVXv9mS7UeGcTPwM | !etDMqntYVqugEClPcE:artemislena.eu | f
 $jvgwn1N5eWJBS0SxKw858_uXu1XvSRVL7cExPVIcpzQ | $fNCdTdUfu98Rd625bSTWVkxiJrqvWmgxYq3T3AF3bYY | !etDMqntYVqugEClPcE:artemislena.eu | f
 $V-38lZCzKnPpgtoKfZsFzJ8C1XpG4uHVfT0kdK8gN-Q | $jvgwn1N5eWJBS0SxKw858_uXu1XvSRVL7cExPVIcpzQ | !etDMqntYVqugEClPcE:artemislena.eu | f
 $ZeH1hSQHifcaeoGANwxQausg9lTv6hMWVPyMCwBU8TU | $mV0WAiaw24fcvPZkk2UqUg1c_Ua_sz22a6uMwq0Xu4s | !etDMqntYVqugEClPcE:artemislena.eu | f
 $mV0WAiaw24fcvPZkk2UqUg1c_Ua_sz22a6uMwq0Xu4s | $vjmPmmcDmtB1KqBgBDMX1uwSks6XQ0wEoLx2RcfOKOs | !etDMqntYVqugEClPcE:artemislena.eu | f
 $d3_9RIXPL0Ki_Yns9SnRD3V-alXNVXv9mS7UeGcTPwM | $xlPVArkI464uPid7qEH3adAUM6pv4KRTmq_xAWuZXTs | !etDMqntYVqugEClPcE:artemislena.eu | f
(13 rows)

As you can see, Cat's message ($d3_9RIXPL0Ki_Yns9SnRD3V-alXNVXv9mS7UeGcTPwM) merges the 2 separate graphs, which causes the backfill thingy.

@squahtx
Copy link
Contributor

squahtx commented Feb 17, 2022

Thank you for digging out the edges. Plugging all that into graphviz:

image
(next events point to prev events)

It sounds a lot like matrix-org/matrix-spec#1209, or a variant of it? For whatever reason, client B isn't realising that A has left.

@squahtx squahtx removed their assignment Feb 17, 2022
@squahtx squahtx added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. A-Sync defects related to /sync labels Feb 17, 2022
@deepbluev7
Copy link
Contributor Author

@squahtx Kind of. The problem isn't the client realizing, that the user left, but that the user rejoined and then state events get backfilled, that include a leave event. But at this point the user is already rejoined, so the leave event doesn't actually change the room state and gets state res'ed away. But the client has no way of knowing that, it just gets sent the state event without any of that context. Frankly, that event just shouldn't be sent to clients, if it doesn't affect room state anymore.

@squahtx
Copy link
Contributor

squahtx commented Feb 17, 2022

I am very confused. Which event ID in the DAG does user A rejoin at? And which event does backfill happen at?

@ShadowJonathan
Copy link
Contributor

I believe this issue is touching a problem similar to https://github.com/matrix-org/matrix-doc/issues/3263, linking it for relevance

@deepbluev7
Copy link
Contributor Author

I am very confused. Which event ID in the DAG does user A rejoin at? And which event does backfill happen at?

That event isn't in the graph or screenshot. It was half an hour earlier, minutes after they left. That's why I needed to kick the user before inviting them. My client thought they were gone, but the server knew they weren't.

@erikjohnston
Copy link
Member

Yup, this looks to be the same route cause as matrix-org/matrix-spec#1209: the server doesn't tell clients in /sync when the new current state diverges from what you get by calculating it from the state in timeline section.

@erikjohnston
Copy link
Member

Actually, matrix-org/matrix-spec#1209 is more about state res changes not being propagated to clients. This is about events being propagated to clients, that should not affect the state due to state res.

So matrix-org/matrix-spec#1209 is describing a specific instance of a wider problem: the server sends events down /sync in the timeline section (as the server got new events!), but the client then uses that to calculate the new state of the room and comes to the wrong conclusion.

Ideally the server would separately signal what the new state is, but currently it has no way to. We could add a bodge to drop events that don't affect state, but then you might miss important info like someone did try and join, or whatever.

Closing as dupe.

@deepbluev7
Copy link
Contributor Author

(Just make sure you fix both of those issues eventually instead of just the specific one in matrix-org/matrix-spec#1209, because I would have not expected this issue from reading matrix-org/matrix-spec#1209 alone)

@erikjohnston
Copy link
Member

FTR it sounds like the work we're doing on sliding sync will fix this issue

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Sync defects related to /sync T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

No branches or pull requests

5 participants