This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

When the max number of to_device messages for a sync is reached, the next sync does not return immediately #11457

Closed
BillCarsonFr opened this issue Nov 30, 2021 · 8 comments
Labels
S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. z-p2 (Deprecated Label)

Comments

@BillCarsonFr
Member

It appears that when you have reached the maximum number of to_device messages per sync, the next call to sync does not return immediately with the remaining to_device messages.
New incoming data must arrive for the sync to trigger and return.

Expected:
If there are pending to_device messages to deliver, the sync should return immediately instead of waiting for the timeout or for new data.

@babolivier babolivier added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. S-Minor Blocks non-critical functionality, workarounds exist. labels Nov 30, 2021
@babolivier
Contributor

For extra context, we currently limit a sync to 100 to_device messages. We return immediately (with the remaining messages) when we receive new data (i.e. data that's been created since the previous sync returned), but leftover to_device messages in the database don't count as new data, so we don't return them until either new data comes in or we hit the 30s timeout.

If we have leftover to_device messages that couldn't fit in the previous sync, the following sync should definitely return immediately with them.
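
For illustration, here is a minimal, self-contained sketch of the behaviour being asked for. All names and the long-poll plumbing here are illustrative only, not Synapse's actual internals:

```python
# Minimal sketch: a sync that returns immediately when to_device
# messages are already queued, and only long-polls when the queue is
# empty. Illustrative only; not Synapse's real code.
import asyncio

TO_DEVICE_SYNC_LIMIT = 100  # per-sync cap on to_device messages


async def run_sync(pending_to_device: list, wait_for_new_data, timeout_s: float = 30.0):
    """Return up to TO_DEVICE_SYNC_LIMIT queued messages.

    If the queue already holds leftovers from a previous sync, return
    immediately; otherwise wait until new data arrives or the timeout
    expires (the buggy behaviour only did the latter).
    """
    if not pending_to_device:
        try:
            await asyncio.wait_for(wait_for_new_data(), timeout=timeout_s)
        except asyncio.TimeoutError:
            pass  # timed out: return an empty batch, like a normal sync

    batch = pending_to_device[:TO_DEVICE_SYNC_LIMIT]
    del pending_to_device[:TO_DEVICE_SYNC_LIMIT]
    return batch


async def demo():
    # 150 queued messages: the first sync returns 100, and the second
    # returns the remaining 50 without waiting for the timeout.
    queue = [{"type": "m.room.encrypted", "body": i} for i in range(150)]
    never = asyncio.Event()  # no "new data" ever arrives in this demo

    first = await run_sync(queue, never.wait, timeout_s=1.0)
    second = await run_sync(queue, never.wait, timeout_s=1.0)
    print(len(first), len(second))  # 100 50


asyncio.run(demo())
```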

@bmarty

bmarty commented Dec 1, 2021

We want the Matrix protocol, and therefore Matrix clients, to perform well, so I would raise the severity of this issue; S-Minor is not enough to me.
A workaround has been implemented in some clients (JS and iOS), but not on Android. I will create an issue there to implement the workaround: set the timeout parameter of the sync request to 0 until the number of to_device events is 0.
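
For reference, a rough sketch of that workaround against the client-server API's `/_matrix/client/v3/sync` endpoint, written in Python purely for illustration (the real clients are JS/Swift/Kotlin; the homeserver URL and access token below are placeholders):

```python
# Client-side workaround sketch: keep syncing with timeout=0 while the
# server is still handing back to_device events.
import requests

HOMESERVER = "https://matrix.example.org"  # placeholder
ACCESS_TOKEN = "MDAx..."                   # placeholder


def sync_once(since: str | None, timeout_ms: int) -> dict:
    params = {"timeout": timeout_ms}
    if since:
        params["since"] = since
    r = requests.get(
        f"{HOMESERVER}/_matrix/client/v3/sync",
        params=params,
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    )
    r.raise_for_status()
    return r.json()


def sync_loop() -> None:
    since = None
    timeout_ms = 30_000
    while True:
        body = sync_once(since, timeout_ms)
        since = body["next_batch"]
        to_device = body.get("to_device", {}).get("events", [])
        # ... process to_device events and the rest of the response ...

        # Workaround: while the server keeps returning to_device events,
        # sync again with timeout=0 to drain the backlog instead of
        # waiting for the server's long-poll timeout.
        timeout_ms = 0 if to_device else 30_000
```

This drains the backlog quickly, but, as noted later in the thread, a client can end up looping like this indefinitely if to_device events keep arriving.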

@richvdh
Member

richvdh commented Dec 2, 2021

It seems counterproductive for clients to have to spend time implementing workarounds for server-side bugs. @callahad: can we fit this in?

@callahad callahad added P2 S-Major Major functionality / product severely impaired, no satisfactory workaround. and removed S-Minor Blocks non-critical functionality, workarounds exist. labels Dec 2, 2021
@callahad
Contributor

callahad commented Dec 2, 2021

Bumping severity + priority per discussion above. @DMRobertson may be able to look at this once he finishes up work on another sync issue.

@BillCarsonFr
Member Author

BillCarsonFr commented Dec 14, 2021

Slightly off-topic, but currently clients will do a catch-up sync until the to_device array is empty/null. During this period the client won't upload new one-time keys (OTKs), because there might be pre-key Olm messages pending and we don't want to rotate keys too early.

The thing is that a client might keep doing that endlessly if there are always new to_device events.

I wonder whether we need a `limited` field for to_device.
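
Purely as a sketch of that idea: the spec does not define such a flag today, but a hypothetical `limited` indicator on the `to_device` section of the sync response might look like this, and would let a client decide when it is safe to replenish one-time keys:

```python
# Hypothetical only: "limited" on to_device is not part of the Matrix
# spec. This just illustrates what such a response could look like and
# how a client might use it.
sync_response = {
    "next_batch": "s123_456",
    "to_device": {
        "events": [
            {"type": "m.room.encrypted", "sender": "@alice:example.org",
             "content": {"algorithm": "m.olm.v1.curve25519-aes-sha2"}},
            # ... up to the per-sync cap ...
        ],
        "limited": True,  # hypothetical: more to_device events remain server-side
    },
}

to_device = sync_response.get("to_device", {})
caught_up = not to_device.get("limited", False)
if caught_up:
    # No pre-key Olm messages can still be queued on the server, so it
    # would now be safe to upload new one-time keys.
    print("safe to upload new one-time keys")
```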

@richvdh richvdh changed the title from "When the max number of to_device for a sync is reached, the next sync does not return immediatly" to "When the max number of to_device for a sync is reached, the next sync does not return immediately" on Jan 20, 2022
@erikjohnston
Member

I was only able to reproduce this in unit tests on SQLite. @BillCarsonFr were you experiencing this on servers using postgres?

@erikjohnston
Member

So it looks like the bug I fixed in #11966 is a recent regression that hasn't made it into a release yet.

Can someone confirm if this issue is still present?

@richvdh richvdh changed the title from "When the max number of to_device for a sync is reached, the next sync does not return immediately" to "When the max number of to_device messages for a sync is reached, the next sync does not return immediately" on May 4, 2022
@richvdh
Member

richvdh commented May 4, 2022

> Can someone confirm if this issue is still present?

Given nobody is reporting this any more, I'm going to assume it is now resolved. We can reopen if anyone can still reproduce it.
