This repository has been archived by the owner on Mar 5, 2023. It is now read-only.

Possible alternative to measuring perceived latency #45

Open
turt2live opened this issue Sep 11, 2018 · 1 comment

turt2live commented Sep 11, 2018

The goal of this project is to measure the perceived latency that real users would experience over Matrix. Breaking this down into as many segments as possible helps administrators pinpoint the parts of the system that can be improved.

One of the major concerns with the bot currently is that it uses a lot of disk space, because the room is populated every few minutes with a new event from one bot or another. Over time this can also cause problems for servers that end up participating in the room without sending anything.

This bot could be converted into a hybrid of an appservice and a bot, so that it relies less on spamming a room and more on tracking real messages coming through the server. The appservice would have something like a .* user regex to capture all senders from all domains, thereby receiving a firehose of data from the server. Timings are easier to calculate for appservice transactions than for /sync requests, and the appservice could get a rough estimate of how long a particular event takes to arrive from a given host by comparing the event's origin_server_ts with the time of receipt on the appservice. This would be the optimistic measurement of the perceived latency for messages.
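A rough sketch of that optimistic measurement, in Python for illustration only. The function names are hypothetical and not part of the bot; the only fields assumed on each event are sender and origin_server_ts. Note that origin_server_ts is set by the originating server, so any clock skew between servers ends up in the result.

```python
import time
from collections import defaultdict
from statistics import median


def server_name(user_id):
    """Return the server part of a Matrix user ID such as '@alice:example.org'."""
    # Split on the first colon only, so 'example.org:8448' survives intact.
    return user_id.split(":", 1)[1]


def optimistic_latencies(events, received_at_ms=None):
    """Group (time of receipt - origin_server_ts) by the sender's server.

    `events` is the list of events from one appservice transaction.
    `received_at_ms` defaults to the current wall clock in milliseconds.
    """
    if received_at_ms is None:
        received_at_ms = int(time.time() * 1000)
    by_host = defaultdict(list)
    for event in events:
        sender = event.get("sender", "")
        ts = event.get("origin_server_ts")
        if ":" not in sender or ts is None:
            continue  # skip anything without a usable sender or timestamp
        by_host[server_name(sender)].append(received_at_ms - ts)
    return dict(by_host)


def summarize(by_host):
    """Reduce each host's samples to a median, which tolerates a few outliers."""
    return {host: median(samples) for host, samples in by_host.items()}
```

Calling summarize(optimistic_latencies(transaction_events)) once per transaction would give one number per remote host. The median is an arbitrary choice here; a percentile may be more useful in practice.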

The bot would still advertise in the common room every so often, but this could be done much less frequently. For example, it could send 2 or 3 messages in quick succession every hour to get an accurate measurement of the /sync delay and the time it takes to send a message. The events would probably be brought back down to simple messages instead of carrying data with them. The additional timing information from this approach can be used to calculate an offset for the appservice timings over time, thereby making the optimistic value more realistic.
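How the offset would be calculated is not decided here. One possible sketch, assuming an exponential moving average of the difference between the hourly measured latency and the appservice's optimistic value (the class name and the alpha default are hypothetical):

```python
class OffsetTracker:
    """Track the gap between measured and optimistic latency for one host."""

    def __init__(self, alpha=0.2):
        # alpha is an assumed smoothing factor: higher values react faster
        # to new hourly measurements, lower values give steadier numbers.
        self.alpha = alpha
        self.offset_ms = None

    def record(self, measured_ms, optimistic_ms):
        """Fold one hourly measurement into the running offset."""
        sample = measured_ms - optimistic_ms
        if self.offset_ms is None:
            self.offset_ms = sample  # first sample seeds the average
        else:
            self.offset_ms = self.alpha * sample + (1 - self.alpha) * self.offset_ms
        return self.offset_ms

    def adjust(self, optimistic_ms):
        """Return the optimistic value corrected by the current offset."""
        if self.offset_ms is None:
            return optimistic_ms  # nothing measured yet, so no correction
        return optimistic_ms + self.offset_ms
```

Between hourly measurements, every appservice timing would go through adjust() before being reported.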

The bots could share whatever relevant timing information they see via send-to-device messaging (EDUs). This does lead to an increase in traffic (the number of events to send effectively becomes O(N) rather than O(1)), but the events do not get persisted on the homeserver, which saves space. These updates would likely happen much more frequently, possibly every 5 minutes or so. The timing information would include the appservice data and the measured hourly data.
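As a sketch of what one of those updates might look like, the snippet below builds the request path and body for the client-server sendToDevice endpoint. The event type io.t2bot.monitor.timing and the shape of the timing payload are made up for illustration; only the messages map of user ID to device ID to content comes from the client-server API, where "*" targets all of a user's devices.

```python
import uuid
from urllib.parse import quote

# Hypothetical event type; nothing in this proposal fixes a name.
TIMING_EVENT_TYPE = "io.t2bot.monitor.timing"


def build_send_to_device(peers, timing):
    """Return (path, body) for a PUT to the sendToDevice endpoint.

    `peers` is a list of the other bots' user IDs. One entry is added per
    peer, which is where the O(N) growth in traffic comes from.
    """
    txn_id = uuid.uuid4().hex  # any ID unique to this access token works
    path = "/_matrix/client/v3/sendToDevice/{}/{}".format(
        quote(TIMING_EVENT_TYPE, safe=""), quote(txn_id, safe="")
    )
    body = {"messages": {peer: {"*": timing} for peer in peers}}
    return path, body
```

The actual HTTP request is left out on purpose: any HTTP client that can send an authenticated PUT with a JSON body would do.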

The bots can then interpret the full mesh from that and produce semi-stable, reliable numbers. This approach would pair well with an implementation of matrix-org/matrix-spec#35, so that the appservice timing information doesn't have a period of wild inaccuracy after a restart, crash, or similar.

The best part about this whole approach is that it doesn't require any Matrix protocol extensions or homeserver modifications: all of this can be achieved today. The components it would rely on are the arguable bug https://github.com/matrix-org/matrix-doc/issues/1260 and send-to-device messaging. Federation-formatted events in appservice transactions would be good to have, and that is probably not all that controversial to implement or propose.

@turt2live turt2live added the enhancement New feature or request label Sep 11, 2018
turt2live commented

Related (appservices receiving events in federation format): https://github.com/matrix-org/matrix-doc/issues/1670
