Filter users before iterating for notification #59

rossjrw · 2023-04-20T02:06:54Z

Resolves #58.

In MySQL 5.7, in a correlated subquery (a subquery that references columns selected only in the outer query), correlated columns can only be used in the WHERE clause, not in the JOIN clause as I had assumed was possible. This is only actually documented for MySQL 8.0 but seems to be the case here too. Using a JOIN would be more performant if it were possible, as this new approach introduces two new subqueries that are themselves already inside a subquery.
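As a rough illustration of the shape described above, here is a minimal sketch of a correlated subquery whose outer column appears only in the subquery's WHERE clause. The schema (`user_config`, `thread_sub`, `post`) is a simplified, hypothetical stand-in for the real tables, and SQLite is used so the sketch runs without a server; it does not reproduce the MySQL 5.7 restriction itself.

```python
import sqlite3

# Hypothetical, simplified schema. SQLite stands in for MySQL 5.7 here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_config (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE thread_sub (user_id INTEGER, thread_id INTEGER);
    CREATE TABLE post (id INTEGER PRIMARY KEY, thread_id INTEGER);
    INSERT INTO user_config VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO thread_sub VALUES (1, 10), (2, 20);
    INSERT INTO post VALUES (100, 10);
    """
)

# The outer column user_config.user_id is referenced only in the WHERE
# clause of the subquery; the JOIN condition uses inner columns only.
QUERY = """
    SELECT user_config.name
    FROM user_config
    WHERE EXISTS (
        SELECT 1
        FROM post
        INNER JOIN thread_sub ON thread_sub.thread_id = post.thread_id
        WHERE thread_sub.user_id = user_config.user_id
    )
    ORDER BY user_config.name
"""

# Only users subscribed to a thread that has at least one post are kept.
users_with_posts = [row[0] for row in conn.execute(QUERY)]
print(users_with_posts)
```

Here only `alice` is subscribed to a thread that contains a post, so she is the only user the filter keeps.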

rossjrw · 2023-04-20T16:40:49Z

Test runs on my local machine using a dataset from a couple of months ago and the new dry run mode on the hourly channel (203 users, 4 of whom would be notified):

| main @ `1035a19` | this branch @ `f4a3f69` |
| --- | --- |
| 3:37.07 (4) | 4:13.85 (4) |
| 3:41.16 (4) | |
| 3:40.00 (4) | |

The first test run on this branch took longer than the reference runs on main. However, the filtering step removed only 51 users, leaving 157 to be iterated. I would have expected it to remove close to all but the 4 who would actually have been notified.

It's also worth noting that my local machine would have identical performance metrics for both the notifier and database processes. In production, the database process performance is bottlenecked by the size of its server (#57), so I should expect worse performance than I'm seeing here. I should see if I can restrict the performance of the database container on my local machine for more indicative tests.

(That being said, it might also be worth noting that the notifier process would normally also be responsible for actually sending the notification, which would take a little while. So if this test was based on an accelerated notifier and an accelerated database, perhaps it cancels out a little bit.)

rossjrw · 2023-04-20T18:02:27Z

In the list query I'd removed the `post.posted_timestamp BETWEEN %(lower_timestamp)s AND %(upper_timestamp)s` condition and replaced it with `post.posted_timestamp >= user_last_notified.notified_timestamp`, which I thought would select only posts added since the user was last notified. However, a user's last notification timestamp is actually the timestamp of the post about which they were last notified rather than the timestamp of the notification itself, so the inclusive comparison also matches that already-notified post. Replacing `>=` with `>` should resolve this.
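To make the off-by-one concrete, here is a minimal sketch of the two comparisons side by side. The two-table schema, the user ID 7 and the integer timestamps are hypothetical, and SQLite is used only so the example runs without a MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE post (id INTEGER PRIMARY KEY, posted_timestamp INTEGER);
    CREATE TABLE user_last_notified (user_id INTEGER, notified_timestamp INTEGER);
    INSERT INTO post VALUES (1, 100), (2, 200), (3, 300);
    -- The user was last notified about post 2, so the stored value is
    -- that post's timestamp (200), not the time the notification was sent.
    INSERT INTO user_last_notified VALUES (7, 200);
    """
)


def pending_post_ids(operator):
    """Return IDs of posts the hypothetical user 7 would be notified about."""
    # The operator is interpolated into the SQL, so restrict it to known values.
    if operator not in (">=", ">"):
        raise ValueError(f"unsupported operator: {operator}")
    query = f"""
        SELECT post.id
        FROM post
        INNER JOIN user_last_notified
            ON post.posted_timestamp {operator} user_last_notified.notified_timestamp
        WHERE user_last_notified.user_id = ?
        ORDER BY post.id
    """
    return [row[0] for row in conn.execute(query, (7,))]


inclusive = pending_post_ids(">=")  # also matches the already-notified post 2
strict = pending_post_ids(">")  # matches only the newer post 3
print(inclusive, strict)
```

With the inclusive comparison every previously notified user still matches their own last post, which is consistent with the filter removing far fewer users than expected.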

| main @ `1035a19` | this branch @ `b003df7` |
| --- | --- |
| 3:37.07 (4) | 2:51.77 (4) |
| 3:41.16 (4) | 2:38.40 (4) |
| 3:40.00 (4) | 2:53.77 (4) |

This is much better, and still accurate. 198 users are removed leaving only 10 to be iterated. I'm still not totally sure why there are an extra 6 users in the list, but it's not enough that I'm concerned.
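For a rough sense of scale, the timings above for main and `b003df7` can be averaged with a short script. This is only a sketch over three runs per side on one local machine, so it should be read as indicative rather than as a benchmark; the helper name `to_seconds` is made up for this example.

```python
from statistics import mean


def to_seconds(duration):
    """Convert an 'M:SS.ss' duration string to seconds."""
    minutes, seconds = duration.split(":")
    return int(minutes) * 60 + float(seconds)


# Run times copied from the comparison of main and b003df7.
main_runs = ["3:37.07", "3:41.16", "3:40.00"]
branch_runs = ["2:51.77", "2:38.40", "2:53.77"]

main_mean = mean(to_seconds(run) for run in main_runs)
branch_mean = mean(to_seconds(run) for run in branch_runs)

# Fraction of the mean runtime saved by the branch relative to main.
reduction = 1 - branch_mean / main_mean

print(f"main: {main_mean:.2f}s, branch: {branch_mean:.2f}s, saved: {reduction:.1%}")
```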

The query is still very inefficient. I can try to make it faster by removing filters, e.g. the subquery that removes users who are manually unsubscribed. It's just important that I only remove filters that exclude users, and nothing that adds users to the list. I'm going to try removing the unsubscribe subquery and see if there's a net gain in the runtime, even if it means more users will be iterated.

| main @ `1035a19` | this branch @ `b003df7` | this branch @ `2fa6960` |
| --- | --- | --- |
| 3:37.07 (4) | 2:51.77 (4) | 2:52.73 (4) |
| 3:41.16 (4) | 2:38.40 (4) | 2:42.82 (4) |
| 3:40.00 (4) | 2:53.77 (4) | 2:39.78 (4) |

No significant change from removing the unsubscription subquery - granted, with this particular dataset, it only adds a single user to the list.

Going to deploy this and see what happens.

rossjrw · 2023-04-21T16:31:59Z

Success: the daily channel no longer exceeds the Lambda max duration.

rossjrw added 17 commits March 22, 2023 15:51

Create query for pending notifications user filter

1abf4b6

Add Black to devcontainer

37eef5b

Create template for new query accessor

d500155

Rename notifiable users query for consistency

85fc6c0

Add opt out of overwriting user configs

2911a18

Write test for notifiable users list

179d130

Correct database helper for notifiable users list

137055f

Dump test command to script

28293b0

Correct thread_sub column name

1950a52

Improve Docker cache performance for tests

68f08b1

Pass pytest args to test application

4cce602

Compress fake user generation

f91906b

Merge sub subqueries for performance

9fc8ccc

Create null thread for irrelevant user to sub to

f7a5ea7

Update MySQL version in docs

ce0dd04

Add dry option across application

d273a9a

rossjrw added the optimisation label (Make an existing feature faster or smaller) Apr 20, 2023

rossjrw added 2 commits April 20, 2023 03:16

Add execute instructions for Docker

60972c4

Filter users to those with notifications on DB

f4a3f69

Filter out the last post a user was notified about

b003df7

rossjrw marked this pull request as ready for review April 20, 2023 20:25

rossjrw merged commit e68be11 into main Apr 20, 2023

rossjrw deleted the list-notifiable-users branch April 20, 2023 20:25

This was referenced Aug 7, 2023

Condense list of error users by removing inactive users #76

Open

Autosubscribe users to their pages' discussions #77

Open
