Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter users before iterating for notification #59

Merged
merged 20 commits into from
Apr 20, 2023
Merged

Conversation

rossjrw
Copy link
Member

@rossjrw rossjrw commented Apr 20, 2023

Resolves #58.

@rossjrw rossjrw added the optimisation Make an existing feature faster or smaller label Apr 20, 2023
@rossjrw
Copy link
Member Author

rossjrw commented Apr 20, 2023

Test runs on my local machine using a dataset from a couple of months ago and the new dry run mode on the hourly channel (203 users, 4 of whom would be notified):

main @ 1035a19 this branch @ f4a3f69
3:37.07 (4) 4:13.85 (4)
3:41.16 (4)
3:40.00 (4)

The first test run took longer than the references; however, the filtering process only removed 51 users, leaving 157 to be iterated. I would have expected it to remove close to all but the 4 who would actually have been notified.

It's also worth noting that my local machine would have identical performance metrics for both the notifier and database processes. In production, the database process performance is bottlenecked by the size of its server (#57), so I should expect worse performance than I'm seeing here. I should see if I can restrict the performance of the database container on my local machine for more indicative tests.

(That being said, it might also be worth noting that the notifier process would normally also be responsible for actually sending the notification, which would take a little while. So if this test was based on accelerated notifer and accelerated database, perhaps it cancels out a little bit.)

@rossjrw
Copy link
Member Author

rossjrw commented Apr 20, 2023

In the list query I'd removed the post.posted_timestamp BETWEEN %(lower_timestamp)s AND %(upper_timestamp)s condition and replaced it with post.posted_timestamp >= user_last_notified.notified_timestamp, which I thought would filter out posts added since the user was last notified. However, a user's last notification timestamp is actually the timestamp of the post about which they were last notified rather than the timestamp of the notification itself. Replacing >= with > should resolve this.

main @ 1035a19 this branch @ b003df7
3:37.07 (4) 2:51.77 (4)
3:41.16 (4) 2:38.40 (4)
3:40.00 (4) 2:53.77 (4)

This is much better, and still accurate. 198 users are removed leaving only 10 to be iterated. I'm still not totally sure why there are an extra 6 users in the list, but it's not enough that I'm concerned.

The query is still very inefficient. I can try to make it faster by removing filters - e.g. the subquery that removes users who are manually unsubscribed. It's just important that I don't remove anything that adds users to the query. I'm going to try removing the unsubscribe subquery and see if there's a net gain in the runtime, even if it means more users will be iterated.

main @ 1035a19 this branch @ b003df7 @ 2fa6960
3:37.07 (4) 2:51.77 (4) 2:52.73 (4)
3:41.16 (4) 2:38.40 (4) 2:42.82 (4)
3:40.00 (4) 2:53.77 (4) 2:39.78 (4)

No significant change from removing the unsubscription subquery - granted, with this particular dataset, it only adds a single user to the list.

Going to deploy this and see what happens.

@rossjrw rossjrw marked this pull request as ready for review April 20, 2023 20:25
@rossjrw rossjrw merged commit e68be11 into main Apr 20, 2023
@rossjrw rossjrw deleted the list-notifiable-users branch April 20, 2023 20:25
@rossjrw
Copy link
Member Author

rossjrw commented Apr 21, 2023

Success: the daily channel no longer exceeds the Lambda max duration.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimisation Make an existing feature faster or smaller
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fetch notifiable user list before iterating users
1 participant