-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter users before iterating for notification #59
Conversation
In MySQL 5.7, in a correlated subquery (which is when a subquery references columns selected only in the outer query), correlated columns can only be used in the WHERE statement - the JOIN statement as I had assumed was possible. This is only actually documented in MySQL 8.0 but seems to be the case here too. Using a JOIN would be more performant if it were possible, as this new approach introduces two new subqueries that are themselves already inside a subquery.
Test runs on my local machine using a dataset from a couple of months ago and the new dry run mode on the hourly channel (203 users, 4 of whom would be notified):
The first test run took longer than the references; however, the filtering process only removed 51 users, leaving 157 to be iterated. I would have expected it to remove close to all but the 4 who would actually have been notified. It's also worth noting that my local machine would have identical performance metrics for both the notifier and database processes. In production, the database process performance is bottlenecked by the size of its server (#57), so I should expect worse performance than I'm seeing here. I should see if I can restrict the performance of the database container on my local machine for more indicative tests. (That being said, it might also be worth noting that the notifier process would normally also be responsible for actually sending the notification, which would take a little while. So if this test was based on accelerated notifer and accelerated database, perhaps it cancels out a little bit.) |
In the list query I'd removed the
This is much better, and still accurate. 198 users are removed leaving only 10 to be iterated. I'm still not totally sure why there are an extra 6 users in the list, but it's not enough that I'm concerned. The query is still very inefficient. I can try to make it faster by removing filters - e.g. the subquery that removes users who are manually unsubscribed. It's just important that I don't remove anything that adds users to the query. I'm going to try removing the unsubscribe subquery and see if there's a net gain in the runtime, even if it means more users will be iterated.
No significant change from removing the unsubscription subquery - granted, with this particular dataset, it only adds a single user to the list. Going to deploy this and see what happens. |
Resolves #58.