Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: batch or partition DB access and commits for user streak updation #3296

Merged
merged 9 commits into from
Jun 7, 2023

Conversation

melvinebenezer
Copy link
Collaborator

@melvinebenezer melvinebenezer commented Jun 5, 2023

fixes #3293

  • Partitioned the access to list of users and updated the same
  • Added more debug
  • Tested on a small batch of Fake users

What is the best USER_STREAK_BATCH_SIZE ?

@andreaskoepf
Copy link
Collaborator

andreaskoepf commented Jun 5, 2023

I think the current implementation could be implemented with two update statements (which would be more efficient than the loops):

# reset streak_days to 0 for all users with more than one day inactivity
update "user" set streak_days=0, streak_last_day_date=current_timestamp::date 
where (current_timestamp-last_activity_date) > interval '1 day' or streak_days is null or last_activity_date is null;

# increment streak_days when user completed tasks on consecutive days 
update "user" set streak_days=streak_days+1, streak_last_day_date=current_timestamp::date 
where current_timestamp::date != streak_last_day_date::date;

I think there was a logic error in the current code which causes the streak_days count to become too low after a some time. After incrementing it sets the streak_last_day_date = current_time .. but in this case it is clear that (current_time-streak_last_day_date) > 24h .. that means it takes >24h before each streak increment .. the effect depends on the update interval .. but let's say it is 4 hours than in the worst case it could mean a streak is incremented only all 28h ..

IMO one big problem with the current logic is: It always updates the streak_last_day_date of all users .. especially of all inactive users... maybe the whole streak-update mechanisms should be redesigned?

@melvinebenezer
Copy link
Collaborator Author

To summarise the changes

  1. Update the user streak when there is user activity
  2. Periodic celery task to reset streak info in case of inactivity

@andreaskoepf andreaskoepf self-requested a review June 7, 2023 05:40
Copy link
Collaborator

@andreaskoepf andreaskoepf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now to me!

@andreaskoepf andreaskoepf merged commit aebf3fa into main Jun 7, 2023
@andreaskoepf andreaskoepf deleted the user_streak_opt branch June 7, 2023 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

user streak updates in the backend takes up 100% cpu
2 participants