Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix infinite looping bugs with recurrent tasks around DST #639

Merged
merged 13 commits into from
Mar 5, 2023
Merged

Fix infinite looping bugs with recurrent tasks around DST #639

merged 13 commits into from
Mar 5, 2023

Conversation

GijsvandenHoven
Copy link
Contributor

@GijsvandenHoven GijsvandenHoven commented Dec 15, 2022

This fixes issue #638

it does this by

  1. detecting a forward jump in time happened after changing the time of a date in _getNextDate()
  2. Scanning the jumped-over range for matches with the time pattern.

What made (2) extra tricky is that apparently, DST is not uniformly 1 hour. That made things a little more complicated.
Right now the skipped range scanner assumes a forward jump is 1 minute to 23 hrs 59 minutes long, and does not roll over days.

This PR also fixes a test that was failing on the master branch, by disambiguating a DateTime (the ISO string represented 2 possible times due to a DST backwards shift, and luxon picked the unfavorable one).

An important thing to know about this change, is that it also fixes another, more minor bug with _getNextDate(), namely that a jumped-over time candidate is not considered. E.g. an every *:30:00'job does not activate at 2:30 if the time jumps from 1 to 3. (I would expect it to, and it now does, activate at 1:30, 3:00 and 3:30 respectively). Because the time of 3:00 represents every time of day between 2 and 3 all at once.

For example, how the original Cron handles DST:

 Special considerations exist when the clock is changed by less than 3 hours, for example at the beginning and end of daylight savings time.
       If the time has moved forwards, those jobs which would have run in the time that was skipped will be  run  soon	after  the  change.   Con-
       versely, if the time has moved backwards by less than 3 hours, those jobs that fall into the repeated time will not be re-run.

       Only  jobs  that run at a particular time (not specified as @hourly, nor with '*' in the hour or minute specifier) are affected. Jobs which
       are specified with wildcards are run based on the new time immediately.

Backwards jumps were handled just like this, but forward ones not as much.

This is however a change you probably want to be aware of as maintainers of this library, Maybe you know of use cases for node-cron where actually the opposite, current behavior is preferred?

@ncb000gt
Copy link
Member

ncb000gt commented Jan 9, 2023

@GijsvandenHoven Thanks for the patience. I'll try to take a look at the code changes tonight. This would be wonderful to land. Much appreciated!

@ncb000gt
Copy link
Member

It looks like there is a lot of looping in the code. While it seems like a lot at first glance it looks to be bounded. I just wanted to confirm with you that this is the case and it's not going to add additional load in performing the checks it's performing. If not this looks good to me.

Would you mind fixing the conflicts?

Thanks!

@GijsvandenHoven
Copy link
Contributor Author

GijsvandenHoven commented Jan 11, 2023

@ncb000gt Hey, sorry for the brief silent period.

I went back to my own code and I agree, that's amount of while loops is even making me nervous.
For the ones in _checkTimeInSkippedRange (that's all but one of them), I thought I could at least turn them into for-loops instead.

While doing this I encountered some code later in my own function, for which i'm pretty sure it would be incorrect.
Which is only a little embarassing for me.
The cases for which it would bug out were 'fortunately' only for DST jumps that do not exist currently,
(https://en.wikipedia.org/wiki/Daylight_saving_time_by_country), but this motivated me to both add tests that poke this specific function for these cases, as well as fix up the function itself because I was unhappy with it.


The new version, instead of doing all this abstract work to account for arbitrary DST jumps, first checks how spicy of a DST jump it is dealing with first, to prevent unnecessary complicated work.

This results in a 3-way branch:

Normal
Your every day DST jump falls under this. If you jump by exactly 60 minutes, starting at a 0th minute.
We just check if this skipped-over hour would be valid for a CronJob and that's all that needs to be done. Since any arbitrary minute x second combination would fall under it, and the scheduled CJ time is the same millisecond timestamp no matter what.

Australian
The people of Lord Howe's Island have 30 minute DST jumps, and they are the whole reason I made this thing deal with arbitrary DST jumps instead of just hardcoding 1 hour.
If a jump doesn't change the hour number we first check the hour for validity, and then if necessary loop over the jumped-over minutes to see if a valid time exists.

Quirky
Anything else, like non-hour jumps that change the hour number or over-an-hour jumps end up here. These do not exist in the real world...currently.

This does the same kind of work the old function used to do, constructing arrays of ranges for minutes and hours to check.
However, instead of clunky 'while' loops, filtering the other cases let me easily construct these ranges directly with Array.from expressions.
Then we just check every valid hour x minute combination to see if there was a valid time.

The commit where I changed this function is b26c884, but the diff is pretty garbled so you may want to review it locally instead. The changes are contained to the _checkTimeInSkippedRange function and its two subroutines.


I just wanted to confirm with you that this is the case and it's not going to add additional load in performing the checks it's performing

I think it should be fine, since all of this code is exclusive to the case where it has actually detected a DST forward jump beforehand (i.e. once per year per cronjob).

When it detects one of these, it will scan the skipped over range, but after this is done the _getNextDateFrom loop continues with whatever resulting date the "addition that passed over a jump" had. Meaning the range-scanning function is called at most once per year per CJ.

The 'real' overhead that I add to the checks the main loop in _getNextDateFrom is this check each iteration to see if any jumps happened after adding time to the date, which is just a bit of arithmetic. Only if this check returns true, all the other code in the PR runs.

sdTlZWX

Anyways with my overhaul of the function, there should be only one loop now instead of 3. Only if the future adds quirky DST increments will it do the 'original' magnitude of work, but even then it's reduced now.

@GijsvandenHoven
Copy link
Contributor Author

Oh also I went and added an iteration limit to the final remaining while loop, just in case. (Up to one day of backwards checking for a DST jump) 87a3ad8.
It shouldn't be possible to trigger this since the checks in _getNextDateFrom before calling this make it so that the #iterations is bounded by the amount of minutes skipped by the jump (usually 60), but if an arbitrary date ends up there, I figured it'd be good to prevent a buggy call from checking backwards up to a year.

@GijsvandenHoven
Copy link
Contributor Author

@ncb000gt Sorry to poke you, I understand that you may be busy with other projects.

I was wondering if you had a chance to review these last changes. I think you will agree the changes from my (overly long) previous reply improved the PR, and the unit tests I've added give me confidence it will handle any unusual forward jump.

Please let me know if there's anything I can do to help move this process forward. I'm looking forward to your feedback and appreciate your time and attention.

@intcreator
Copy link
Collaborator

hey @GijsvandenHoven I took a preliminary look at these and I think it's headed in the right direction. I want to take a bit more time to make sure I understand it though. thanks for your work on it so far!

lib/time.js Outdated Show resolved Hide resolved
lib/time.js Outdated Show resolved Hide resolved
lib/time.js Outdated Show resolved Hide resolved
lib/time.js Outdated Show resolved Hide resolved
lib/time.js Outdated Show resolved Hide resolved
@intcreator
Copy link
Collaborator

@GijsvandenHoven I went through the whole thing and I think I understand what's going on in most of it. thanks for all your hard work writing the new functionality, your explanations here and in the code, and the thorough tests. I left a few comments where I still had a hard time following what was going on but besides that I think this looks very good! I'm excited to merge this and hopeful that it will fix a lot of the issues people have been seeing

@GijsvandenHoven
Copy link
Contributor Author

@intcreator Thank you for the thorough review! It was beneficial, that broken code for the loop in particular was something.

I have pushed a few commits that should address everything you said. Let me know if there are any further comments.

Thanks in advance! 🚀

@intcreator
Copy link
Collaborator

changes look good! could you rebase again? then I think we're ready to merge @GijsvandenHoven

@GijsvandenHoven
Copy link
Contributor Author

Should be rebased now. Thank you for the helpful reviews and see you next mission! @intcreator

@intcreator
Copy link
Collaborator

hmm, looks like some merge conflicts need to be resolved now? you'll probably have to force push the rebased branch once you're done if you haven't already:
image

…nside skipped hours/minutes due to DST shifting time forward.

This is done by detecting a forward shift of time, then scanning the skipped range for candidate firing times.

A jumped over job due to DST still fires exactly once, as it should. This firing happens immediately upon shifting time. This also fixes another, less-intrusive bug where a skipped-over time period in DST is not recognized as a firing candidate. (e.g. before: 1:30 -> 3:30, after: 1:30 -> 3:00 -> 3:30 given a 2AM to 3AM forward shift)
…nside skipped hours/minutes due to DST shifting time forward.

This is done by detecting a forward shift of time, then scanning the skipped range for candidate firing times.

A jumped over job due to DST still fires exactly once, as it should. This firing happens immediately upon shifting time. This also fixes another, less-intrusive bug where a skipped-over time period in DST is not recognized as a firing candidate. (e.g. before: 1:30 -> 3:30, after: 1:30 -> 3:00 -> 3:30 given a 2AM to 3AM forward shift)
Both to have less while loops and fix what looked like a bug. The bug was undiscovered because it would happen for DST jumps that do not exist in the real world.
This includes presumed jumps that empirically will not happen. These should still be tested because the functions are implemented in such a way that they would presumably still handle them.
…rward DST jump.

It subtracted hours in the loop for no reason. And there was no loop variable.
…rJumpingPoint' for `_checkTimeInSkippedRange`.
@GijsvandenHoven
Copy link
Contributor Author

You are right, I did not push the right buttons. I synced my pull request branch with my forked master branch at first. Which had no conflicts.

This master branch had to be rebased on node-cron, of course! And now there were conflicts 🦀 🦀 . Fortunately just package.json though. Then rebasing my pull request branch again had a few more surprise conflicts, but it should be all good now. Very thankful for GUI merge tools today.

I also triple checked that no code went missing during this operation, and I couldn't find any. Nevertheless I backed up this branch before the rebasing, so if something seems off I can always bring it back.

@intcreator
Copy link
Collaborator

perfect! merging now...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants