Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter: make date-based bounds inclusive instead of exclusive #683

Closed
huddlej opened this issue Feb 16, 2021 · 6 comments · Fixed by #708
Closed

filter: make date-based bounds inclusive instead of exclusive #683

huddlej opened this issue Feb 16, 2021 · 6 comments · Fixed by #708
Assignees
Labels
bug Something isn't working easy problem Requires less work than most issues good first issue A relatively isolated issue appropriate for first-time contributors priority: moderate To be resolved after high priority issues

Comments

@huddlej
Copy link
Contributor

huddlej commented Feb 16, 2021

Current Behavior

Filtering by min or max date excludes records whose date exactly matches the given bounds.

Expected behavior

The "max date" filter should include records up to and including that latest date. The "min date" filter should behave the same way, for consistency.

Possible solution

  • Add a unit test to capture the desired behavior for this edge case.
  • Change the test operators for min and max date to inclusive operators (< to <= for max date).
  • Update help text for the --min-date and --max-date flags to clearly document that these dates are inclusive

Additional context

This issue arose as during analysis in the ncov workflow.

The proposed solution will require a new major release of Augur, since this change affects the public command line interface.

@huddlej huddlej added bug Something isn't working good first issue A relatively isolated issue appropriate for first-time contributors priority: moderate To be resolved after high priority issues easy problem Requires less work than most issues labels Feb 16, 2021
@rneher
Copy link
Member

rneher commented Feb 16, 2021

I guess the python solution would be to use >=args.min_date and <args.max_date?

For arguments like args.max_date='2020-12-23 we have an inherent accuracy limit of 1 day and we end up running into time zone issues. The ncov pipeline would yield different results in NZ and in Seattle... not sure we want to get into that.

@huddlej
Copy link
Contributor Author

huddlej commented Feb 16, 2021

We already use datetime.today() in our ncov workflows, so I don't think this would change our time zone issues.

This change doesn't really cost us anything, but we get a big usability win for the use case James described. Most of our builds rely on datetime.today() to set upper bounds and it seems like we should include sequences from that given day rather than exclude them.

An even friendly interface would be if filter supported some common human-readable date values or offsets. For example:

augur filter \
  --min-date last month \
  --max-date today

@tsibley
Copy link
Member

tsibley commented Feb 16, 2021

Unless we're truncating from timestamps to dates, I would expect dates to be fixed regardless of TZ. If they're not, I'd consider that a bug. This can happen if dates are treated as timestamps (often at midnight), which they're not. TZ-less dates are how all datetime handling libraries I've seen work (including Python's datetime.date class) and how ISO 8601 treats them.

Regarding date ranges, we could consider a different argument and syntax so as 1) to leave the choice of inclusiveness to the user and 2) to not affect backwards compat. For example:

--date-range='[2020-12-01, 2020-01-01)'
--date-range='[2020-12-01, 2020-12-31]'
--date-range='[2020-01-01,)'
--date-range='(,2021-01-01)'

I agree with John that even a few relative specifiers would be nice and are not exclusive with any other changes to date filters.

@rneher
Copy link
Member

rneher commented Feb 16, 2021

agreed on all counts. Dates should be fixed regardless of time zone. But there is always going to be some ambiguity. Since we aren't keeping track of TZ, the result of max-date=today might differ even when the analyses are run at the exact same time depending on your TZ.

that aside, most people probably expect the behavior: [2020-12-01, 2020-12-31]

and if this is explicit in the help that is probably the best quick solution.

@tsibley
Copy link
Member

tsibley commented Feb 16, 2021

Agreed re: best quick solution and expected behaviour.

@huddlej
Copy link
Contributor Author

huddlej commented Mar 30, 2021

@benjaminotter This issue's description reflects the consensus outcome from the conversation above that we should initially use inclusive dates and document these clearly in the help.

There are a couple of other proposals in the conversation about that may be interesting for you to consider, but none of these have been formally discussed or refined beyond this conversation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working easy problem Requires less work than most issues good first issue A relatively isolated issue appropriate for first-time contributors priority: moderate To be resolved after high priority issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants