Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support treetime ambiguous date format #1305

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

corneliusroemer
Copy link
Member

@corneliusroemer corneliusroemer commented Sep 4, 2023

Add support for the treetime ambiguous date format [float_min:float_max], e.g. [2004.43:2005.58]
This is useful for when ambiguity is not limited to month/year boundaries, e.g. in case of Danish SARS-CoV-2 sequences which have their dates binned rounded down to the most recent Monday.

Resolves #1304

With this PR, we can use cross-month ambiguous dates that aren't whole year, and within-month ambiguous dates as well.
E.g. this wouldn't have been possible previously without unconstraining everything but the year:
image

Testing

What steps should be taken to test the changes you've proposed?
If you added or changed behavior in the codebase, did you update the tests, or do you need help with this?

  • Added unit tests
  • Using it in production in ncov-simplest, turning Danish dates into ranges here and running refine with the new parsing feature here

Checklist

  • Add a message in CHANGES.md summarizing the changes in this PR that are end user focused. Keep headers and formatting consistent with the rest of the file.

Add support for the treetime ambiguous date format `[float_min:float_max]`, e.g. `[2004.43:2005.58]`
This is useful for when ambiguity is not limited
to month/year boundaries.
Resolves #1304
@codecov
Copy link

codecov bot commented Sep 4, 2023

Codecov Report

Patch coverage is 60.00% of modified lines.

Files Changed Coverage
augur/dates/__init__.py 60.00%

📢 Thoughts on this report? Let us know!.

@victorlin victorlin self-requested a review September 5, 2023 17:47
victorlin
victorlin previously approved these changes Sep 5, 2023
Copy link
Member

@victorlin victorlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this! It's a simple solution to something we've discussed (I forget where exactly) and seems to work well for you.

@@ -121,6 +121,12 @@ def get_numerical_date_from_value(value, fmt=None, min_max_year=None):
except InvalidDate as error:
raise AugurError(str(error)) from error
return [treetime.utils.numeric_date(d) for d in ambig_date]
if value.startswith('[') and value.endswith(']') and len(value[1:-1].split(':'))==2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For added context, this is the corresponding logic in TreeTime: treetime/utils.py#L305-L312

@@ -121,6 +121,12 @@ def get_numerical_date_from_value(value, fmt=None, min_max_year=None):
except InvalidDate as error:
raise AugurError(str(error)) from error
return [treetime.utils.numeric_date(d) for d in ambig_date]
if value.startswith('[') and value.endswith(']') and len(value[1:-1].split(':'))==2:
# Treetime ambiguous date format, e.g. [2019.5:2020.5]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious how you get the numerical dates in metadata. Is this a standard format outside of TreeTime, or do you run ISO dates through something like treetime.utils.numeric_date to get the float representation?

Asking because I think a good next step after this gets merged is to generalize for any [A:B] date pair where A and B can be anything that's currently supported as an exact scalar date (e.g. [2023-07-31:2023-08-06]). I did some prototyping of this a while back but never got back to creating a proper PR for it.

try:
return [float(x) for x in value[1:-1].split(':')]
except ValueError as error:
raise AugurError(str(error)) from error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: customize the error message.

When given something like [2023-07-31:2023-08-06], the ValueError message looks something like:

could not convert string to float: '2023-07-31'

It's obvious what this means to us as developers, but maybe not so obvious to users. Something like this might be more clear:

The date range format must be `[A:B]` where both `A` and `B` are numerical dates.

@victorlin
Copy link
Member

Shifting the design decision a bit in the motivating issue. I'll plan to update this PR with that and address the conversations above.

@victorlin victorlin self-assigned this Jun 27, 2024
@victorlin victorlin marked this pull request as draft June 27, 2024 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Allow precise date ranges
2 participants