Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refine: --year-bounds has no effect #1136

Closed
victorlin opened this issue Jan 25, 2023 · 7 comments · Fixed by #1142
Closed

refine: --year-bounds has no effect #1136

victorlin opened this issue Jan 25, 2023 · 7 comments · Fixed by #1142
Assignees
Labels
bug Something isn't working

Comments

@victorlin
Copy link
Member

victorlin commented Jan 25, 2023

Current Behavior

The help text of --year-bounds states:

  --year-bounds YEAR_BOUNDS [YEAR_BOUNDS ...]
                        specify min or max & min prediction bounds for samples with XX in year (default: None)

However, ambiguous dates with XX in the year are unbounded by the value passed to this option.

The value gets passed through many functions in dates.py, down to DateDisambiguator where it is left unused.

I believe this is because some old logic got lost in a refactor: 1cf41fb

Expected behavior

This option should limit the range of date inference when XX is in the year.

How to reproduce

  1. Setup: Create a copy of tests/functional/refine/metadata.tsv, adding partial ambiguity on the century-level (20XX) for the first strain PAN/CDC_259359_V1_V3/2015.

    cat >metadata-custom-date.tsv << ~~
    strain	date
    PAN/CDC_259359_V1_V3/2015	20XX-XX-XX
    COL/FLR_00024/2015	2015-12-XX
    PRVABC59	2015-12-XX
    COL/FLR_00008/2015	2015-12-XX
    Colombia/2016/ZC204Se	2016-01-06
    ZKC2/2016	2016-02-16
    VEN/UF_1/2016	2016-03-25
    DOM/2016/BB_0059	2016-04-04
    BRA/2016/FC_6706	2016-04-08
    DOM/2016/BB_0183	2016-04-18
    EcEs062_16	2016-04-XX
    HND/2016/HU_ME59	2016-05-13
    ~~
  2. Run without --year-bounds. Note that --timetree must be specified otherwise the --year-bounds value is never used in refine.py. Observe that the inferred date for the strain is 2023-01-26.

    augur refine \
      --tree tests/functional/refine/tree_raw.nwk \
      --alignment tests/functional/refine/aligned.fasta \
      --metadata metadata-custom-date.tsv \
      --timetree \
      --output-tree output_tree.nwk \
      --output-node-data output_branch_lengths.json \
      --coalescent opt \
      --date-confidence \
      --date-inference marginal \
      --clock-filter-iqd 4 \
      --seed 314159 \
      --divergence-units mutations-per-site
    
    jq -r '.nodes."PAN/CDC_259359_V1_V3/2015".date' output_branch_lengths.json
    # 2023-01-26
  3. Run with --year-bounds 1900 2000. That should have limited the inferred date to be between 1900 and 2000, however it is still 2023-01-26.

    augur refine \
      --tree tests/functional/refine/tree_raw.nwk \
      --alignment tests/functional/refine/aligned.fasta \
      --metadata metadata-custom-date.tsv \
      --timetree \
      --year-bounds 1900 2000 \
      --output-tree output_tree_year_bounds.nwk \
      --output-node-data output_branch_lengths_year_bounds.json \
      --coalescent opt \
      --date-confidence \
      --date-inference marginal \
      --clock-filter-iqd 4 \
      --seed 314159 \
      --divergence-units mutations-per-site
    
    jq -r '.nodes."PAN/CDC_259359_V1_V3/2015".date' output_branch_lengths_year_bounds.json
    # 2023-01-26

Existing usage

A simple GitHub search shows these public usages:

There may be non-public usages.

Possible solution

  1. Remove the option. Formally, this would be a breaking change, though it has been broken since version 9.0.0.
  2. Properly re-implement functionality.
@victorlin victorlin added the bug Something isn't working label Jan 25, 2023
@tsibley
Copy link
Member

tsibley commented Jan 26, 2023

Remove the flag. A simple GitHub search shows only one public usage of this, last updated in May 2020. I don't think it's being actively used.

A search of public code on GitHub is not, IMO, sufficient for determining usage of Augur. I expect (justifiably, I think) that a lot of Augur usage is "dark", e.g. not publicly visible.

That's not to say we can't remove --year-bounds, but I don't think we can use public code search to declare it unused.

@tsibley
Copy link
Member

tsibley commented Jan 26, 2023

(I realize in this case it's kinda moot because --year-bounds hasn't done anything for 2 years, but the general point remains.)

@tsibley
Copy link
Member

tsibley commented Jan 26, 2023

But also, maybe it should work, since we ourselves expected it to (and it used to)?

https://github.com/neherlab/2019-krisp-nextstrain-workshop/blob/master/tb_walker/Snakefile#L71-L82

@tsibley
Copy link
Member

tsibley commented Jan 26, 2023

Hmm, I'm also not following why we'd expect an effect in the example you gave. The description of year bounds is:

specify min or max & min prediction bounds for samples with XX in year

but none of the dates in tests/functional/refine/metadata.tsv have a partially-ambiguous year.

@victorlin
Copy link
Member Author

none of the dates in tests/functional/refine/metadata.tsv have a partially-ambiguous year.

Oh, good catch. I'll update with a better example.

@victorlin
Copy link
Member Author

I just realized that the previous example was irrelevant as --timetree wasn't passed, so --year-bounds wasn't even used in refine.py. Updated with a better example that explains expected vs. observed behavior.

@victorlin
Copy link
Member Author

I mentioned this in Slack and the consensus is to properly re-implement functionality.

victorlin added a commit that referenced this issue Feb 27, 2023
Previously, the value specified was unused in the code. This restores
the functionality.

I moved the min_max_year argument from the constructor to range() since
it is only used there.

Fixes #1136.
victorlin added a commit that referenced this issue Feb 27, 2023
Previously, the value specified was unused in the code. This restores
the functionality.

I moved the min_max_year argument from the constructor to range() since
it is only used there.

Fixes #1136.
victorlin added a commit that referenced this issue Feb 27, 2023
Previously, the value specified was unused in the code. This restores
the functionality.

I moved the min_max_year argument from the constructor to range() since
it is only used there.

Fixes #1136.
victorlin added a commit that referenced this issue Mar 8, 2023
Previously, the value specified was unused in the code. This restores
the functionality.

I moved the min_max_year argument from the constructor to range() since
it is only used there.

Fixes #1136.
victorlin added a commit that referenced this issue Mar 16, 2023
Previously, the value specified was unused in the code. This restores
the functionality.

I moved the min_max_year argument from the constructor to range() since
it is only used there.

Fixes #1136.
victorlin added a commit that referenced this issue Mar 29, 2023
Previously, the value specified was unused in the code. This restores
the functionality.

I moved the min_max_year argument from the constructor to range() since
it is only used there.

Fixes #1136.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
2 participants