Split GISAID profile to "six-month" and "all-time" builds
This commit splits the existing regional builds ("global", "africa", etc.) in the "nextstrain-gisaid" profile into "six-month" builds that focus subsampling on the previous six months and "all-time" builds that subsample evenly across time. It uses the new relative-dates functionality in "augur filter" to make these subsampling strategies easier to implement and more obvious.
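
To make the "previous six months" window concrete, here is a minimal Python sketch that resolves a six-month cutoff from a reference date and partitions sample dates around it. It uses only the standard library; the helper names `months_before` and `split_by_cutoff` are hypothetical, and the end-of-month clamping is an assumption of this sketch, not necessarily how "augur filter" resolves relative dates.

```python
import calendar
from datetime import date


def months_before(reference: date, months: int) -> date:
    """Return the date `months` calendar months before `reference`.

    Assumption of this sketch: when the target month is shorter than the
    reference day (e.g. Aug 31 minus 6 months), clamp to the month's last day.
    """
    # Work with a running month index so the subtraction can cross years.
    index = reference.year * 12 + (reference.month - 1) - months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(reference.day, last_day))


def split_by_cutoff(sample_dates, reference: date, months: int = 6):
    """Partition dates into (cutoff, early, recent) around the relative cutoff."""
    cutoff = months_before(reference, months)
    early = [d for d in sample_dates if d < cutoff]
    recent = [d for d in sample_dates if d >= cutoff]
    return cutoff, early, recent


if __name__ == "__main__":
    # Hypothetical sample dates, evaluated as of the commit date.
    dates = [date(2020, 3, 1), date(2021, 10, 14), date(2021, 10, 15), date(2022, 4, 1)]
    cutoff, early, recent = split_by_cutoff(dates, reference=date(2022, 4, 15))
    print(cutoff, len(early), len(recent))
```

Because the cutoff is computed from the reference date at run time, the same configuration keeps tracking the most recent six months without any hard-coded dates being edited.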

Timespans for frequency estimation are set to match the subsampling ranges.

The general subsampling logic is cleaned up in a few ways:
1. North America and Oceania are subsampled and traits reconstructed at the "division" level, while Africa, Asia, Europe and South America are subsampled and traits reconstructed at the "country" level. Previously this behavior had been inconsistent between subsampling, traits, etc.
2. For global builds, all regions are now sampled at equal frequency except for Oceania, which is sampled at 33%. The previous overemphasis on Europe and North America is no longer justified.
3. There is a consistent 4:1 emphasis on recent vs early samples for the "six-month" builds and a consistent 4:1 emphasis on focal vs context for the regional builds.
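
The ratios above can be illustrated with a small Python sketch that divides a sequence budget in proportion to a set of weights. The totals (1000 and 5330) and the `allocate` helper are hypothetical, and reading "33%" as a weight of 0.33 relative to the other regions is an assumption of this sketch; the actual per-build sequence counts live in the profile's build configuration, which is not shown here.

```python
from fractions import Fraction


def allocate(total: int, weights: dict) -> dict:
    """Split `total` across the keys of `weights` in proportion to each weight.

    Weights may be ints or Fractions; results are truncated to whole sequences.
    """
    weight_sum = sum(weights.values())
    return {
        name: int(total * Fraction(weight) / weight_sum)
        for name, weight in weights.items()
    }


# 4:1 emphasis on recent vs early samples (hypothetical budget of 1000).
six_month = allocate(1000, {"recent": 4, "early": 1})

# 4:1 emphasis on focal vs context samples for a regional build.
regional = allocate(1000, {"focal": 4, "context": 1})

# Equal regional weights for a global build, with Oceania at 33% of the others.
global_weights = {
    "Africa": 1,
    "Asia": 1,
    "Europe": 1,
    "North America": 1,
    "South America": 1,
    "Oceania": Fraction(33, 100),
}
by_region = allocate(5330, global_weights)

if __name__ == "__main__":
    print(six_month)
    print(regional)
    print(by_region)
```

Exact fractions are used instead of floats so that the truncation to whole sequences is not affected by rounding error in the weights.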
trvrb committed Apr 15, 2022
1 parent 58ca2bb commit 0bf1383
Showing 4 changed files with 313 additions and 136 deletions.
8 changes: 4 additions & 4 deletions defaults/parameters.yaml
@@ -81,8 +81,8 @@ filter:
   exclude_where: "division='USA'"
   exclude_ambiguous_dates_by: "any"

-  # Exclude sequences which are from before late 2019 (likely date mix-ups)
-  min_date: 2019.74
+  # Exclude sequences which are from before Dec 2019 (likely date mix-ups)
+  min_date: "2019-12-01"

   # When choosing contextual samples for a focal set, applying crowding penalty
   # will help reduce the number of genetically identical strains that get chosen,
@@ -136,10 +136,10 @@ frequencies:
   # min_date is set by default to 1 year before present
   # but can be explicitly set if desired

-  # Number of months between pivots
+  # Number of weeks between pivots
   pivot_interval: 1

-  # Measure pivots in weeks rather than months
+  # Measure pivots in weeks
   pivot_interval_units: "weeks"

   # KDE bandwidths in proportion of a year to use per strain.
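
To see what the `min_date` change in `defaults/parameters.yaml` means in calendar terms, the following Python sketch converts the old decimal-year value to a date and compares it with the new ISO date. The `decimal_year_to_date` helper is hypothetical and assumes a simple linear mapping of the fractional part onto the days of the year; augur may use a different convention for decimal dates.

```python
from datetime import date, timedelta


def decimal_year_to_date(value: float) -> date:
    """Map a decimal year (e.g. 2019.74) to a calendar date.

    Assumption of this sketch: the fractional part is a linear fraction of
    the days in that year, truncated to a whole day.
    """
    year = int(value)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    return start + timedelta(days=int((value - year) * days_in_year))


old_min_date = decimal_year_to_date(2019.74)
new_min_date = date.fromisoformat("2019-12-01")

if __name__ == "__main__":
    print(old_min_date, new_min_date, (new_min_date - old_min_date).days)
```

Under this convention the old threshold falls in late September 2019, so the new value moves the cutoff about two months later, which matches the comment changing from "late 2019" to "Dec 2019".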