diff --git a/docs/src/reference/change_log.md b/docs/src/reference/change_log.md
index 2b37ef963..cd38e306b 100644
--- a/docs/src/reference/change_log.md
+++ b/docs/src/reference/change_log.md
@@ -5,6 +5,8 @@ We also use this change log to document new features that maintain backward comp
 
 ## New features since last version update
 
+- 1 June 2022: Add "2m" timespan in Nextstrain profile builds. [PR 957](https://github.com/nextstrain/ncov/pull/957)
+
 - 29 April 2022: Include multiple timespans in Nextstrain profile builds. [PR 910](https://github.com/nextstrain/ncov/pull/910)
 
 - 29 April 2022: Update default mask parameters to mask 200 bases from the end of the genome rather than the existing 50. This was necessary because there is a large deletion in this region in circulating 21L viruses. This deletion is causing problems with alignment and the resulting mis-alignment appears as excess mutations in the tree. [PR 939](https://github.com/nextstrain/ncov/pull/939).
diff --git a/nextstrain_profiles/nextstrain-gisaid/builds.yaml b/nextstrain_profiles/nextstrain-gisaid/builds.yaml
index cc82bef60..3f7513de0 100644
--- a/nextstrain_profiles/nextstrain-gisaid/builds.yaml
+++ b/nextstrain_profiles/nextstrain-gisaid/builds.yaml
@@ -39,16 +39,25 @@ inputs:
 # (They override the defaults)
 # North America and Oceania are subsampled at the "division" level
 # Africa, Asia, Europe and South America are subsampled at the "country" level
+#
+# Auspice config is specified in rule auspice_config in export_for_nextstrain.smk
 builds:
   reference:
     subsampling_scheme: nextstrain_reference
     title: Genomic epidemiology of SARS-CoV-2 with clade-focused subsampling
+  global_2m:
+    subsampling_scheme: nextstrain_global_2m
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused globally over the past 2 months
   global_6m:
     subsampling_scheme: nextstrain_global_6m
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused globally over the past 6 months
   global_all-time:
     subsampling_scheme: nextstrain_global_all_time
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused globally since pandemic start
+  africa_2m:
+    subsampling_scheme: nextstrain_region_grouped_by_country_2m
+    region: Africa
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Africa over the past 2 months
   africa_6m:
     subsampling_scheme: nextstrain_region_grouped_by_country_6m
     region: Africa
@@ -57,6 +66,10 @@ builds:
     subsampling_scheme: nextstrain_region_grouped_by_country_all_time
     region: Africa
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Africa since pandemic start
+  asia_2m:
+    subsampling_scheme: nextstrain_region_grouped_by_country_2m
+    region: Asia
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Asia over the past 2 months
   asia_6m:
     subsampling_scheme: nextstrain_region_grouped_by_country_6m
     region: Asia
@@ -65,6 +78,10 @@ builds:
     subsampling_scheme: nextstrain_region_grouped_by_country_all_time
     region: Asia
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Asia since pandemic start
+  europe_2m:
+    subsampling_scheme: nextstrain_region_grouped_by_country_2m
+    region: Europe
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Europe over the past 2 months
   europe_6m:
     subsampling_scheme: nextstrain_region_grouped_by_country_6m
     region: Europe
@@ -73,6 +90,10 @@ builds:
     subsampling_scheme: nextstrain_region_grouped_by_country_all_time
     region: Europe
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Europe since pandemic start
+  north-america_2m:
+    subsampling_scheme: nextstrain_region_grouped_by_division_2m
+    region: North America
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on North America over the past 2 months
   north-america_6m:
     subsampling_scheme: nextstrain_region_grouped_by_division_6m
     region: North America
@@ -81,6 +102,10 @@ builds:
     subsampling_scheme: nextstrain_region_grouped_by_division_all_time
     region: North America
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on North America since pandemic start
+  oceania_2m:
+    subsampling_scheme: nextstrain_region_grouped_by_division_2m
+    region: Oceania
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Oceania over the past 2 months
   oceania_6m:
     subsampling_scheme: nextstrain_region_grouped_by_division_6m
     region: Oceania
@@ -89,6 +114,10 @@ builds:
     subsampling_scheme: nextstrain_region_grouped_by_division_all_time
     region: Oceania
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Oceania since pandemic start
+  south-america_2m:
+    subsampling_scheme: nextstrain_region_grouped_by_country_2m
+    region: South America
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on South America over the past 2 months
   south-america_6m:
     subsampling_scheme: nextstrain_region_grouped_by_country_6m
     region: South America
@@ -110,6 +139,37 @@ subsampling:
       group_by: "Nextstrain_clade"
       max_sequences: 300
 
+  # Custom subsampling logic for regions over 2m
+  # Grouping by division for North America and Oceania
+  # 4000 total
+  # 4:1 ratio of recent to early
+  # 4:1 ratio of focal to context
+  nextstrain_region_grouped_by_division_2m:
+    # Early focal samples for region
+    focal_early:
+      group_by: "division year month"
+      max_sequences: 640
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!={region}'"
+    # Early contextual samples from the rest of the world
+    context_early:
+      group_by: "country year month"
+      max_sequences: 160
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region={region}'"
+    # Recent focal samples for region
+    focal_recent:
+      group_by: "division year month"
+      max_sequences: 2560
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!={region}'"
+    # Recent contextual samples from the rest of the world
+    context_recent:
+      group_by: "country year month"
+      max_sequences: 640
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region={region}'"
+
   # Custom subsampling logic for regions over 6m
   # Grouping by division for North America and Oceania
   # 4000 total
@@ -157,6 +217,37 @@ subsampling:
       max_sequences: 800
       exclude: "--exclude-where 'region={region}'"
 
+  # Custom subsampling logic for regions over 2m
+  # Grouping by country for Africa, Asia, Europe and South America
+  # 4000 total
+  # 4:1 ratio of recent to early
+  # 4:1 ratio of focal to context
+  nextstrain_region_grouped_by_country_2m:
+    # Early focal samples for region
+    focal_early:
+      group_by: "country year month"
+      max_sequences: 640
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!={region}'"
+    # Early contextual samples from the rest of the world
+    context_early:
+      group_by: "country year month"
+      max_sequences: 160
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region={region}'"
+    # Recent focal samples for region
+    focal_recent:
+      group_by: "country year month"
+      max_sequences: 2560
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!={region}'"
+    # Recent contextual samples from the rest of the world
+    context_recent:
+      group_by: "country year month"
+      max_sequences: 640
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region={region}'"
+
   # Custom subsampling logic for regions over 6m
   # Grouping by country for Africa, Asia, Europe and South America
   # 4000 total
@@ -204,6 +295,72 @@ subsampling:
       max_sequences: 800
       exclude: "--exclude-where 'region={region}'"
 
+  # Custom subsampling logic for global region over 2m
+  # 4000 total
+  # 4:1 ratio of recent to early
+  # all regions equal except Oceania at 33%
+  nextstrain_global_2m:
+    africa_early:
+      group_by: "country year month"
+      max_sequences: 150
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!=Africa'"
+    asia_early:
+      group_by: "country year month"
+      max_sequences: 150
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!=Asia'"
+    europe_early:
+      group_by: "country year month"
+      max_sequences: 150
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!=Europe'"
+    north_america_early:
+      group_by: "division year month"
+      max_sequences: 150
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!=North America'"
+    south_america_early:
+      group_by: "country year month"
+      max_sequences: 150
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!=South America'"
+    oceania_early:
+      group_by: "division year month"
+      max_sequences: 50
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!=Oceania'"
+    africa_recent:
+      group_by: "country year month"
+      max_sequences: 600
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!=Africa'"
+    asia_recent:
+      group_by: "country year month"
+      max_sequences: 600
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!=Asia'"
+    europe_recent:
+      group_by: "country year month"
+      max_sequences: 600
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!=Europe'"
+    north_america_recent:
+      group_by: "division year month"
+      max_sequences: 600
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!=North America'"
+    south_america_recent:
+      group_by: "country year month"
+      max_sequences: 600
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!=South America'"
+    oceania_recent:
+      group_by: "division year month"
+      max_sequences: 200
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!=Oceania'"
+
   # Custom subsampling logic for global region over 6m
   # 4000 total
   # 4:1 ratio of focal to context
@@ -302,42 +459,63 @@ subsampling:
 # if different traits should be reconstructed for some builds, specify here
 # otherwise the default trait config in defaults/parameters.yaml will used
 traits:
+  global_2m:
+    sampling_bias_correction: 2.5
+    columns: ["region"]
   global_6m:
     sampling_bias_correction: 2.5
     columns: ["region"]
   global_all-time:
     sampling_bias_correction: 2.5
     columns: ["region"]
+  africa_2m:
+    sampling_bias_correction: 2.5
+    columns: ["country"]
   africa_6m:
     sampling_bias_correction: 2.5
     columns: ["country"]
   africa_all-time:
     sampling_bias_correction: 2.5
     columns: ["country"]
+  asia_2m:
+    sampling_bias_correction: 2.5
+    columns: ["country"]
   asia_6m:
     sampling_bias_correction: 2.5
     columns: ["country"]
   asia_all-time:
     sampling_bias_correction: 2.5
     columns: ["country"]
+  europe_2m:
+    sampling_bias_correction: 2.5
+    columns: ["country"]
   europe_6m:
     sampling_bias_correction: 2.5
     columns: ["country"]
   europe_all-time:
     sampling_bias_correction: 2.5
     columns: ["country"]
+  north-america_2m:
+    sampling_bias_correction: 2.5
+    columns: ["division"]
   north-america_6m:
     sampling_bias_correction: 2.5
     columns: ["division"]
   north-america_all-time:
     sampling_bias_correction: 2.5
     columns: ["division"]
+  oceania_2m:
+    sampling_bias_correction: 2.5
+    columns: ["division"]
   oceania_6m:
     sampling_bias_correction: 2.5
     columns: ["division"]
   oceania_all-time:
     sampling_bias_correction: 2.5
     columns: ["division"]
+  south-america_2m:
+    sampling_bias_correction: 2.5
+    columns: ["country"]
   south-america_6m:
     sampling_bias_correction: 2.5
     columns: ["country"]
@@ -346,47 +524,91 @@ traits:
     columns: ["country"]
 
 # Define frequencies parameters
-# Target frequencies to "6m" vs "all-time" builds
+# Target frequencies to "2m", "6m" and "all-time" builds
+# narrow_bandwidth = 0.019 or 7 days for "2m"
+# narrow_bandwidth = 0.038 or 14 days for "6m" and "all-time"
 frequencies:
-  global_6m:
+  global_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  global_6m:
     min_date: "6M"
-  global_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  global_all-time:
     min_date: "2020-01-01"
-  africa_6m:
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
+  africa_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  africa_6m:
     min_date: "6M"
-  africa_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  africa_all-time:
     min_date: "2020-01-01"
-  asia_6m:
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
+  asia_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  asia_6m:
     min_date: "6M"
-  asia_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  asia_all-time:
     min_date: "2020-01-01"
-  europe_6m:
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
+  europe_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  europe_6m:
     min_date: "6M"
-  europe_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  europe_all-time:
     min_date: "2020-01-01"
-  north-america_6m:
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
+  north-america_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  north-america_6m:
     min_date: "6M"
-  north-america_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  north-america_all-time:
     min_date: "2020-01-01"
-  oceania_6m:
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
+  oceania_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  oceania_6m:
     min_date: "6M"
-  oceania_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  oceania_all-time:
     min_date: "2020-01-01"
-  south-america_6m:
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
+  south-america_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  south-america_6m:
     min_date: "6M"
-  south-america_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  south-america_all-time:
     min_date: "2020-01-01"
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
diff --git a/nextstrain_profiles/nextstrain-gisaid/nextstrain_description.md b/nextstrain_profiles/nextstrain-gisaid/nextstrain_description.md
index c4247ada9..6a9b30053 100644
--- a/nextstrain_profiles/nextstrain-gisaid/nextstrain_description.md
+++ b/nextstrain_profiles/nextstrain-gisaid/nextstrain_description.md
@@ -4,22 +4,15 @@ This phylogeny shows evolutionary relationships of SARS-CoV-2 viruses from the o
 
 There are millions of complete SARS-CoV-2 genomes available and this number increases every day. This visualization can only handle ~4000 genomes in a single view for performance and legibility reasons. Because of this we subsample available genome data for our analysis views. We provision multiple views to focus subsampling on different geographic regions and different time periods. These views are available through the "Dataset" dropdown on the left or by clicking on the following links:
 
-region | time period | URL
--------------- | ------------- | ---
-global | past 6 months | [/ncov/gisaid/global/6m](/ncov/gisaid/global/6m)
-Africa | past 6 months | [/ncov/gisaid/africa/6m](/ncov/gisaid/africa/6m?f_region=Africa)
-Asia | past 6 months | [/ncov/gisaid/asia/6m](/ncov/gisaid/asia/6m?f_region=Asia)
-Europe | past 6 months | [/ncov/gisaid/europe/6m](/ncov/gisaid/europe/6m?f_region=Europe)
-North America | past 6 months | [/ncov/gisaid/north-america/6m](/ncov/gisaid/north-america/6m?f_region=North%20America)
-Oceania | past 6 months | [/ncov/gisaid/oceania/6m](/ncov/gisaid/oceania/6m?f_region=Oceania)
-South America | past 6 months | [/ncov/gisaid/south-america/6m](/ncov/gisaid/south-america/6m?f_region=South%20America)
-global | all time | [/ncov/gisaid/global/all-time](/ncov/gisaid/global/all-time)
-Africa | all time | [/ncov/gisaid/africa/all-time](/ncov/gisaid/africa/all-time?f_region=Africa)
-Asia | all time | [/ncov/gisaid/asia/all-time](/ncov/gisaid/asia/all-time?f_region=Asia)
-Europe | all time | [/ncov/gisaid/europe/all-time](/ncov/gisaid/europe/all-time?f_region=Europe)
-North America | all time | [/ncov/gisaid/north-america/all-time](/ncov/gisaid/north-america/all-time?f_region=North%20America)
-Oceania | all time | [/ncov/gisaid/oceania/all-time](/ncov/gisaid/oceania/all-time?f_region=Oceania)
-South America | all time | [/ncov/gisaid/south-america/all-time](/ncov/gisaid/south-america/all-time?f_region=South%20America)
+  | past 2 months | past 6 months | all time
+----------------- | -------------------------------------------------------------------------- | -------------------------------------------------------------------------- | --------------------------------------------------------------------------------------
+**global** | [global/2m](/ncov/gisaid/global/2m) | [global/6m](/ncov/gisaid/global/6m) | [global/all-time](/ncov/gisaid/global/all-time)
+**Africa** | [africa/2m](/ncov/gisaid/africa/2m?f_region=Africa) | [africa/6m](/ncov/gisaid/africa/6m?f_region=Africa) | [africa/all-time](/ncov/gisaid/africa/all-time?f_region=Africa)
+**Asia** | [asia/2m](/ncov/gisaid/asia/2m?f_region=Asia) | [asia/6m](/ncov/gisaid/asia/6m?f_region=Asia) | [asia/all-time](/ncov/gisaid/asia/all-time?f_region=Asia)
+**Europe** | [europe/2m](/ncov/gisaid/europe/2m?f_region=Europe) | [europe/6m](/ncov/gisaid/europe/6m?f_region=Europe) | [europe/all-time](/ncov/gisaid/europe/all-time?f_region=Europe)
+**North America** | [north-america/2m](/ncov/gisaid/north-america/2m?f_region=North%20America) | [north-america/6m](/ncov/gisaid/north-america/6m?f_region=North%20America) | [north-america/all-time](/ncov/gisaid/north-america/all-time?f_region=North%20America)
+**Oceania** | [oceania/2m](/ncov/gisaid/oceania/2m?f_region=Oceania) | [oceania/6m](/ncov/gisaid/oceania/6m?f_region=Oceania) | [oceania/all-time](/ncov/gisaid/oceania/all-time?f_region=Oceania)
+**South America** | [south-america/2m](/ncov/gisaid/south-america/2m?f_region=South%20America) | [south-america/6m](/ncov/gisaid/south-america/6m?f_region=South%20America) | [south-america/all-time](/ncov/gisaid/south-america/all-time?f_region=South%20America)
 
 Site numbering and genome structure uses [Wuhan-Hu-1/2019](https://www.ncbi.nlm.nih.gov/nuccore/MN908947) as reference. The phylogeny is rooted relative to early samples from Wuhan. Temporal resolution assumes a nucleotide substitution rate of 8 × 10^-4 subs per site per year. Mutational fitness is calculated using results from [Obermeyer et al (under review)](https://www.medrxiv.org/content/10.1101/2021.09.07.21263228v1). Full details on bioinformatic processing can be found [here](https://github.com/nextstrain/ncov).
 
diff --git a/nextstrain_profiles/nextstrain-open/builds.yaml b/nextstrain_profiles/nextstrain-open/builds.yaml
index b6457c094..1129942bb 100644
--- a/nextstrain_profiles/nextstrain-open/builds.yaml
+++ b/nextstrain_profiles/nextstrain-open/builds.yaml
@@ -39,16 +39,25 @@ inputs:
 # (They override the defaults)
 # North America and Oceania are subsampled at the "division" level
 # Africa, Asia, Europe and South America are subsampled at the "country" level
+#
+# Auspice config is specified in rule auspice_config in export_for_nextstrain.smk
 builds:
   reference:
     subsampling_scheme: nextstrain_reference
     title: Genomic epidemiology of SARS-CoV-2 with clade-focused subsampling
+  global_2m:
+    subsampling_scheme: nextstrain_global_2m
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused globally over the past 2 months
   global_6m:
     subsampling_scheme: nextstrain_global_6m
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused globally over the past 6 months
   global_all-time:
     subsampling_scheme: nextstrain_global_all_time
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused globally since pandemic start
+  africa_2m:
+    subsampling_scheme: nextstrain_region_grouped_by_country_2m
+    region: Africa
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Africa over the past 2 months
   africa_6m:
     subsampling_scheme: nextstrain_region_grouped_by_country_6m
     region: Africa
@@ -57,6 +66,10 @@ builds:
     subsampling_scheme: nextstrain_region_grouped_by_country_all_time
     region: Africa
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Africa since pandemic start
+  asia_2m:
+    subsampling_scheme: nextstrain_region_grouped_by_country_2m
+    region: Asia
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Asia over the past 2 months
   asia_6m:
     subsampling_scheme: nextstrain_region_grouped_by_country_6m
     region: Asia
@@ -65,6 +78,10 @@ builds:
     subsampling_scheme: nextstrain_region_grouped_by_country_all_time
     region: Asia
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Asia since pandemic start
+  europe_2m:
+    subsampling_scheme: nextstrain_region_grouped_by_country_2m
+    region: Europe
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Europe over the past 2 months
   europe_6m:
     subsampling_scheme: nextstrain_region_grouped_by_country_6m
     region: Europe
@@ -73,6 +90,10 @@ builds:
     subsampling_scheme: nextstrain_region_grouped_by_country_all_time
     region: Europe
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Europe since pandemic start
+  north-america_2m:
+    subsampling_scheme: nextstrain_region_grouped_by_division_2m
+    region: North America
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on North America over the past 2 months
   north-america_6m:
     subsampling_scheme: nextstrain_region_grouped_by_division_6m
     region: North America
@@ -81,6 +102,10 @@ builds:
     subsampling_scheme: nextstrain_region_grouped_by_division_all_time
     region: North America
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on North America since pandemic start
+  oceania_2m:
+    subsampling_scheme: nextstrain_region_grouped_by_division_2m
+    region: Oceania
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Oceania over the past 2 months
   oceania_6m:
     subsampling_scheme: nextstrain_region_grouped_by_division_6m
     region: Oceania
@@ -89,6 +114,10 @@ builds:
     subsampling_scheme: nextstrain_region_grouped_by_division_all_time
     region: Oceania
     title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on Oceania since pandemic start
+  south-america_2m:
+    subsampling_scheme: nextstrain_region_grouped_by_country_2m
+    region: South America
+    title: Genomic epidemiology of SARS-CoV-2 with subsampling focused on South America over the past 2 months
   south-america_6m:
     subsampling_scheme: nextstrain_region_grouped_by_country_6m
     region: South America
@@ -110,6 +139,37 @@ subsampling:
       group_by: "Nextstrain_clade"
       max_sequences: 300
 
+  # Custom subsampling logic for regions over 2m
+  # Grouping by division for North America and Oceania
+  # 4000 total
+  # 4:1 ratio of recent to early
+  # 4:1 ratio of focal to context
+  nextstrain_region_grouped_by_division_2m:
+    # Early focal samples for region
+    focal_early:
+      group_by: "division year month"
+      max_sequences: 640
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!={region}'"
+    # Early contextual samples from the rest of the world
+    context_early:
+      group_by: "country year month"
+      max_sequences: 160
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region={region}'"
+    # Recent focal samples for region
+    focal_recent:
+      group_by: "division year month"
+      max_sequences: 2560
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!={region}'"
+    # Recent contextual samples from the rest of the world
+    context_recent:
+      group_by: "country year month"
+      max_sequences: 640
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region={region}'"
+
   # Custom subsampling logic for regions over 6m
   # Grouping by division for North America and Oceania
   # 4000 total
@@ -157,6 +217,37 @@ subsampling:
       max_sequences: 800
       exclude: "--exclude-where 'region={region}'"
 
+  # Custom subsampling logic for regions over 2m
+  # Grouping by country for Africa, Asia, Europe and South America
+  # 4000 total
+  # 4:1 ratio of recent to early
+  # 4:1 ratio of focal to context
+  nextstrain_region_grouped_by_country_2m:
+    # Early focal samples for region
+    focal_early:
+      group_by: "country year month"
+      max_sequences: 640
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!={region}'"
+    # Early contextual samples from the rest of the world
+    context_early:
+      group_by: "country year month"
+      max_sequences: 160
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region={region}'"
+    # Recent focal samples for region
+    focal_recent:
+      group_by: "country year month"
+      max_sequences: 2560
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!={region}'"
+    # Recent contextual samples from the rest of the world
+    context_recent:
+      group_by: "country year month"
+      max_sequences: 640
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region={region}'"
+
   # Custom subsampling logic for regions over 6m
   # Grouping by country for Africa, Asia, Europe and South America
   # 4000 total
@@ -204,6 +295,72 @@ subsampling:
       max_sequences: 800
       exclude: "--exclude-where 'region={region}'"
 
+  # Custom subsampling logic for global region over 2m
+  # 4000 total
+  # 4:1 ratio of recent to early
+  # all regions equal except Oceania at 33%
+  nextstrain_global_2m:
+    africa_early:
+      group_by: "country year month"
+      max_sequences: 150
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!=Africa'"
+    asia_early:
+      group_by: "country year month"
+      max_sequences: 150
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!=Asia'"
+    europe_early:
+      group_by: "country year month"
+      max_sequences: 150
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!=Europe'"
+    north_america_early:
+      group_by: "division year month"
+      max_sequences: 150
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!=North America'"
+    south_america_early:
+      group_by: "country year month"
+      max_sequences: 150
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!=South America'"
+    oceania_early:
+      group_by: "division year month"
+      max_sequences: 50
+      max_date: "--max-date 2M"
+      exclude: "--exclude-where 'region!=Oceania'"
+    africa_recent:
+      group_by: "country year month"
+      max_sequences: 600
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!=Africa'"
+    asia_recent:
+      group_by: "country year month"
+      max_sequences: 600
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!=Asia'"
+    europe_recent:
+      group_by: "country year month"
+      max_sequences: 600
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!=Europe'"
+    north_america_recent:
+      group_by: "division year month"
+      max_sequences: 600
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!=North America'"
+    south_america_recent:
+      group_by: "country year month"
+      max_sequences: 600
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!=South America'"
+    oceania_recent:
+      group_by: "division year month"
+      max_sequences: 200
+      min_date: "--min-date 2M"
+      exclude: "--exclude-where 'region!=Oceania'"
+
   # Custom subsampling logic for global region over 6m
   # 4000 total
   # 4:1 ratio of focal to context
@@ -307,42 +464,63 @@ refine:
 # if different traits should be reconstructed for some builds, specify here
 # otherwise the default trait config in defaults/parameters.yaml will used
 traits:
+  global_2m:
+    sampling_bias_correction: 2.5
+    columns: ["region"]
   global_6m:
     sampling_bias_correction: 2.5
     columns: ["region"]
   global_all-time:
     sampling_bias_correction: 2.5
     columns: ["region"]
+  africa_2m:
+    sampling_bias_correction: 2.5
+    columns: ["country"]
   africa_6m:
     sampling_bias_correction: 2.5
     columns: ["country"]
   africa_all-time:
     sampling_bias_correction: 2.5
     columns: ["country"]
+  asia_2m:
+    sampling_bias_correction: 2.5
+    columns: ["country"]
   asia_6m:
     sampling_bias_correction: 2.5
     columns: ["country"]
   asia_all-time:
     sampling_bias_correction: 2.5
     columns: ["country"]
+  europe_2m:
+    sampling_bias_correction: 2.5
+    columns: ["country"]
   europe_6m:
     sampling_bias_correction: 2.5
     columns: ["country"]
   europe_all-time:
     sampling_bias_correction: 2.5
     columns: ["country"]
+  north-america_2m:
+    sampling_bias_correction: 2.5
+    columns: ["division"]
   north-america_6m:
     sampling_bias_correction: 2.5
     columns: ["division"]
   north-america_all-time:
     sampling_bias_correction: 2.5
     columns: ["division"]
+  oceania_2m:
+    sampling_bias_correction: 2.5
+    columns: ["division"]
   oceania_6m:
     sampling_bias_correction: 2.5
     columns: ["division"]
   oceania_all-time:
     sampling_bias_correction: 2.5
     columns: ["division"]
+  south-america_2m:
+    sampling_bias_correction: 2.5
+    columns: ["country"]
   south-america_6m:
     sampling_bias_correction: 2.5
     columns: ["country"]
@@ -351,47 +529,91 @@ traits:
     columns: ["country"]
 
 # Define frequencies parameters
-# Target frequencies to "6m" vs "all-time" builds
+# Target frequencies to "2m", "6m" and "all-time" builds
+# narrow_bandwidth = 0.019 or 7 days for "2m"
+# narrow_bandwidth = 0.038 or 14 days for "6m" and "all-time"
 frequencies:
-  global_6m:
+  global_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  global_6m:
     min_date: "6M"
-  global_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  global_all-time:
     min_date: "2020-01-01"
-  africa_6m:
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
+  africa_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  africa_6m:
     min_date: "6M"
-  africa_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  africa_all-time:
     min_date: "2020-01-01"
-  asia_6m:
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
+  asia_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  asia_6m:
     min_date: "6M"
-  asia_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  asia_all-time:
     min_date: "2020-01-01"
-  europe_6m:
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
+  europe_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  europe_6m:
     min_date: "6M"
-  europe_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  europe_all-time:
     min_date: "2020-01-01"
-  north-america_6m:
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
+  north-america_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  north-america_6m:
     min_date: "6M"
-  north-america_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  north-america_all-time:
     min_date: "2020-01-01"
-  oceania_6m:
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
+  oceania_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  oceania_6m:
     min_date: "6M"
-  oceania_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  oceania_all-time:
     min_date: "2020-01-01"
-  south-america_6m:
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
+  south-america_2m:
+    min_date: "2M"
+    narrow_bandwidth: 0.019
     recent_days_to_censor: 7
+  south-america_6m:
     min_date: "6M"
-  south-america_all-time:
+    narrow_bandwidth: 0.038
     recent_days_to_censor: 7
+  south-america_all-time:
     min_date: "2020-01-01"
+    narrow_bandwidth: 0.038
+    recent_days_to_censor: 7
diff --git a/nextstrain_profiles/nextstrain-open/nextstrain_description.md b/nextstrain_profiles/nextstrain-open/nextstrain_description.md
index 22c2143e5..27753a492 100644
--- a/nextstrain_profiles/nextstrain-open/nextstrain_description.md
+++ b/nextstrain_profiles/nextstrain-open/nextstrain_description.md
@@ -4,22 +4,15 @@ This phylogeny shows evolutionary relationships of SARS-CoV-2 viruses from the o
 
 There are millions of complete SARS-CoV-2 genomes available on open databases and this number increases every day. This visualization can only handle ~4000 genomes in a single view for performance and legibility reasons. Because of this we subsample available genome data for our analysis views. We provision multiple views to focus subsampling on different geographic regions and different time periods. These views are available through the "Dataset" dropdown on the left or by clicking on the following links:
 
-region | time period | URL
--------------- | ------------- | ---
-global | past 6 months | [/ncov/open/global/6m](/ncov/open/global/6m)
-Africa | past 6 months | [/ncov/open/africa/6m](/ncov/open/africa/6m?f_region=Africa)
-Asia | past 6 months | [/ncov/open/asia/6m](/ncov/open/asia/6m?f_region=Asia)
-Europe | past 6 months | [/ncov/open/europe/6m](/ncov/open/europe/6m?f_region=Europe)
-North America | past 6 months | [/ncov/open/north-america/6m](/ncov/open/north-america/6m?f_region=North%20America)
-Oceania | past 6 months | [/ncov/open/oceania/6m](/ncov/open/oceania/6m?f_region=Oceania)
-South America | past 6 months | [/ncov/open/south-america/6m](/ncov/open/south-america/6m?f_region=South%20America)
-global | all time | [/ncov/open/global/all-time](/ncov/open/global/all-time)
-Africa | all time | [/ncov/open/africa/all-time](/ncov/open/africa/all-time?f_region=Africa)
-Asia | all time | [/ncov/open/asia/all-time](/ncov/open/asia/all-time?f_region=Asia)
-Europe | all time | [/ncov/open/europe/all-time](/ncov/open/europe/all-time?f_region=Europe)
-North America | all time | [/ncov/open/north-america/all-time](/ncov/open/north-america/all-time?f_region=North%20America)
-Oceania | all time | [/ncov/open/oceania/all-time](/ncov/open/oceania/all-time?f_region=Oceania)
-South America | all time | [/ncov/open/south-america/all-time](/ncov/open/south-america/all-time?f_region=South%20America)
+  | past 2 months | past 6 months | all time
+----------------- | ------------------------------------------------------------------------ | ------------------------------------------------------------------------ | ------------------------------------------------------------------------------------
+**global** | [global/2m](/ncov/open/global/2m) | [global/6m](/ncov/open/global/6m) | [global/all-time](/ncov/open/global/all-time)
+**Africa** | [africa/2m](/ncov/open/africa/2m?f_region=Africa) | [africa/6m](/ncov/open/africa/6m?f_region=Africa) | [africa/all-time](/ncov/open/africa/all-time?f_region=Africa)
+**Asia** | [asia/2m](/ncov/open/asia/2m?f_region=Asia) | [asia/6m](/ncov/open/asia/6m?f_region=Asia) | [asia/all-time](/ncov/open/asia/all-time?f_region=Asia)
+**Europe** | [europe/2m](/ncov/open/europe/2m?f_region=Europe) | [europe/6m](/ncov/open/europe/6m?f_region=Europe) | [europe/all-time](/ncov/open/europe/all-time?f_region=Europe)
+**North America** | [north-america/2m](/ncov/open/north-america/2m?f_region=North%20America) | [north-america/6m](/ncov/open/north-america/6m?f_region=North%20America) | [north-america/all-time](/ncov/open/north-america/all-time?f_region=North%20America)
+**Oceania** | [oceania/2m](/ncov/open/oceania/2m?f_region=Oceania) | [oceania/6m](/ncov/open/oceania/6m?f_region=Oceania) | [oceania/all-time](/ncov/open/oceania/all-time?f_region=Oceania)
+**South America** | [south-america/2m](/ncov/open/south-america/2m?f_region=South%20America) | [south-america/6m](/ncov/open/south-america/6m?f_region=South%20America) | [south-america/all-time](/ncov/open/south-america/all-time?f_region=South%20America)
 
 Site numbering and genome structure uses [Wuhan-Hu-1/2019](https://www.ncbi.nlm.nih.gov/nuccore/MN908947) as reference. The phylogeny is rooted relative to early samples from Wuhan. Temporal resolution assumes a nucleotide substitution rate of 8 × 10^-4 subs per site per year. Mutational fitness is calculated using results from [Obermeyer et al (under review)](https://www.medrxiv.org/content/10.1101/2021.09.07.21263228v1). Full details on bioinformatic processing can be found [here](https://github.com/nextstrain/ncov).
 
@@ -38,7 +31,7 @@ To maximize the utility and visibility of these generously shared data, [we prov #### Subsampled sequences and intermediate files The files below exist for every region (`global`, `africa`, `asia`, `europe`, `north-america`, `oceania` and `south-america`) and correspond to each region's 6 month timespan build (e.g. `global/6m`, `africa/6m`, `asia/6m`, etc). -Files for the `all-time` builds (e.g. `global/all-time`, etc.) are not yet available. +Files for the `2m` and `all-time` builds (e.g. `global/2m`, `global/all-time`, etc.) are not yet available. The links below refer to the `${BUILD_PART_0}` region; substitute `${BUILD_PART_0}` with another region name in the links if desired. - [${BUILD_PART_0}/6m metadata.tsv.xz](https://data.nextstrain.org/files/ncov/open/${BUILD_PART_0}/metadata.tsv.xz) diff --git a/workflow/snakemake_rules/common.smk b/workflow/snakemake_rules/common.smk index 603aaa7ef..63125a6d9 100644 --- a/workflow/snakemake_rules/common.smk +++ b/workflow/snakemake_rules/common.smk @@ -227,7 +227,7 @@ def _get_upload_inputs(wildcards): # for the nextstrain.org/ncov/gisaid and …/open builds and then # special-cases them below. regions = {"global", "africa", "asia", "europe", "north-america", "oceania", "south-america"} - timespans = {"6m", "all-time"} + timespans = {"2m", "6m", "all-time"} region_timespan_builds = [f"{region}_{timespan}" for region, timespan in product(regions, timespans)] # mapping of remote → local filenames