
Commit

Cleanup open build config
1. Replace "GENBANK" with "GenBank" per NCBI style in data_provenance
2. Drop "GenBank" from title as it will appear immediately below in byline
3. Update description mainly to warn against scooping data generators
trvrb committed Jun 26, 2021
1 parent 1e01f42 commit 6b78079
Showing 9 changed files with 30 additions and 30 deletions.
@@ -8,7 +8,7 @@
],
"data_provenance": [
{
-"name": "GENBANK",
+"name": "GenBank",
"url": "https://www.ncbi.nlm.nih.gov/genbank/"
}
],
@@ -101,4 +101,4 @@
"entropy",
"frequencies"
]
}
}
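The change repeated across the auspice config files is a one-field rename inside `data_provenance`. A minimal Python sketch of applying that rename mechanically, assuming the configs sit in one directory and match the `*_auspice_config.json` naming seen in the `auspice_config` paths of builds.yaml; `normalize_provenance`, `normalize_directory` and `NAME_FIXES` are hypothetical names, not part of the ncov repository:

```python
import json
from pathlib import Path

# Hypothetical mapping of old provenance names to NCBI's preferred spelling.
NAME_FIXES = {"GENBANK": "GenBank"}


def normalize_provenance(config: dict) -> bool:
    """Rename data_provenance entries in place; return True if anything changed."""
    changed = False
    for source in config.get("data_provenance", []):
        fixed = NAME_FIXES.get(source.get("name"))
        if fixed is not None:
            source["name"] = fixed
            changed = True
    return changed


def normalize_directory(directory: Path) -> list:
    """Apply normalize_provenance to each *_auspice_config.json in a directory.

    Returns the names of the files that were rewritten.
    """
    updated = []
    for path in sorted(directory.glob("*_auspice_config.json")):
        config = json.loads(path.read_text())
        if normalize_provenance(config):
            # indent=2 and the trailing newline are assumptions about file style.
            path.write_text(json.dumps(config, indent=2) + "\n")
            updated.append(path.name)
    return updated
```

Re-serializing with `json.dumps` reformats each whole file, so the resulting diff may be larger than the one-line change shown in this commit; a plain text substitution would keep the diff minimal.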
4 changes: 2 additions & 2 deletions nextstrain_profiles/nextstrain-open/asia_auspice_config.json
@@ -8,7 +8,7 @@
],
"data_provenance": [
{
-"name": "GENBANK",
+"name": "GenBank",
"url": "https://www.ncbi.nlm.nih.gov/genbank/"
}
],
@@ -101,4 +101,4 @@
"entropy",
"frequencies"
]
}
}
14 changes: 7 additions & 7 deletions nextstrain_profiles/nextstrain-open/builds.yaml
@@ -34,37 +34,37 @@ builds:
global:
subsampling_scheme: nextstrain_region_global
auspice_config: nextstrain_profiles/nextstrain-open/global_auspice_config.json
-title: Genomic epidemiology of GenBank SARS-CoV-2 genomes, global subsampling
+title: Genomic epidemiology of SARS-CoV-2 with global subsampling
africa:
subsampling_scheme: nextstrain_region
region: Africa
auspice_config: nextstrain_profiles/nextstrain-open/africa_auspice_config.json
-title: Genomic epidemiology of GenBank SARS-CoV-2 genomes focused on Africa
+title: Genomic epidemiology of SARS-CoV-2 with Africa-focused subsampling
asia:
subsampling_scheme: nextstrain_region
region: Asia
auspice_config: nextstrain_profiles/nextstrain-open/asia_auspice_config.json
-title: Genomic epidemiology of GenBank SARS-CoV-2 genomes focused on Asia
+title: Genomic epidemiology of SARS-CoV-2 with Asia-focused subsampling
europe:
subsampling_scheme: nextstrain_region
region: Europe
auspice_config: nextstrain_profiles/nextstrain-open/europe_auspice_config.json
-title: Genomic epidemiology of GenBank SARS-CoV-2 genomes focused on Europe
+title: Genomic epidemiology of SARS-CoV-2 with Europe-focused subsampling
north-america:
subsampling_scheme: nextstrain_region
region: North America
auspice_config: nextstrain_profiles/nextstrain-open/north-america_auspice_config.json
-title: Genomic epidemiology of GenBank SARS-CoV-2 genomes focused on North America
+title: Genomic epidemiology of SARS-CoV-2 with North America-focused subsampling
oceania:
subsampling_scheme: nextstrain_region
region: Oceania
auspice_config: nextstrain_profiles/nextstrain-open/oceania_auspice_config.json
-title: Genomic epidemiology of GenBank SARS-CoV-2 genomes focused on Oceania
+title: Genomic epidemiology of SARS-CoV-2 with Oceania-focused subsampling
south-america:
subsampling_scheme: nextstrain_region
region: South America
auspice_config: nextstrain_profiles/nextstrain-open/south-america_auspice_config.json
-title: Genomic epidemiology of GenBank SARS-CoV-2 genomes focused on South America
+title: Genomic epidemiology of SARS-CoV-2 with South America-focused subsampling


# remove S dropout sequences and sequences without division label in US
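The new builds.yaml titles all follow one pattern and no longer name the data source, which now appears in the byline. A small Python sketch reproducing that pattern, where `build_title` and `REGIONS` are hypothetical names and the display names are copied from the `region:` fields above:

```python
# Build keys mapped to the display names used in the `region:` fields.
REGIONS = {
    "africa": "Africa",
    "asia": "Asia",
    "europe": "Europe",
    "north-america": "North America",
    "oceania": "Oceania",
    "south-america": "South America",
}


def build_title(build: str) -> str:
    """Return a title following the new builds.yaml pattern for a build key."""
    if build == "global":
        return "Genomic epidemiology of SARS-CoV-2 with global subsampling"
    # Raises KeyError for build keys that are not listed in REGIONS.
    return (
        "Genomic epidemiology of SARS-CoV-2 with "
        f"{REGIONS[build]}-focused subsampling"
    )


if __name__ == "__main__":
    for key in ["global", *REGIONS]:
        print(f"{key}: {build_title(key)}")
```

A check built on a helper like this could flag titles that reintroduce "GenBank"; it is an illustration, not part of the workflow.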
@@ -8,7 +8,7 @@
],
"data_provenance": [
{
-"name": "GENBANK",
+"name": "GenBank",
"url": "https://www.ncbi.nlm.nih.gov/genbank/"
}
],
@@ -101,4 +101,4 @@
"entropy",
"frequencies"
]
}
}
@@ -8,7 +8,7 @@
],
"data_provenance": [
{
-"name": "GENBANK",
+"name": "GenBank",
"url": "https://www.ncbi.nlm.nih.gov/genbank/"
}
],
@@ -101,4 +101,4 @@
"entropy",
"frequencies"
]
}
}
18 changes: 9 additions & 9 deletions nextstrain_profiles/nextstrain-open/nextstrain_description.md
@@ -6,34 +6,34 @@ There are hundreds of thousands of complete SARS-CoV-2 genomes available on open

Site numbering and genome structure uses [Wuhan-Hu-1/2019](https://www.ncbi.nlm.nih.gov/nuccore/MN908947) as reference. The phylogeny is rooted relative to early samples from Wuhan. Temporal resolution assumes a nucleotide substitution rate of 8 × 10^-4 subs per site per year. Full details on bioinformatic processing can be found [here](https://github.com/nextstrain/ncov).

-The analysis on this page uses data from completely open sources, such that we can make input data and intermediate files available for further analysis (see below).
-But be aware that not all regions are well represented in open data bases and some of the above trees might lack recent data from particular geographic regions.
-We gratefully acknowledge the authors, originating and submitting laboratories of the genetic sequence and metadata for sharing their work in open databases. A full listing of all originating and submitting laboratories is available below. An attribution table is available by clicking on "Download Data" at the bottom of the page and then clicking on "Strain Metadata" in the resulting dialog box.
+The analysis on this page uses data from NCBI GenBank as a source following Open Data principles, such that we can make input data and intermediate files available for further analysis (see below). Open Data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike. But be aware that not all regions are well represented in open databases and some of the above trees might lack recent data from particular geographic regions.
+
+We gratefully acknowledge the authors, originating and submitting laboratories of the genetic sequences and metadata for sharing their work in open databases. Please note that although data generators have generously shared data in an open fashion, that does not mean there should be free license to publish on this data. Data generators should be cited where possible and collaborations should be sought in some circumstances. Please try to avoid scooping someone else's work. Reach out if uncertain. An attribution table is available by clicking on "Download Data" at the bottom of the page and then clicking on "Strain Metadata" in the resulting dialog box.

To maximize the utility and visibility of these generously shared data, we provide preprocessed files that can serve as a starting point for additional analyses.

### All sequences and metadata

#### Ingested and parsed data

* [sequences.fasta.gz](https://data.nextstrain.org/files/ncov/open/sequences.fasta.gz)
* [metadata.tsv.gz](https://data.nextstrain.org/files/ncov/open/metadata.tsv.gz)


-### Pre-processed file
+#### Pre-processed files

* [aligned.fasta.xz](https://data.nextstrain.org/files/ncov/open/aligned.fasta.xz)
* [filtered.fasta.xz](https://data.nextstrain.org/files/ncov/open/filtered.fasta.xz)
* [masked.fasta.xz](https://data.nextstrain.org/files/ncov/open/masked.fasta.xz)
* [mutation-summary.tsv.xz](https://data.nextstrain.org/files/ncov/open/mutation-summary.tsv.xz)

-### Subsampled sequences and intermediate files
+### Subsampled sequences and intermediate files

-The files below exist for the `global` and the regional builds (`asia`, `africa`, `europe`, `north-america`, `oceania`, `south-america`).
+The files below exist for the `global` and the regional builds (`africa`, `asia`, `europe`, `north-america`, `oceania` and `south-america`).
The links below refer to the `global` build, substitute `global` with the desired region in the links if necessary

* [aligned.fasta.xz](https://data.nextstrain.org/files/ncov/open/global/aligned.fasta.xz)
* [sequences.fasta.xz](https://data.nextstrain.org/files/ncov/open/global/sequences.fasta.xz)
* [metadata.tsv.xz](https://data.nextstrain.org/files/ncov/open/global/metadata.tsv.xz)
* [aligned.fasta.xz](https://data.nextstrain.org/files/ncov/open/global/aligned.fasta.xz)
* [auspice tree](https://data.nextstrain.org/files/ncov/open/global/global.json)
* [auspice root sequence](https://data.nextstrain.org/files/ncov/open/global/global_root-sequence.json)
* [auspice tip frequencies](https://data.nextstrain.org/files/ncov/open/global/global_tip-frequencies.json)
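The description asks readers to substitute `global` with the desired region in these links. A minimal Python sketch of that substitution, assuming every regional build publishes the same six files under the same naming scheme (not verified here); `BASE`, `BUILDS`, `GLOBAL_PATHS` and `links_for` are hypothetical names:

```python
BASE = "https://data.nextstrain.org/files/ncov/open"

# Build names as listed in the description above.
BUILDS = ["global", "africa", "asia", "europe",
          "north-america", "oceania", "south-america"]

# Paths of the global build's files, relative to BASE.
GLOBAL_PATHS = [
    "global/aligned.fasta.xz",
    "global/sequences.fasta.xz",
    "global/metadata.tsv.xz",
    "global/global.json",
    "global/global_root-sequence.json",
    "global/global_tip-frequencies.json",
]


def links_for(build: str) -> list:
    """Return the file URLs for a build by substituting its name for `global`."""
    if build not in BUILDS:
        raise ValueError(f"unknown build: {build!r}")
    # Substitute only within the relative path so BASE is never modified.
    return [f"{BASE}/{path.replace('global', build)}" for path in GLOBAL_PATHS]


if __name__ == "__main__":
    for url in links_for("asia"):
        print(url)
```

Restricting the substitution to the relative path keeps the host untouched even if a base URL ever contained the word `global`.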

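As a rough illustration of the substitution rate quoted in the description (8 × 10^-4 subs per site per year), multiplying it by the reference genome length gives the expected number of substitutions per genome per year. This assumes a length of 29,903 nt for Wuhan-Hu-1 (MN908947.3) and ignores rate variation across sites and lineages:

```latex
\begin{aligned}
\mu &= 8 \times 10^{-4}\ \text{substitutions site}^{-1}\,\text{year}^{-1},
  \qquad L = 29\,903\ \text{sites} \\
\mu L &= 8 \times 10^{-4} \times 29\,903
  \approx 23.9\ \text{substitutions genome}^{-1}\,\text{year}^{-1}
\end{aligned}
```

That works out to roughly two substitutions per genome per month.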
@@ -8,7 +8,7 @@
],
"data_provenance": [
{
-"name": "GENBANK",
+"name": "GenBank",
"url": "https://www.ncbi.nlm.nih.gov/genbank/"
}
],
@@ -101,4 +101,4 @@
"entropy",
"frequencies"
]
}
}
@@ -8,7 +8,7 @@
],
"data_provenance": [
{
-"name": "GENBANK",
+"name": "GenBank",
"url": "https://www.ncbi.nlm.nih.gov/genbank/"
}
],
@@ -101,4 +101,4 @@
"entropy",
"frequencies"
]
}
}
@@ -8,7 +8,7 @@
],
"data_provenance": [
{
-"name": "GENBANK",
+"name": "GenBank",
"url": "https://www.ncbi.nlm.nih.gov/genbank/"
}
],
@@ -101,4 +101,4 @@
"entropy",
"frequencies"
]
}
}
