While reviewing dengue ingest, I realized we don't have any central documentation on the standard Nextstrain metadata fields. There are currently SARS-CoV-2 docs on its metadata fields, but they contain a lot of SC2/GISAID-specific language. (As they should, since that's what they were written for!)
Should we add a metadata section to our top-level Data formats page that focuses on the standard fields we use from public data/NCBI? I think that, as we standardize ingest, we should also define a set of standard metadata fields that is expected for all of our public pathogen metadata. Each pathogen will most likely have additional pathogen-specific fields on top of the standard fields, and those would be documented within the individual pathogen repos.
Next steps
Compare the fields used in our current public metadata TSVs and propose a list of standardized fields.
Define a schema for the metadata TSV. @tsibley suggested that this could be a constrained form of JSON Schema, or we could look into existing tabular schemas.
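As a rough illustration of what a constrained, JSON-Schema-like definition might look like, here is a minimal Python sketch that checks a metadata TSV for required columns and per-column regex patterns. The field names (`genbank_accession`, `strain`, `date`), the patterns, and the `validate_tsv` helper are hypothetical placeholders, not a proposed standard; a real implementation might use the `jsonschema` package or an existing tabular schema format instead.

```python
import csv
import io
import re

# Hypothetical schema: a tiny subset of JSON Schema ideas ("required", "pattern")
# applied to TSV columns. Field names and patterns are illustrative only.
SCHEMA = {
    "required": ["genbank_accession", "strain", "date"],
    "properties": {
        # Simplified GenBank accession: 1-2 letters, 5-8 digits, optional version.
        "genbank_accession": {"pattern": r"[A-Z]{1,2}\d{5,8}(\.\d+)?"},
        # YYYY-MM-DD, allowing "XX" for an unknown month or day.
        "date": {"pattern": r"\d{4}-(\d{2}|XX)-(\d{2}|XX)"},
    },
}


def validate_tsv(text, schema):
    """Return a list of human-readable errors (empty if the TSV passes)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []

    # Check that every required column is present in the header.
    errors = [
        f"missing required column: {column}"
        for column in schema["required"]
        if column not in header
    ]
    if errors:
        return errors

    # Check each row's values against the per-column patterns.
    # Line numbers start at 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for column, rules in schema.get("properties", {}).items():
            value = row.get(column)
            if value is None:  # column absent or row too short
                continue
            pattern = rules.get("pattern")
            if pattern and not re.fullmatch(pattern, value):
                errors.append(
                    f"line {line_no}: {column}={value!r} does not match {pattern}"
                )
    return errors


example = (
    "genbank_accession\tstrain\tdate\n"
    "AB123456.1\tstrain_A\t2022-03-XX\n"
    "bad\tstrain_B\t2021\n"
)
for error in validate_tsv(example, SCHEMA):
    print(error)
```

Keeping the schema as plain data (rather than code) is what would let it double as documentation on the Data formats page, which seems to be the spirit of the JSON Schema suggestion.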
As @j23414 commented in a separate PR, we've had internal discussions on whether the NCBI accession column should be named accession or genbank_accession.
The general consensus in a previous dev chat was that it's better to be more specific and use genbank_accession. However, if the data needs to be merged with private data or data from other sources, then there needs to be a general accession column.
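One way to reconcile the two positions is to keep the specific genbank_accession column and derive a general accession column only at merge time. The sketch below assumes rows are plain dicts (e.g. from `csv.DictReader`); the `add_general_accession` helper and the `lab_id` and `accession_source` columns are hypothetical names used for illustration.

```python
def add_general_accession(rows, source_column, source_name):
    """Return copies of rows with a generic 'accession' column.

    The original source-specific column (e.g. genbank_accession) is kept,
    and a hypothetical 'accession_source' column records where the ID came from.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row["accession"] = row[source_column]
        new_row["accession_source"] = source_name
        out.append(new_row)
    return out


# Illustrative rows: public data keyed by GenBank accession,
# private data keyed by a hypothetical lab identifier.
public = [{"genbank_accession": "AB123456.1", "strain": "strain_A"}]
private = [{"lab_id": "LAB-0001", "strain": "strain_B"}]

merged = (
    add_general_accession(public, "genbank_accession", "genbank")
    + add_general_accession(private, "lab_id", "private")
)
for row in merged:
    print(row["accession"], row["accession_source"])
```

This keeps public metadata explicit about its source while still giving downstream tools a single join key.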
This was brought up in Nextstrain office hours today: we should consider standardizing with NCBI standards in mind, specifically the One Health Enteric Metadata standards.
Originally discussed in Slack