-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ingest: Create new accession field without version #40
Conversation
Resolves #39 Create a new accession field without the version number so that annotations do not need to be updated when the version number is updated.
@@ -46,6 +46,7 @@ curate: | |||
# This is the first step in the pipeline, so any references to field names in the configs below should use the new field names | |||
field_map: | |||
accession: accession | |||
accession_version: accession_version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose the column name accession_version
to be clearer, but open to changing it to match the existing pattern accession_rev
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accession_version
is what NCBI actually calls it and what will be used by a similar project I'm working on, so I fully support accession_version
- it's the right term!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
ingest/rules/fetch_from_ncbi.smk
Outdated
@@ -97,6 +97,9 @@ rule format_ncbi_dataset_report: | |||
--fields {params.ncbi_datasets_fields:q} \ | |||
--elide-header \ | |||
| csvtk add-header -t -l -n {params.ncbi_datasets_fields:q} \ | |||
| csvtk rename -t -f accession -n accession_version \ | |||
| csvtk -tl mutate -f accession_version -n accession -p "^(.+?)\." \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would refrain from adding more csvtk to any nextstrain code base when used with TSV files, and phase it out
There are rare but disastrous bugs when tsv fields contain single or double quotes: shenwei356/csvtk#130
csvtk is just not a TSV tool, it mostly works, but it disobeys the standard.
We can now use `csvtk fix-quotes` and `csvtk del-quotes` to work around quoting issues (e.g. internal quotes in the submitter.affiliation). Copied commit from Zika repo: * nextstrain/zika#58
Tried to address the csvtk comment with c0b9e50 . Feel free to suggest other changes, otherwise I think this PR is ready for review. |
Thanks for porting over the fix for |
Resolves #39
Create a new accession field without the version number so that annotations do not need to be updated when the version number is updated.
Related issue(s)
Related to nextstrain/measles#24
Checklist