Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

augur curate rename: Add option to drop fields/columns #1526

Open
joverlee521 opened this issue Jul 8, 2024 · 4 comments
Open

augur curate rename: Add option to drop fields/columns #1526

joverlee521 opened this issue Jul 8, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@joverlee521
Copy link
Contributor

Context

Originally implemented in augur curate rename but dropped due to discussion on UI in #1506 (comment).

There are many cases where metadata includes extra fields/columns that should be dropped. We currently do this with tsv-select in the pathogen-repo-guide but it would be nice to have this built into augur curate rename.

Possible solutions

  1. Use existing --field-map option to define "empty" renames ( e.g. --field-map X= drops field X) originally implemented in 650bd56
  2. Add an explicit --drop-fields option
  3. Any fields/columns not provided via --field-map get automatically dropped (this is based on the config file idea proposed in Metadata wrangling prior to merging with existing data #1475 where every column must be accounted for in the command).
@jameshadfield
Copy link
Member

I implemented the --field-map X= syntax partly based on how git (used to) work when deleting remote branches, and I personally find it nice and concise. But git introduced --delete <branch> so perhaps they'd vote for --drop-fields; then again they described it as syntactic sugar so perhaps not 🤷

Any fields/columns not provided via --field-map get automatically dropped (this is based on the config file idea ...

At a high level I do want a curate pipeline to be able to detect if new columns are in the input data, but I don't think forcing us to list them all out within curate rename is the right place.

@genehack
Copy link
Contributor

I implemented the --field-map X= syntax partly based on how git (used to) work when deleting remote branches, and I personally find it nice and concise. But git introduced --delete <branch> so perhaps they'd vote for --drop-fields

note: strong opinions, weakly held follow

To me, the difference between --field-map X= and --drop-fields X comes down to least-surprise and clear expression.

--drop-fields X is unambiguous: if somebody writes out that flag, they're expecting field X to go away.

Conversely, --field-map X= could be intentional, or it could be somebody meant to come along later and fill in a Y on the RHS of that = but they forgot (or even, they're generating the field map programmatically, and they've botched something, and they're expecting a field name on the RHS but got the empty string instead). I can imagine that troubleshooting something like that in the middle of a big pipeline is going to be fairly frustrating.

@joverlee521
Copy link
Contributor Author

+1 for the more explicit option --drop-fields

(I've always found the Git CLI to be confusing...)

@tsibley
Copy link
Member

tsibley commented Sep 10, 2024

I think we're going to want two behaviours available depending on the situation:

  1. an explicit list of fields to drop, e.g. --drop-fields x y z, and
  2. a catchall "drop everything not explicitly mentioned", e.g. --drop-others or something

I don't think rename should be dropping fields, though. That'd be surprising to me. We could rename rename to select (or similar) and adjust the interface a little or leave rename and add select (or similar) and then either way I think it makes sense and isn't surprising.

augur curate select --fields id a=x b=y
augur curate rename --field-map a=x b=y | augur curate select --fields id x y

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants