Make the choice of "accession" on the config level
There was useful information on why "accession" is used over "strain" in
the docstring of update_example_data. Update that rule to use the
strain_id_field config, and move the context to every config file that
sets strain_id_field as "accession".

Don't set defaults when retrieving strain_id_field so "accession" is
only set on the config level.
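
The effect of dropping the fallback can be sketched in plain Python. This is a minimal illustration, assuming `config` behaves like an ordinary dict for `.get` lookups; the two sample dicts and the `require_strain_id_field` helper are hypothetical and not part of this commit.

```python
# Hypothetical stand-ins for a loaded config file; not taken from the repository.
config_with_field = {"strain_id_field": "accession", "display_strain_field": "strain"}
config_without_field = {"display_strain_field": "strain"}

# Before this commit: a missing key silently fell back to "accession".
old_value = config_without_field.get("strain_id_field", "accession")

# After this commit: a missing key yields None, so the config file is the only source.
new_value = config_without_field.get("strain_id_field")


def require_strain_id_field(config: dict) -> str:
    """Hypothetical helper: fail early if the config omits strain_id_field."""
    value = config.get("strain_id_field")
    if value is None:
        raise KeyError("strain_id_field must be set in the config file")
    return value


print(old_value)
print(new_value)
print(require_strain_id_field(config_with_field))
```

Without a default, a config that omits the key no longer gets `accession` implicitly, so an explicit check along the lines of the hypothetical helper is one way to surface the omission early.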
victorlin committed Jul 19, 2023
1 parent 4d8c515 commit 2b57ce5
Showing 5 changed files with 14 additions and 9 deletions.
2 changes: 2 additions & 0 deletions config/config_hmpxv1.yaml
@@ -9,6 +9,8 @@ auspice_config: "config/auspice_config_hmpxv1.json"
description: "config/description.md"
tree_mask: "config/tree_mask.tsv"

+# Use `accession` as the ID column since `strain` currently contains duplicates¹.
+# ¹ https://github.com/nextstrain/monkeypox/issues/33
strain_id_field: "accession"
display_strain_field: "strain"

2 changes: 2 additions & 0 deletions config/config_hmpxv1_big.yaml
@@ -9,6 +9,8 @@ auspice_config: "config/auspice_config_hmpxv1_big.json"
description: "config/description.md"
tree_mask: "config/tree_mask.tsv"

+# Use `accession` as the ID column since `strain` currently contains duplicates¹.
+# ¹ https://github.com/nextstrain/monkeypox/issues/33
strain_id_field: "accession"
display_strain_field: "strain"

2 changes: 2 additions & 0 deletions config/config_mpxv.yaml
@@ -9,6 +9,8 @@ description: "config/description.md"
clades: "config/clades.tsv"
tree_mask: "config/tree_mask.tsv"

+# Use `accession` as the ID column since `strain` currently contains duplicates¹.
+# ¹ https://github.com/nextstrain/monkeypox/issues/33
strain_id_field: "accession"
display_strain_field: "strain"

7 changes: 3 additions & 4 deletions workflow/snakemake_rules/chores.smk
@@ -5,9 +5,6 @@ rule update_example_data:
- sets the subsampling size to 50
- includes the root (defined in config but hardcoded here)
- ensures all clades and lineages are accounted for using --group-by
-- uses `accession` as the ID column since `strain` currently contains duplicates
-TODO: Use `strain` as the ID column after https://github.com/nextstrain/monkeypox/issues/33 is done.
"""
message:
"Update example data"
@@ -17,11 +14,13 @@ rule update_example_data:
output:
sequences="example_data/sequences.fasta",
metadata="example_data/metadata.tsv",
+params:
+strain_id=lambda w: config.get("strain_id_field"),
shell:
"""
augur filter \
--metadata {input.metadata} \
---metadata-id-columns accession \
+--metadata-id-columns {params.strain_id} \
--sequences {input.sequences} \
--include-where strain=MK783032 strain=MK783030 \
--group-by clade lineage \
10 changes: 5 additions & 5 deletions workflow/snakemake_rules/core.smk
@@ -32,7 +32,7 @@ rule exclude_bad:
params:
min_date=config["min_date"],
min_length=config["min_length"],
-strain_id=lambda w: config.get("strain_id_field", "accession"),
+strain_id=lambda w: config.get("strain_id_field"),
shell:
"""
augur filter \
@@ -81,7 +81,7 @@ rule filter:
group_by=config.get("group_by", "--group-by clade lineage"),
sequences_per_group=config["sequences_per_group"],
other_filters=config.get("filters", ""),
-strain_id=lambda w: config.get("strain_id_field", "accession"),
+strain_id=lambda w: config.get("strain_id_field"),
shell:
"""
augur filter \
@@ -236,7 +236,7 @@ rule refine:
clock_std_dev=lambda w: f"--clock-std-dev {config['clock_std_dev']}"
if "clock_std_dev" in config
else "",
-strain_id=lambda w: config.get("strain_id_field", "accession"),
+strain_id=lambda w: config.get("strain_id_field"),
shell:
"""
augur refine \
@@ -312,7 +312,7 @@ rule traits:
params:
columns="country",
sampling_bias_correction=3,
-strain_id=lambda w: config.get("strain_id_field", "accession"),
+strain_id=lambda w: config.get("strain_id_field"),
shell:
"""
augur traits \
@@ -444,7 +444,7 @@ rule export:
auspice_json=build_dir + "/{build_name}/raw_tree.json",
root_sequence=build_dir + "/{build_name}/raw_tree_root-sequence.json",
params:
-strain_id=lambda w: config.get("strain_id_field", "accession"),
+strain_id=lambda w: config.get("strain_id_field"),
shell:
"""
augur export v2 \
