Use metadata-only filtering for subsampling
Replaces FASTA outputs with strain list outputs for the subsample rule
such that sequence data are not inspected during most subsampling steps.
The exceptions are subsampling jobs that require a priority score
calculation that depends on the FASTA sequences of another subsampled
group. To handle these cases, we add a new rule that extracts just
those subsampled sequences.
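
As an illustration of what "extract just those subsampled sequences" amounts to, here is a minimal Python sketch that keeps only the FASTA records named in a strain list. It is not the rule added by this commit; the function names `read_fasta` and `extract_sequences` and the toy data are hypothetical, and the sketch assumes the strain name is the first whitespace-delimited token of each FASTA header.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (strain, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the strain name is the first token of the header.
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_sequences(fasta_lines: Iterable[str], strains: Set[str]) -> List[Tuple[str, str]]:
    """Keep only the records whose strain name appears in the strain list."""
    return [(name, seq) for name, seq in read_fasta(fasta_lines) if name in strains]


# Hypothetical toy input: three records, two of which were subsampled.
fasta = [">A", "ACGT", "AC", ">B", "GGGG", ">C", "TTTT"]
subsampled = {"A", "C"}
selected = extract_sequences(fasta, subsampled)
```

Because the subsample rule now emits only strain names, a step like this is needed only for the groups whose sequences feed a priority calculation.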

Finally, we collect subsampled sequences into a single deduplicated
FASTA output using augur filter's new interface with the `--exclude-all`
flag and multiple input support for `--include`.
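
The collection step can be sketched roughly as follows, assuming the per-group strain lists are plain text files with one strain name per line. The helper names and file paths are hypothetical. Only flags that appear in this commit are used (`--sequences`, `--metadata`, `--exclude-all`, `--include`, `--output`), and the multi-file form of `--include` is assumed to behave as the message above describes.

```python
from typing import Iterable, List


def merge_strain_lists(strain_lists: Iterable[Iterable[str]]) -> List[str]:
    """Union strain names across lists, keeping first-seen order."""
    seen = {}
    for strains in strain_lists:
        for strain in strains:
            strain = strain.strip()
            if strain:
                seen.setdefault(strain, None)
    return list(seen)


def build_combine_command(
    sequences: str, metadata: str, strain_files: List[str], output: str
) -> List[str]:
    """Assemble an augur filter call that keeps only the listed strains."""
    return [
        "augur", "filter",
        "--sequences", sequences,
        "--metadata", metadata,
        "--exclude-all",               # start from an empty selection
        "--include", *strain_files,    # then add back every subsampled strain
        "--output", output,
    ]


# Hypothetical paths, for illustration only.
strain_files = ["sample-region.txt", "sample-global.txt"]
command = build_combine_command(
    "aligned.fasta", "metadata.tsv", strain_files, "subsampled.fasta"
)
merged = merge_strain_lists([["A", "B"], ["B", "C\n"]])
```

`merge_strain_lists` only shows the deduplication that the combined output is expected to reflect: a strain named in several lists appears once in the result.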

Note that this commit also updates the conda environment to use a GitHub
branch instead of an official augur release.
huddlej committed Apr 14, 2021
1 parent 70707be commit 102a0e2
Showing 1 changed file with 0 additions and 8 deletions.
8 changes: 0 additions & 8 deletions workflow/snakemake_rules/main_workflow.smk
@@ -355,14 +355,12 @@ rule subsample:
          - priority: {params.priority_argument}
         """
     input:
-        sequences = _get_unified_alignment,
         metadata = _get_unified_metadata,
         sequence_index = rules.index_sequences.output.sequence_index,
         include = config["files"]["include"],
         priorities = get_priorities,
         exclude = config["files"]["exclude"]
     output:
-        sequences = "results/{build_name}/sample-{subsample}.fasta",
         strains="results/{build_name}/sample-{subsample}.txt",
     log:
         "logs/subsample_{build_name}_{subsample}.txt"
@@ -387,7 +385,6 @@ rule subsample:
     shell:
         """
         augur filter \
-            --sequences {input.sequences} \
             --metadata {input.metadata} \
             --sequence-index {input.sequence_index} \
             --include {input.include} \
@@ -403,7 +400,6 @@ rule subsample:
             {params.sequences_per_group} \
             {params.subsample_max_sequences} \
             {params.sampling_scheme} \
-            --output {output.sequences} \
             --output-strains {output.strains} 2>&1 | tee {log}
         """

@@ -468,10 +464,6 @@ def _get_subsampled_files(wildcards):
     ]

 rule combine_samples:
-    message:
-        """
-        Combine and deduplicate FASTAs
-        """
     input:
         sequences=_get_unified_alignment,
         sequence_index=rules.index_sequences.output.sequence_index,
