[translate] update test syntax / format
The move to "Creating files in the initial working directory" is
motivated by
<#1344 (comment)>
and <#1176>.

Additionally, I remove the pushd commands, which were confusing (there
were multiple!), and use variables to refer to common directories to
improve readability.
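
The pattern described in this message can be sketched outside of cram. This is only an illustration under stated assumptions, not the project's test harness: `TESTDIR` is simulated with a temporary directory (cram normally sets it to the directory containing the test file), while `ROOT`, `WORKDIR`, and the contents of `tree.nwk` are hypothetical placeholders.

```shell
# Simulate the directory that holds the test file (cram provides this as $TESTDIR).
ROOT="$(mktemp -d)"
TESTDIR="$ROOT/tests/functional/translate/cram"
mkdir -p "$TESTDIR" "$ROOT/tests/functional/translate/data/zika"
# Placeholder input file, not real test data.
echo "(A,B);" > "$ROOT/tests/functional/translate/data/zika/tree.nwk"

# Refer to shared directories through a variable instead of changing directory.
DATA="$TESTDIR/../data"

# Stand-in for the initial working directory a test starts in (hypothetical name).
WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

# Inputs are read via $DATA; outputs are plain relative paths in the working directory.
cp "$DATA/zika/tree.nwk" tree_copy.nwk
ls
```

Because no `pushd` is involved, every relative output path resolves against the directory the test started in, and nothing is written next to the test files.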
jameshadfield committed Dec 19, 2023
1 parent 02ed067 commit 0987264
Showing 5 changed files with 50 additions and 43 deletions.
2 changes: 0 additions & 2 deletions tests/functional/translate/cram/_setup.sh

This file was deleted.

19 changes: 11 additions & 8 deletions tests/functional/translate/cram/translate-with-genbank.t
@@ -1,18 +1,21 @@
 Setup

-  $ pushd "$TESTDIR" > /dev/null
-  $ source _setup.sh
+  $ export AUGUR="${AUGUR:-$TESTDIR/../../../../bin/augur}"
+  $ export DATA="$TESTDIR/../data"
+  $ export SCRIPTS="$TESTDIR/../../../../scripts"

 Translate amino acids for genes using a GenBank file.

   $ ${AUGUR} translate \
-  > --tree translate/data/zika/tree.nwk \
-  > --ancestral-sequences translate/data/zika/nt_muts.json \
-  > --reference-sequence translate/data/zika/zika_outgroup.gb \
+  > --tree "$DATA/zika/tree.nwk" \
+  > --ancestral-sequences "$DATA/zika/nt_muts.json" \
+  > --reference-sequence "$DATA/zika/zika_outgroup.gb" \
   > --genes CA PRO \
-  > --output-node-data $TMP/aa_muts.json
-  Validating schema of 'translate/data/zika/nt_muts.json'...
+  > --output-node-data aa_muts.json
+  Validating schema of '.+nt_muts.json'... (re)
   Read in 3 features from reference sequence file
   amino acid mutations written to .* (re)
-  $ python3 "../../scripts/diff_jsons.py" translate/data/zika/aa_muts_genbank.json $TMP/aa_muts.json
+
+  $ python3 "$SCRIPTS/diff_jsons.py" $DATA/zika/aa_muts_genbank.json aa_muts.json \
+  > --exclude-regex-paths "root\['annotations'\]\['.+'\]\['seqid'\]"
   {}
23 changes: 12 additions & 11 deletions tests/functional/translate/cram/translate-with-gff-and-gene-name.t
@@ -1,31 +1,32 @@
 Setup

-  $ pushd "$TESTDIR" > /dev/null
-  $ source _setup.sh
+  $ export AUGUR="${AUGUR:-$TESTDIR/../../../../bin/augur}"
+  $ export DATA="$TESTDIR/../data"
+  $ export SCRIPTS="$TESTDIR/../../../../scripts"

 Translate amino acids for genes using a GFF3 file where the gene names are stored in a qualifier named "gene_name".

-  $ cat >$TMP/genemap.gff <<~~
+  $ cat >genemap.gff <<~~
   > ##gff-version 3
   > ##sequence-region PF13/251013_18 1 10769
   > PF13/251013_18 GenBank gene 91 456 . + . gene_name="CA"
   > PF13/251013_18 GenBank gene 457 735 . + . gene_name="PRO"
   > ~~

   $ ${AUGUR} translate \
-  > --tree translate/data/zika/tree.nwk \
-  > --ancestral-sequences translate/data/zika/nt_muts.json \
-  > --reference-sequence "$TMP/genemap.gff" \
-  > --output-node-data $TMP/aa_muts.json
-  Validating schema of 'translate/data/zika/nt_muts.json'...
+  > --tree "${DATA}/zika/tree.nwk" \
+  > --ancestral-sequences "${DATA}/zika/nt_muts.json" \
+  > --reference-sequence "genemap.gff" \
+  > --output-node-data aa_muts.json
+  Validating schema of '.+/nt_muts.json'... (re)
   Read in 2 features from reference sequence file
   amino acid mutations written to .* (re)

 Other than the sequence ids which will include a temporary path, the JSONs
 should be identical.

-  $ python3 "../../scripts/diff_jsons.py" \
+  $ python3 "${SCRIPTS}/diff_jsons.py" \
   > --exclude-regex-paths "['seqid']" -- \
-  > translate/data/zika/aa_muts_gff.json \
-  > $TMP/aa_muts.json
+  > "${DATA}/zika/aa_muts_gff.json" \
+  > aa_muts.json
   {}
24 changes: 13 additions & 11 deletions tests/functional/translate/cram/translate-with-gff-and-gene.t
@@ -1,27 +1,29 @@
 Setup

-  $ pushd "$TESTDIR" > /dev/null
-  $ source _setup.sh
+  $ export AUGUR="${AUGUR:-$TESTDIR/../../../../bin/augur}"
+  $ export DATA="$TESTDIR/../data"
+  $ export SCRIPTS="$TESTDIR/../../../../scripts"

 Translate amino acids for genes using a GFF3 file where the gene names are stored in a qualifier named "gene".

-  $ cat >$TMP/genemap.gff <<~~
+  $ cat >genemap.gff <<~~
   > ##gff-version 3
   > ##sequence-region PF13/251013_18 1 10769
   > PF13/251013_18 GenBank gene 91 456 . + . gene="CA"
   > PF13/251013_18 GenBank gene 457 735 . + . gene="PRO"
   > ~~

   $ ${AUGUR} translate \
-  > --tree translate/data/zika/tree.nwk \
-  > --ancestral-sequences translate/data/zika/nt_muts.json \
-  > --reference-sequence "$TMP/genemap.gff" \
-  > --output-node-data $TMP/aa_muts.json
-  Validating schema of 'translate/data/zika/nt_muts.json'...
+  > --tree "${DATA}/zika/tree.nwk" \
+  > --ancestral-sequences "${DATA}/zika/nt_muts.json" \
+  > --reference-sequence genemap.gff \
+  > --output-node-data aa_muts.json
+  Validating schema of '.+/nt_muts.json'... (re)
   Read in 2 features from reference sequence file
   amino acid mutations written to .* (re)
-  $ python3 "../../scripts/diff_jsons.py" \
+
+  $ python3 "${SCRIPTS}/diff_jsons.py" \
   > --exclude-regex-paths "['seqid']" -- \
-  > translate/data/zika/aa_muts_gff.json \
-  > $TMP/aa_muts.json
+  > "${DATA}/zika/aa_muts_gff.json" \
+  > aa_muts.json
   {}
25 changes: 14 additions & 11 deletions tests/functional/translate/cram/translate-with-gff-and-locus-tag.t
@@ -1,23 +1,26 @@
 Setup

-  $ pushd "$TESTDIR" > /dev/null
-  $ source _setup.sh
+  $ export AUGUR="${AUGUR:-$TESTDIR/../../../../bin/augur}"
+  $ export DATA="$TESTDIR/../data"
+  $ export SCRIPTS="$TESTDIR/../../../../scripts"

 Translate amino acids for genes using a GFF3 file where the gene names are stored in a qualifier named "locus_tag".

   $ ${AUGUR} translate \
-  > --tree translate/data/tb/tree.nwk \
-  > --genes translate/data/tb/genes.txt \
-  > --vcf-reference translate/data/tb/ref.fasta \
-  > --ancestral-sequences translate/data/tb/nt_muts.vcf \
-  > --reference-sequence translate/data/tb/Mtb_H37Rv_NCBI_Annot.gff \
-  > --output-node-data $TMP/aa_muts.json \
-  > --alignment-output $TMP/translations.vcf \
-  > --vcf-reference-output $TMP/translations_reference.fasta
+  > --tree "${DATA}/tb/tree.nwk" \
+  > --genes "${DATA}/tb/genes.txt" \
+  > --vcf-reference "${DATA}/tb/ref.fasta" \
+  > --ancestral-sequences "${DATA}/tb/nt_muts.vcf" \
+  > --reference-sequence "${DATA}/tb/Mtb_H37Rv_NCBI_Annot.gff" \
+  > --output-node-data aa_muts.json \
+  > --alignment-output translations.vcf \
+  > --vcf-reference-output translations_reference.fasta
   Gene length of rrs_Rvnr01 is not a multiple of 3. will pad with N
   Read in 187 specified genes to translate.
   Read in 187 features from reference sequence file
   162 genes had no mutations and so have been be excluded.
   amino acid mutations written to .* (re)
-  $ python3 "../../scripts/diff_jsons.py" translate/data/tb/aa_muts.json $TMP/aa_muts.json
+
+  $ python3 "${SCRIPTS}/diff_jsons.py" "${DATA}/tb/aa_muts.json" aa_muts.json \
+  > --exclude-regex-paths "root\['annotations'\]\['.+'\]\['seqid'\]"
   {}
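
The tests in this commit pass two different `--exclude-regex-paths` patterns, and as regular expressions they are not equivalent. The sketch below uses `grep -E` as a stand-in and assumes the pattern is searched against path strings shaped like `root['annotations']['CA']['seqid']`. The sample paths (including the `NODE_1` name) are illustrative, and how `diff_jsons.py` actually applies the pattern is not shown in this commit.

```shell
# Illustrative path strings; not taken from real test output.
paths="root['annotations']['CA']['seqid']
root['annotations']['CA']['start']
root['nodes']['NODE_1']['aa_muts']"

# Pattern with escaped brackets, as used in the GenBank and locus_tag tests.
escaped="root\['annotations'\]\['.+'\]\['seqid'\]"
# Unescaped pattern, as used in the two GFF gene/gene_name tests.
# Read as a regex, this is a bracket expression: any one of ' s e q i d.
unescaped="['seqid']"

# Count the paths that would still be compared (i.e. not excluded) under each pattern.
kept_escaped=$(printf '%s\n' "$paths" | grep -Ev "$escaped" | wc -l | tr -d ' ')
kept_unescaped=$(printf '%s\n' "$paths" | grep -Ev "$unescaped" | wc -l | tr -d ' ')

echo "kept with escaped pattern:   $kept_escaped"
echo "kept with unescaped pattern: $kept_unescaped"
```

Under these assumptions the escaped pattern excludes only the `seqid` path, while the unescaped one matches every sample path because each contains a quote character. If the real comparison treats patterns the same way, the escaped form is the more precise of the two.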
