From 696517dde693b3b3a838b7c98e2eb5fe342c4488 Mon Sep 17 00:00:00 2001
From: Victor Lin <13424970+victorlin@users.noreply.github.com>
Date: Tue, 27 Jun 2023 11:32:23 -0700
Subject: [PATCH] filter: Add another test for numerical queries

The existing test was good for comparisons between columns and numerical
constants, but did not cover comparisons between two numerical columns.
---
 .../filter/cram/filter-query-numerical.t      | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/tests/functional/filter/cram/filter-query-numerical.t b/tests/functional/filter/cram/filter-query-numerical.t
index fd5865fb6..a3e7270ef 100644
--- a/tests/functional/filter/cram/filter-query-numerical.t
+++ b/tests/functional/filter/cram/filter-query-numerical.t
@@ -33,3 +33,28 @@ The 'category' column will fail when used with a numerical comparison.
   '>=' not supported between instances of 'str' and 'float'
   Ensure the syntax is valid per .
   [2]
+
+Create another metadata file for testing.
+
+  $ cat >metadata.tsv <<~~
+  > strain metric1 metric2
+  > SEQ1 4 5
+  > SEQ2 5 9
+  > SEQ3 6 10
+  > ~~
+
+Use a Pandas query to filter by a numerical value.
+This relies on having proper data types associated with the columns. If < is
+comparing strings, it's likely that SEQ3 will be dropped or errors arise.
+
+  $ ${AUGUR} filter \
+  >  --metadata metadata.tsv \
+  >  --query "metric1 > 4 & metric1 < metric2" \
+  >  --output-strains filtered_strains.txt
+  1 strains were dropped during filtering
+  \t1 of these were filtered out by the query: "metric1 > 4 & metric1 < metric2" (esc)
+  2 strains passed all filters
+
+  $ sort filtered_strains.txt
+  SEQ2
+  SEQ3
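
Note on why the new test is sensitive to column types: the sketch below is a plain-Python illustration, not Augur's implementation and not a pandas call. It assumes the metadata is tab-separated and uses a hypothetical helper, `filter_strains`, to apply the same condition (`metric1 > 4 & metric1 < metric2`) once with values parsed as floats and once with values left as strings. In the string case the constant is also compared as the string "4"; with real pandas, comparing a string column to a numeric constant may instead raise an error like the one shown earlier in this test file.

```python
import csv
import io

# Same rows as the test metadata; tab separation is an assumption.
METADATA = "strain\tmetric1\tmetric2\nSEQ1\t4\t5\nSEQ2\t5\t9\nSEQ3\t6\t10\n"


def filter_strains(text, convert):
    """Hypothetical helper: keep strains where metric1 > 4 and
    metric1 < metric2, after applying `convert` to every value."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    threshold = convert("4")
    kept = []
    for row in rows:
        m1 = convert(row["metric1"])
        m2 = convert(row["metric2"])
        if m1 > threshold and m1 < m2:
            kept.append(row["strain"])
    return kept


# Numeric comparison: 6 < 10 holds, so SEQ3 is kept.
numeric = filter_strains(METADATA, float)

# String comparison is lexicographic: "6" < "10" is False because
# "6" sorts after "1", so SEQ3 is dropped.
as_text = filter_strains(METADATA, str)

print(numeric)
print(as_text)
```

With numeric parsing the sketch keeps SEQ2 and SEQ3, matching the expected output of the test; with string comparison only SEQ2 survives, which is the failure mode the test description warns about.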