Skip to content

Commit

Permalink
Enable result validation for collect_list Spark function in the Fuzzer (
Browse files Browse the repository at this point in the history
#9361)

Summary:
In #9231, `collect_list` is added to the disable list of `duckQueryRunner`.
However, this is unnecessary because DuckDB does not have an aggregate function
named `collect_list`, hence it would not be compared against DuckDB. This
setting is redundant.

Other than this, the results verification of `collect_list` has been set to
`nullptr`, so its results are not verified. But we can use a custom array
verifier used by Presto's `array_agg` to check the results of itself.

Pull Request resolved: #9361

Reviewed By: xiaoxmeng

Differential Revision: D55744044

Pulled By: mbasmanova

fbshipit-source-id: a1a94c58b2a01463261775d8b6e08b65fd986d29
  • Loading branch information
liujiayi771 authored and facebook-github-bot committed Apr 4, 2024
1 parent e4f74d3 commit 3e702c7
Showing 1 changed file with 11 additions and 4 deletions.
15 changes: 11 additions & 4 deletions velox/functions/sparksql/fuzzer/SparkAggregationFuzzerTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,8 @@
#include "velox/exec/fuzzer/AggregationFuzzerOptions.h"
#include "velox/exec/fuzzer/AggregationFuzzerRunner.h"
#include "velox/exec/fuzzer/DuckQueryRunner.h"
#include "velox/exec/fuzzer/TransformResultVerifier.h"
#include "velox/functions/prestosql/registration/RegistrationFunctions.h"
#include "velox/functions/sparksql/aggregates/Register.h"

DEFINE_int64(
Expand All @@ -48,13 +50,20 @@ int main(int argc, char** argv) {
// experience, and initialize glog and gflags.
folly::Init init(&argc, &argv);

facebook::velox::functions::prestosql::registerInternalFunctions();
facebook::velox::memory::MemoryManager::initialize({});

// TODO: List of the functions that at some point crash or fail and need to
// be fixed before we can enable. Constant argument of bloom_filter_agg cause
// fuzzer test fail.
std::unordered_set<std::string> skipFunctions = {"bloom_filter_agg"};

using facebook::velox::exec::test::TransformResultVerifier;

auto makeArrayVerifier = []() {
return TransformResultVerifier::create("\"$internal$canonicalize\"({})");
};

// The results of the following functions depend on the order of input
// rows. For some functions, the result can be transformed to a value that
// doesn't depend on the order of inputs. If such transformation exists, it
Expand All @@ -72,7 +81,8 @@ int main(int argc, char** argv) {
{"min_by", nullptr},
{"skewness", nullptr},
{"kurtosis", nullptr},
{"collect_list", nullptr}};
{"collect_list", makeArrayVerifier()},
};

size_t initialSeed = FLAGS_seed == 0 ? std::time(nullptr) : FLAGS_seed;
auto duckQueryRunner =
Expand All @@ -89,9 +99,6 @@ int main(int argc, char** argv) {
// coefficient. Meanwhile, DuckDB employs the sample kurtosis calculation
// formula. The results from the two methods are completely different.
"kurtosis",
// When all data in a group are null, Spark returns an empty array while
// DuckDB returns null.
"collect_list",
});

using Runner = facebook::velox::exec::test::AggregationFuzzerRunner;
Expand Down

0 comments on commit 3e702c7

Please sign in to comment.