Add conditional xfail test for DISTINCT aggregates with NaN #261

mythrocks · 2020-06-23T04:51:17Z

SPARK-32038 reports a regression in Apache Spark (3.0.0), in
failing to normalize NaN/Zero float values, during DISTINCT
aggregations. This causes a mismatch in results between
Apache Spark 3.0.0 on CPU, and the Rapids Accelerator (which
returns the right results).
SPARK-32038 was fixed in apache/spark#28876.

The following change introduces a conditional xfail test that passes
on Apache Spark 3.0.1 and 3.1+ (which fixes SPARK-32038),
but produces an expected failure on Spark 3.0.0.

Testing:

On Apache Spark 3.0.0:

SPARK_HOME=~/workspace/dev/spark/3.0.0 ./run_pyspark_from_build.sh  -k hash -s

src/main/python/hash_aggregate_test.py::test_count_distinct_with_nan_floats[[('a', RepeatSeq), ('b', Float), ('c', Long)]][IGNORE_ORDER, APPROXIMATE_FLOAT] XFAIL [100%]
...
XFAIL src/main/python/hash_aggregate_test.py::test_count_distinct_with_nan_floats[[('a', RepeatSeq), ('b', Float), ('c', Long)]][IGNORE_ORDER, APPROXIMATE_FLOAT]
  [SPARK-32038][SQL] NormalizeFloatingNumbers should also work on distinct aggregate (https://github.com/apache/spark/pull/28876) Fixed in later Apache Spark releases.

On Apache Spark 3.0.1:

SPARK_HOME=~/workspace/dev/spark/3.0.1 ./run_pyspark_from_build.sh  -k hash -s
...
src/main/python/hash_aggregate_test.py::test_count_distinct_with_nan_floats[[('a', RepeatSeq), ('b', Float), ('c', Long)]][IGNORE_ORDER, APPROXIMATE_FLOAT] PASSED [100%]

SPARK-32038 reports a regression in Apache Spark (3.0.0), in failing to normalize NaN/Zero float values, during DISTINCT aggregations. This causes a mismatch in results between Apache Spark 3.0.0 on CPU, and the Rapids Accelerator (which returns the right results). SPARK-32038 was fixed in apache/spark#28876. This commit introduces a conditional xfail test that passes on Apache Spark 3.0.1 and 3.1+ (which fixes SPARK-32038), but produces an expected failure on Spark 3.0.0.

mythrocks · 2020-06-23T04:52:10Z

build

mythrocks · 2020-06-23T04:56:29Z

For reference: xfail Documentation
The initial intention was to introduce an imperative xfail, based on the Apache Spark version string. But this would not execute any code after the xfail invocation.
Adding it to the @pytest.mark.xfail() lets the test run on Spark 3.0.0, and then excuses the failure. This way, we can look out for a future xpass, in case SPARK-32038 is ported back to Spark 3.0.0.

abellina · 2020-06-23T05:31:37Z

integration_tests/src/main/python/hash_aggregate_test.py

-@pytest.mark.xfail(reason="count(distinct floats) fails when there are NaN values in the aggregation column."
-                          "(https://github.com/NVIDIA/spark-rapids/issues/194)")
+@pytest.mark.xfail(
+    condition=with_spark_session(lambda spark : spark.sparkContext.version == "3.0.0"),


This is great 👍

mythrocks · 2020-06-23T21:07:27Z

build

mythrocks · 2020-06-23T21:24:11Z

Hmm... I'm not sure why the status checks are blocked.

revans2 · 2020-06-23T21:26:46Z

Hmm... I'm not sure why the status checks are blocked.

The build for branch-0.2 is not set up yet. Still waiting on the team to do it.

mythrocks · 2020-06-23T21:30:05Z

not sure why the status checks are blocked.
The build for branch-0.2 is not set up yet.

D'oh! Thanks, @revans2. Permission to merge this?

revans2 · 2020-06-23T21:33:04Z

We have only been merging necessary doc changes. If it is still not up tomorrow then I will start manually testing/merging things.

mythrocks · 2020-06-23T21:36:04Z

We have only been merging necessary doc changes.

Gotcha. I mistakenly thought test code was acceptable. This patch will keep, so there isn't a rush. Thanks, @revans2.

revans2 · 2020-06-24T16:48:57Z

build

revans2 · 2020-06-24T17:16:20Z

build

SPARK-32038 reports a regression in Apache Spark (3.0.0), in failing to normalize NaN/Zero float values, during DISTINCT aggregations. This causes a mismatch in results between Apache Spark 3.0.0 on CPU, and the Rapids Accelerator (which returns the right results). SPARK-32038 was fixed in apache/spark#28876. This commit introduces a conditional xfail test that passes on Apache Spark 3.0.1 and 3.1+ (which fixes SPARK-32038), but produces an expected failure on Spark 3.0.0.

* Cifar10 scaffold (#127) * SCAFFOLD Integration scaffold learner scaffold configs initialize the scaffold terms on the client update README to include SCAFFOLD description add scaffold experiment update urls add LICENSE from NIID-Bench update README and plots to reflect SCAFFOLD experiment refactor scaffold computation steps into their own member functions remove unused import refactor to use ScaffoldLearner class Scaffold learner depends on PyTorch. Renamed as such use two standard aggregators, inherit SCAFFOLD workflow from standard ScatterAndGather remove custom aggregator use update_shareable use multi-class inheritance for scaffold learner use new aggregator that supports several DXOs fix updating call for global model simplify dxo/shareable handling use built-in PTFileModelLocator fix formatting add PT Formatter use built-in validation json generator; remove custom formatter remove custom Json validation generator update license restore run_secure.sh restore project yml restore main branch learner executor add SCAFFOLD link to example readme remove custom validation json generator print model owner info during validation remove custom formatter remove special handling of validation dxo. Not needed anymore use zeros_like() to initialize scaffold terms * use scaffold helper class * formatting * make scaffold function args consistent * Add versioneer so nvflare.__version__ has meaningful value (#215) * Add versioneer so nvflare.__version__ has meaningful value * Fix coding style and license for versioneer * Add information about cross site validation, commands, and hostname to documentation (#206) * Add information about cross site validation to documentation * More minor documentation updates * Fix link * Add persistence to admin cli (#210) * Add persistence to admin cli * save to ~/.nvflare * 211 Update hello-monai example (#212) * update hello-monai Signed-off-by: Yiheng Wang <vennw@nvidia.com> * decrease aggregation_epochs Signed-off-by: Yiheng Wang <vennw@nvidia.com> * Initial implementation of Ditto algorithm (#144) Refactor ditto with multi inheritance Refactor ditto with multi inheritance Update README.md move learners to app_common with other minor updates update license section update for unit test update for unit test update for unit test remove redundant init parameters Update supervised_ditto_learner.py Update README.md Update README.md Update README.md Update README.md Update supervised_ditto_learner.py move learners back to custom folder, will further discuss and make the move in separate PR * Initial implementation for HA. * Added FL client HA support. * Enhanced the Client HA. * HA function working for server & client. * Working with SnapshotFilePersistor config. * Made the overseer_agent configurable. * Made the admin support HA. * Adjust retry. * Adjust admin prompt. * Added storage_state_persistor.py * Integrated with new overseer-agent. * Changed to use admin Overseer-agent. * Updated overseer_spec.py * Fixed end line. * Added storage and used the dev-2.1 cli.py. * Used dev-2.1 cli.py. Co-authored-by: Holger Roth <6304754+holgerroth@users.noreply.github.com> Co-authored-by: Isaac Yang <isaacy@nvidia.com> Co-authored-by: nvkevlu <55759229+nvkevlu@users.noreply.github.com> Co-authored-by: Sean Yang <seany314@gmail.com> Co-authored-by: Yiheng Wang <68361391+yiheng-wang-nv@users.noreply.github.com> Co-authored-by: Ziyue Xu <71786575+ZiyueXu77@users.noreply.github.com>

* moving rapids cmake include to pull from rapids version * moving rapids cmake include to pull from rapids version Signed-off-by: Mike Wilson <knobby@burntsheep.com>

mythrocks self-assigned this Jun 23, 2020

mythrocks added the test Only impacts tests label Jun 23, 2020

mythrocks added this to the Jun 22 - Jul 2 milestone Jun 23, 2020

mythrocks linked an issue Jun 23, 2020 that may be closed by this pull request

[BUG] count(distinct float_col) produces different results from CPU, for Float columns with NaNs #194

Closed

mythrocks mentioned this pull request Jun 23, 2020

[BUG] count(distinct float_col) produces different results from CPU, for Float columns with NaNs #194

Closed

abellina reviewed Jun 23, 2020

View reviewed changes

abellina approved these changes Jun 23, 2020

View reviewed changes

revans2 approved these changes Jun 23, 2020

View reviewed changes

revans2 merged commit fab7f9a into NVIDIA:branch-0.2 Jun 24, 2020

mythrocks deleted the xfail-distinct-aggregates-spark-28876 branch February 27, 2023 18:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add conditional xfail test for DISTINCT aggregates with NaN #261

Add conditional xfail test for DISTINCT aggregates with NaN #261

mythrocks commented Jun 23, 2020 •

edited

Loading

mythrocks commented Jun 23, 2020

mythrocks commented Jun 23, 2020

abellina Jun 23, 2020

mythrocks commented Jun 23, 2020

mythrocks commented Jun 23, 2020

revans2 commented Jun 23, 2020

mythrocks commented Jun 23, 2020

revans2 commented Jun 23, 2020

mythrocks commented Jun 23, 2020

revans2 commented Jun 24, 2020

revans2 commented Jun 24, 2020

Add conditional xfail test for DISTINCT aggregates with NaN #261

Add conditional xfail test for DISTINCT aggregates with NaN #261

Conversation

mythrocks commented Jun 23, 2020 • edited Loading

mythrocks commented Jun 23, 2020

mythrocks commented Jun 23, 2020

abellina Jun 23, 2020

Choose a reason for hiding this comment

mythrocks commented Jun 23, 2020

mythrocks commented Jun 23, 2020

revans2 commented Jun 23, 2020

mythrocks commented Jun 23, 2020

revans2 commented Jun 23, 2020

mythrocks commented Jun 23, 2020

revans2 commented Jun 24, 2020

revans2 commented Jun 24, 2020

mythrocks commented Jun 23, 2020 •

edited

Loading