-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add conditional xfail test for DISTINCT aggregates with NaN #261
Add conditional xfail test for DISTINCT aggregates with NaN #261
Conversation
SPARK-32038 reports a regression in Apache Spark (3.0.0), in failing to normalize NaN/Zero float values, during DISTINCT aggregations. This causes a mismatch in results between Apache Spark 3.0.0 on CPU, and the Rapids Accelerator (which returns the right results). SPARK-32038 was fixed in apache/spark#28876. This commit introduces a conditional xfail test that passes on Apache Spark 3.0.1 and 3.1+ (which fixes SPARK-32038), but produces an expected failure on Spark 3.0.0.
build |
For reference: xfail Documentation |
@pytest.mark.xfail(reason="count(distinct floats) fails when there are NaN values in the aggregation column." | ||
"(https://github.com/NVIDIA/spark-rapids/issues/194)") | ||
@pytest.mark.xfail( | ||
condition=with_spark_session(lambda spark : spark.sparkContext.version == "3.0.0"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great 👍
build |
Hmm... I'm not sure why the status checks are blocked. |
The build for branch-0.2 is not set up yet. Still waiting on the team to do it. |
D'oh! Thanks, @revans2. Permission to merge this? |
We have only been merging necessary doc changes. If it is still not up tomorrow then I will start manually testing/merging things. |
Gotcha. I mistakenly thought test code was acceptable. This patch will keep, so there isn't a rush. Thanks, @revans2. |
build |
1 similar comment
build |
SPARK-32038 reports a regression in Apache Spark (3.0.0), in failing to normalize NaN/Zero float values, during DISTINCT aggregations. This causes a mismatch in results between Apache Spark 3.0.0 on CPU, and the Rapids Accelerator (which returns the right results). SPARK-32038 was fixed in apache/spark#28876. This commit introduces a conditional xfail test that passes on Apache Spark 3.0.1 and 3.1+ (which fixes SPARK-32038), but produces an expected failure on Spark 3.0.0.
SPARK-32038 reports a regression in Apache Spark (3.0.0), in failing to normalize NaN/Zero float values, during DISTINCT aggregations. This causes a mismatch in results between Apache Spark 3.0.0 on CPU, and the Rapids Accelerator (which returns the right results). SPARK-32038 was fixed in apache/spark#28876. This commit introduces a conditional xfail test that passes on Apache Spark 3.0.1 and 3.1+ (which fixes SPARK-32038), but produces an expected failure on Spark 3.0.0.
* Cifar10 scaffold (#127) * SCAFFOLD Integration scaffold learner scaffold configs initialize the scaffold terms on the client update README to include SCAFFOLD description add scaffold experiment update urls add LICENSE from NIID-Bench update README and plots to reflect SCAFFOLD experiment refactor scaffold computation steps into their own member functions remove unused import refactor to use ScaffoldLearner class Scaffold learner depends on PyTorch. Renamed as such use two standard aggregators, inherit SCAFFOLD workflow from standard ScatterAndGather remove custom aggregator use update_shareable use multi-class inheritance for scaffold learner use new aggregator that supports several DXOs fix updating call for global model simplify dxo/shareable handling use built-in PTFileModelLocator fix formatting add PT Formatter use built-in validation json generator; remove custom formatter remove custom Json validation generator update license restore run_secure.sh restore project yml restore main branch learner executor add SCAFFOLD link to example readme remove custom validation json generator print model owner info during validation remove custom formatter remove special handling of validation dxo. Not needed anymore use zeros_like() to initialize scaffold terms * use scaffold helper class * formatting * make scaffold function args consistent * Add versioneer so nvflare.__version__ has meaningful value (#215) * Add versioneer so nvflare.__version__ has meaningful value * Fix coding style and license for versioneer * Add information about cross site validation, commands, and hostname to documentation (#206) * Add information about cross site validation to documentation * More minor documentation updates * Fix link * Add persistence to admin cli (#210) * Add persistence to admin cli * save to ~/.nvflare * 211 Update hello-monai example (#212) * update hello-monai Signed-off-by: Yiheng Wang <vennw@nvidia.com> * decrease aggregation_epochs Signed-off-by: Yiheng Wang <vennw@nvidia.com> * Initial implementation of Ditto algorithm (#144) Refactor ditto with multi inheritance Refactor ditto with multi inheritance Update README.md move learners to app_common with other minor updates update license section update for unit test update for unit test update for unit test remove redundant init parameters Update supervised_ditto_learner.py Update README.md Update README.md Update README.md Update README.md Update supervised_ditto_learner.py move learners back to custom folder, will further discuss and make the move in separate PR * Initial implementation for HA. * Added FL client HA support. * Enhanced the Client HA. * HA function working for server & client. * Working with SnapshotFilePersistor config. * Made the overseer_agent configurable. * Made the admin support HA. * Adjust retry. * Adjust admin prompt. * Added storage_state_persistor.py * Integrated with new overseer-agent. * Changed to use admin Overseer-agent. * Updated overseer_spec.py * Fixed end line. * Added storage and used the dev-2.1 cli.py. * Used dev-2.1 cli.py. Co-authored-by: Holger Roth <6304754+holgerroth@users.noreply.github.com> Co-authored-by: Isaac Yang <isaacy@nvidia.com> Co-authored-by: nvkevlu <55759229+nvkevlu@users.noreply.github.com> Co-authored-by: Sean Yang <seany314@gmail.com> Co-authored-by: Yiheng Wang <68361391+yiheng-wang-nv@users.noreply.github.com> Co-authored-by: Ziyue Xu <71786575+ZiyueXu77@users.noreply.github.com>
* moving rapids cmake include to pull from rapids version * moving rapids cmake include to pull from rapids version Signed-off-by: Mike Wilson <knobby@burntsheep.com>
SPARK-32038 reports a regression in Apache Spark (3.0.0), in
failing to normalize NaN/Zero float values, during DISTINCT
aggregations. This causes a mismatch in results between
Apache Spark 3.0.0 on CPU, and the Rapids Accelerator (which
returns the right results).
SPARK-32038 was fixed in apache/spark#28876.
The following change introduces a conditional xfail test that passes
on Apache Spark 3.0.1 and 3.1+ (which fixes SPARK-32038),
but produces an expected failure on Spark 3.0.0.
Testing: