[FEA] Support aggregates when ANSI mode is enabled #5114

Open
nartal1 opened this issue Mar 31, 2022 · 0 comments
Labels
feature request New feature or request

Comments

@nartal1
Collaborator

nartal1 commented Mar 31, 2022

Is your feature request related to a problem? Please describe.
We currently fall back to the CPU for aggregates when ANSI mode is enabled (#3597).
This issue tracks enabling aggregates in ANSI mode.

While working on this, we need to check the different Spark versions (3.1, 3.2, etc.) so that each type is enabled only in the versions that support it.
For example, sum (apache/spark@12abfe7917) and average (apache/spark@8dc455bba8) for interval types were added in Spark 3.2.
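
A minimal sketch of the version gating described above. The helper `interval_agg_supported` and the table `INTERVAL_AGG_MIN_VERSION` are hypothetical names used for illustration and are not part of spark-rapids. The only fact encoded here is the one stated in this issue: sum and average over interval types arrived in Spark 3.2.

```python
# Hypothetical helper, not part of spark-rapids: gate interval-type
# aggregates on the Spark version that introduced them.

# Per this issue, sum and average over interval types were added in Spark 3.2.
INTERVAL_AGG_MIN_VERSION = {
    "sum": (3, 2),
    "avg": (3, 2),
}


def parse_version(version: str) -> tuple:
    """Turn '3.2.1' into (3, 2); only major and minor matter here."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def interval_agg_supported(agg: str, spark_version: str) -> bool:
    """Return True if `agg` over interval types exists in `spark_version`."""
    min_version = INTERVAL_AGG_MIN_VERSION.get(agg.lower())
    if min_version is None:
        # Unknown aggregate: be conservative and report it as unsupported.
        return False
    return parse_version(spark_version) >= min_version
```

Keeping the minimum versions in one table means a new Spark release only needs a new entry, not a new branch in the check.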

@nartal1 nartal1 added the feature request and ? - Needs Triage labels Mar 31, 2022
@nartal1 nartal1 mentioned this issue Mar 31, 2022
49 tasks
@mattahrens mattahrens removed the ? - Needs Triage label Apr 5, 2022
mythrocks added a commit to mythrocks/spark-rapids that referenced this issue Jun 17, 2024
Fixes NVIDIA#11019.

Window function tests fail on Spark 4.0 because of NVIDIA#5114 (and NVIDIA#5120 broadly),
because spark-rapids does not support SUM, COUNT, and certain other aggregations
in ANSI mode.

This commit disables ANSI mode tests for the failing window function tests. These may be
revisited, once error/overflow checking is available for ANSI mode in spark-rapids.

Signed-off-by: MithunR <mithunr@nvidia.com>
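
To make the "error/overflow checking" mentioned in the commit message concrete, here is a small illustrative model of a SUM over 64-bit signed integers. It is plain Python and not spark-rapids code. The function names are hypothetical, and it assumes that overflow raises an error with ANSI mode on and wraps around with ANSI mode off.

```python
# Illustrative only: models what overflow checking means for a SUM over
# 64-bit signed integers. This is plain Python, not spark-rapids code.

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def wrap_int64(value: int) -> int:
    """Wrap an arbitrary Python int into the signed 64-bit range."""
    return (value - INT64_MIN) % (2 ** 64) + INT64_MIN


def sum_int64(values, ansi_enabled: bool) -> int:
    """Sum values as 64-bit longs; raise on overflow when ansi_enabled."""
    total = 0
    for v in values:
        total += v
        if total < INT64_MIN or total > INT64_MAX:
            if ansi_enabled:
                # Assumed ANSI behaviour: report the overflow as an error.
                raise ArithmeticError("long overflow in SUM")
            # Assumed non-ANSI behaviour: wrap around silently.
            total = wrap_int64(total)
    return total
```

With `ansi_enabled=False`, `sum_int64([INT64_MAX, 1], False)` wraps to `INT64_MIN`; with `ansi_enabled=True` the same input raises `ArithmeticError`. A GPU aggregation needs an equivalent check before it can run in ANSI mode without changing results.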
razajafri pushed a commit that referenced this issue Jun 26, 2024
* Disable ANSI mode for window function tests.

Fixes #11019.

Window function tests fail on Spark 4.0 because of #5114 (and #5120 broadly),
because spark-rapids does not support SUM, COUNT, and certain other aggregations
in ANSI mode.

This commit disables ANSI mode tests for the failing window function tests. These may be
revisited, once error/overflow checking is available for ANSI mode in spark-rapids.

Signed-off-by: MithunR <mithunr@nvidia.com>

* Switch from @ansi_mode_disabled to @disable_ansi_mode.

---------

Signed-off-by: MithunR <mithunr@nvidia.com>
wjxiz1992 added a commit to nvliyuan/yuali-spark-rapids that referenced this issue Jun 26, 2024
* optimizing Expand+Aggregate in sqlw with many count distinct

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>

* Add GpuBucketingUtils shim to Spark 4.0.0 (NVIDIA#11092)

* Add GpuBucketingUtils shim to Spark 4.0.0

* Signing off

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

---------

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* Improve the diagnostics for 'conv' fallback explain (NVIDIA#11076)

* Improve the diagnostics for 'conv' fallback explain

Signed-off-by: Jihoon Son <ghoonson@gmail.com>

* don't use nil

Signed-off-by: Jihoon Son <ghoonson@gmail.com>

* the bases should not be an empty string in the error message when the user input is not

Signed-off-by: Jihoon Son <ghoonson@gmail.com>

* more user-friendly message

* Update sql-plugin/src/main/scala/org/apache/spark/sql/rapids/stringFunctions.scala

Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>

---------

Signed-off-by: Jihoon Son <ghoonson@gmail.com>
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>

* Disable ANSI mode for window function tests [databricks] (NVIDIA#11073)

* Disable ANSI mode for window function tests.

Fixes NVIDIA#11019.

Window function tests fail on Spark 4.0 because of NVIDIA#5114 (and NVIDIA#5120 broadly),
because spark-rapids does not support SUM, COUNT, and certain other aggregations
in ANSI mode.

This commit disables ANSI mode tests for the failing window function tests. These may be
revisited, once error/overflow checking is available for ANSI mode in spark-rapids.

Signed-off-by: MithunR <mithunr@nvidia.com>

* Switch from @ansi_mode_disabled to @disable_ansi_mode.

---------

Signed-off-by: MithunR <mithunr@nvidia.com>

---------

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Jihoon Son <ghoonson@gmail.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
Co-authored-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Co-authored-by: Raza Jafri <razajafri@users.noreply.github.com>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
Co-authored-by: MithunR <mithunr@nvidia.com>
SurajAralihalli pushed a commit to SurajAralihalli/spark-rapids that referenced this issue Jul 12, 2024
* Disable ANSI mode for window function tests.

Fixes NVIDIA#11019.

Window function tests fail on Spark 4.0 because of NVIDIA#5114 (and NVIDIA#5120 broadly),
because spark-rapids does not support SUM, COUNT, and certain other aggregations
in ANSI mode.

This commit disables ANSI mode tests for the failing window function tests. These may be
revisited, once error/overflow checking is available for ANSI mode in spark-rapids.

Signed-off-by: MithunR <mithunr@nvidia.com>

* Switch from @ansi_mode_disabled to @disable_ansi_mode.

---------

Signed-off-by: MithunR <mithunr@nvidia.com>
mythrocks added a commit to mythrocks/spark-rapids that referenced this issue Jul 18, 2024
Most of the rest are borked because of exercising aggregations
like SUM, COUNT, AVG, etc. in ANSI mode.
NVIDIA#5114 sees to it that these aggregations fall to CPU.
mythrocks added a commit that referenced this issue Jul 18, 2024
* Fix hash-aggregate tests failing in ANSI mode

Fixes #11018.  

This commit fixes the tests in `hash_aggregate_test.py` to run correctly when run with ANSI enabled.  This is essential for running the tests with Spark 4.0, where ANSI mode is on by default.  

A vast majority of the tests here happen to exercise aggregations like `SUM`, `COUNT`, `AVG`, etc. which fall to CPU, on account of #5114.  These tests have been marked with `@disable_ansi_mode`, so that they run to completion correctly.  These may be revisited after #5114 has been addressed.  

In cases where #5114 does not apply, the tests have been modified to run with ANSI on and off.

---------

Signed-off-by: MithunR <mithunr@nvidia.com>
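
The "ANSI on and off" pattern from the commit message above can be sketched as follows. This is not the spark-rapids test harness and does not reproduce the `@disable_ansi_mode` marker. `ansi_confs` and `run_with_ansi_on_and_off` are hypothetical helpers; `spark.sql.ansi.enabled` is the Spark SQL configuration key that toggles ANSI mode.

```python
# Hypothetical sketch, not the spark-rapids test harness.

ANSI_CONF_KEY = "spark.sql.ansi.enabled"  # Spark SQL config that toggles ANSI mode


def ansi_confs(base_conf=None):
    """Yield one conf dict per ANSI setting, leaving base_conf untouched."""
    for enabled in (True, False):
        conf = dict(base_conf or {})
        conf[ANSI_CONF_KEY] = "true" if enabled else "false"
        yield conf


def run_with_ansi_on_and_off(test_body, base_conf=None):
    """Call test_body(conf) once per ANSI setting and collect the results."""
    return [test_body(conf) for conf in ansi_confs(base_conf)]
```

A test body that accepts a conf dict can then be exercised under both settings, for example `run_with_ansi_on_and_off(lambda conf: conf[ANSI_CONF_KEY])`.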
mythrocks added a commit to mythrocks/spark-rapids that referenced this issue Sep 27, 2024
Fixes NVIDIA#11015.
Contributes to NVIDIA#11004.

This commit addresses the tests that fail in parquet_test.py, when
run on Spark 4.

1. Some of the tests were failing as a result of NVIDIA#5114.  Those tests
have been disabled, at least until we get around to supporting
aggregations with ANSI mode enabled.

2. `test_parquet_check_schema_compatibility` fails on Spark 4 regardless
of ANSI mode, because it tests implicit type promotions where the read
schema includes wider columns than the write schema.  This will require
new code.  The test is disabled until NVIDIA#11512 is addressed.

3. `test_parquet_int32_downcast` had an erroneous setup phase that fails
   in ANSI mode.  This has been corrected. The test was refactored to
run in ANSI and non-ANSI mode.

Signed-off-by: MithunR <mithunr@nvidia.com>
mythrocks added a commit that referenced this issue Oct 8, 2024
* Spark 4:  Fix parquet_test.py.

Fixes #11015. (Spark 4 failure.)
Also fixes #11531. (Databricks 14.3 failure.)
Contributes to #11004.

This commit addresses the tests that fail in parquet_test.py, when
run on Spark 4.

1. Some of the tests were failing as a result of #5114.  Those tests
have been disabled, at least until we get around to supporting
aggregations with ANSI mode enabled.

2. `test_parquet_check_schema_compatibility` fails on Spark 4 regardless
of ANSI mode, because it tests implicit type promotions where the read
schema includes wider columns than the write schema.  This will require
new code.  The test is disabled until #11512 is addressed.

3. `test_parquet_int32_downcast` had an erroneous setup phase that fails
   in ANSI mode.  This has been corrected. The test was refactored to
run in ANSI and non-ANSI mode.

Signed-off-by: MithunR <mithunr@nvidia.com>