[Audit] Add bucketed scan info in query plan of data source v1 [databricks] #4461

HaoYang670 · 2022-01-05T09:20:20Z

Signed-off-by: remzi <13716567376yh@gmail.com>
Closes #3952

Update GpuFileSourceScanExec.scala to include the change from Spark commit apache/spark@79515e4b6c.
Add an integration test.
Update the shim layers from spark31X.


HaoYang670 · 2022-01-05T09:21:04Z

build

revans2

Just some nits, but we need to understand and fix the failing test. We should also run these against Databricks so we are sure it works there too.

revans2 · 2022-01-05T13:52:33Z

integration_tests/src/main/python/explain_test.py

+        assert "Bucketed: false (disabled by query planner)" in df._sc._jvm.PythonSQLUtils.explainString(df._jdf.queryExecution(), "simple")
+
+
+    with_gpu_session(bucket_column_not_read)


Why have 1 test for all of these instead of 4 separate tests? It does not look like you are saving any code space or anything like that.

Thank you for your feedback! I will split it.

One test is failing because the Spark version is earlier than 3.1.0. I will add a skipif mark to the test.
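For readers following along, here is a minimal sketch of the version check that a skipif mark like this could rely on. The helper names `parse_version` and `is_before` are hypothetical and are not the project's actual utilities; the sketch only assumes that the running Spark version is available as a string.

```python
def parse_version(version, width=3):
    """Turn '3.0.3' or '3.2.0-SNAPSHOT' into a tuple of ints such as (3, 0, 3).

    Hypothetical helper for illustration; anything after the first '-' is ignored.
    """
    parts = []
    for piece in version.split("-")[0].split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '3.1' so they compare equal to '3.1.0'.
    while len(parts) < width:
        parts.append(0)
    return tuple(parts)


def is_before(version, minimum):
    """Return True when `version` is strictly older than `minimum`."""
    return parse_version(version) < parse_version(minimum)
```

A test could then be guarded with something like `@pytest.mark.skipif(is_before(spark_version, "3.1.0"), reason="...")`, where `spark_version` is an assumed variable holding the running Spark version string.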

revans2 · 2022-01-05T14:05:19Z

integration_tests/src/main/python/explain_test.py

+@allow_non_gpu(any=True)
+def test_explain_bucketed_scan(spark_tmp_table_factory):
+    """
+    https://github.com/NVIDIA/spark-rapids/issues/3952


A text explanation would be good, to give a high-level idea of what is happening.

Thank you for the feedback! I will add comments.
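As a high-level illustration of what the test checks: the explain output of a file scan carries a `Bucketed:` entry, and each case asserts on its value. The sketch below pulls that entry out of a plan string. The `SAMPLE_PLAN` text is made up for illustration and is not real output from this PR, and `bucketed_info` is a hypothetical helper; only the `Bucketed: false (disabled by query planner)` message is taken from the diff above.

```python
import re

# Made-up plan text for illustration only; real explain output has more fields.
SAMPLE_PLAN = (
    "*(1) ColumnarToRow\n"
    "+- FileScan parquet default.t1[i#0,j#1] Batched: true, "
    "Bucketed: false (disabled by query planner), DataFilters: [], Format: Parquet"
)


def bucketed_info(plan_text):
    """Return the text after 'Bucketed: ' up to the next comma, or None if absent."""
    match = re.search(r"Bucketed: ([^,\n]*)", plan_text)
    return match.group(1).strip() if match else None
```

With the test split into one function per case, each function would compare this single value against the one message it expects, so a failure points directly at the case that broke.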


HaoYang670 · 2022-01-06T03:48:34Z

build

HaoYang670 · 2022-01-07T01:39:07Z

Resolved merge conflicts caused by the shim layer refactoring.


HaoYang670 · 2022-01-07T02:05:37Z

build

revans2

It looks like some of the files might need to be updated for 2022 copyright. Other than that it looks good.

HaoYang670 · 2022-01-10T01:18:58Z

Thank you. I will update them.


HaoYang670 · 2022-01-10T02:45:53Z

build


HaoYang670 · 2022-01-10T03:19:51Z

build

HaoYang670 added 5 commits December 31, 2021 11:11

temp save

8f473d2

Signed-off-by: remzi <13716567376yh@gmail.com>

add test

caa4918

Signed-off-by: remzi <13716567376yh@gmail.com>

copy from spark

6a9f0da

Signed-off-by: remzi <13716567376yh@gmail.com>

Merge branch 'branch-22.02' into issue3952_add_bucketed_scan_info

e531763

add test

57cfd39

add shim layer Signed-off-by: remzi <13716567376yh@gmail.com>

HaoYang670 added the audit_3.3.0 Audit related tasks for 3.3.0 label Jan 5, 2022

revans2 reviewed Jan 5, 2022


HaoYang670 added 2 commits January 6, 2022 11:02

split the test and add comments

4763a1e

Signed-off-by: remzi <13716567376yh@gmail.com>

fix a spelling mistake

b6ead70

Signed-off-by: remzi <13716567376yh@gmail.com>

HaoYang670 changed the title ~~[Audit] [FEA][SPARK-32986][SQL] Add bucketed scan info in query plan of data source v1~~ [Audit] Add bucketed scan info in query plan of data source v1[databricks] Jan 6, 2022

HaoYang670 changed the title ~~[Audit] Add bucketed scan info in query plan of data source v1[databricks]~~ [Audit] Add bucketed scan info in query plan of data source v1 [databricks] Jan 6, 2022

Merge branch 'branch-22.02' into issue3952_add_bucketed_scan_info

1c2c818

HaoYang670 marked this pull request as draft January 7, 2022 01:40

update shim layer of 320 plus

bac4bad

Signed-off-by: remzi <13716567376yh@gmail.com>

HaoYang670 marked this pull request as ready for review January 7, 2022 05:00

HaoYang670 requested a review from revans2 January 7, 2022 05:22

revans2 reviewed Jan 7, 2022


sameerz added this to the Jan 10 - Jan 28 milestone Jan 9, 2022

update to 2022 copyright

9fde276

Signed-off-by: remzi <13716567376yh@gmail.com>

HaoYang670 closed this Jan 10, 2022

HaoYang670 reopened this Jan 10, 2022

update copyright

a1fd526

Signed-off-by: remzi <13716567376yh@gmail.com>

HaoYang670 requested a review from revans2 January 10, 2022 06:21

revans2 approved these changes Jan 11, 2022


revans2 merged commit 9446618 into NVIDIA:branch-22.02 Jan 11, 2022

HaoYang670 deleted the issue3952_add_bucketed_scan_info branch January 12, 2022 02:01
