Verify DPP over LIKE ANY/ALL expression #5695

Merged (4 commits) on Jun 20, 2022

@sperlingxx (Collaborator) commented May 31, 2022

Closes #2031

Verify that dynamic partition pruning (DPP) over LIKE ANY/ALL expressions works correctly on the GPU. The GPU plan below is correct: the GPU broadcast exchange is reused as expected. The LikeAny tests are currently skipped in premerge CI because the feature is only available in Spark 3.1.2 and later. In addition, the DPP tests are reorganized to be more concise.

GpuColumnarToRow false
+- GpuProject [key#92, skey#93, value#91]
   +- GpuBroadcastHashJoin [key#92], [key#94], Inner, GpuBuildRight
      :- GpuFileGpuScan orc default.tmp_table_master_488800073_0[value#91,key#92,skey#93] Batched: true, DataFilters: [], Format: ORC, Location: InMemoryFileIndex[file:/home/alfred/workspace/codes/spark-rapids/integration_tests/target/run_dir..., PartitionFilters: [isnotnull(key#92), dynamicpruningexpression(key#92 IN dynamicpruning#105)], PushedFilters: [], ReadSchema: struct<value:int>
      :     +- GpuSubqueryBroadcast dynamicpruning#105, 0, [key#94]
      :        +- GpuBroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#1134]
      :           +- GpuProject [key#94]
      :              +- GpuRowToColumnar targetsize(104857600)
      :                 +- *(1) Filter (((((Contains(filter#95, 00) OR Contains(filter#95, 01)) OR Contains(filter#95, 10)) OR Contains(filter#95, 11)) OR likeany(filter#95)) AND isnotnull(key#94))
      :                    +- GpuColumnarToRow false
      :                       +- GpuFileGpuScan orc default.tmp_table_master_488800073_1[key#94,filter#95] Batched: true, DataFilters: [((((Contains(filter#95, 00) OR Contains(filter#95, 01)) OR Contains(filter#95, 10)) OR Contains(..., Format: ORC, Location: InMemoryFileIndex[file:/home/alfred/workspace/codes/spark-rapids/integration_tests/target/run_dir..., PartitionFilters: [], PushedFilters: [IsNotNull(key)], ReadSchema: struct<key:int,filter:string>
      +- ReusedExchange [key#94], GpuBroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#1134]
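
The broadcast reuse claimed above can be read off the plan text: the `GpuBroadcastExchange` under `GpuSubqueryBroadcast` and the `ReusedExchange` on the build side carry the same `[id=#...]`. As a minimal sketch, assuming the plan is available as a plain string, a check along these lines could confirm it. The helper `reused_broadcast_ids` is hypothetical and not part of spark-rapids or its integration tests; the sample plan is an abridged copy of the one above.

```python
import re

# Hypothetical helper for illustration only; not part of spark-rapids.
def reused_broadcast_ids(plan: str) -> set:
    """Return ids of GpuBroadcastExchange nodes that are both built and reused."""
    built, reused = set(), set()
    for line in plan.splitlines():
        if "GpuBroadcastExchange" not in line:
            continue
        match = re.search(r"\[id=#(\d+)\]", line)
        if match is None:
            continue
        # A ReusedExchange line repeats the description of the exchange it reuses,
        # including that exchange's id.
        if "ReusedExchange" in line:
            reused.add(match.group(1))
        else:
            built.add(match.group(1))
    return built & reused

# Abridged from the plan in this PR description.
plan = """\
+- GpuBroadcastHashJoin [key#92], [key#94], Inner, GpuBuildRight
   :     +- GpuSubqueryBroadcast dynamicpruning#105, 0, [key#94]
   :        +- GpuBroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#1134]
   +- ReusedExchange [key#94], GpuBroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#1134]
"""

print(reused_broadcast_ids(plan))  # prints {'1134'}
```

An empty result would mean no broadcast exchange is shared between the pruning subquery and the join, which is the case this PR checks does not happen.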

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx
Collaborator Author

build

@sameerz sameerz added the test Only impacts tests label May 31, 2022
@sameerz sameerz added this to the May 23 - Jun 3 milestone May 31, 2022
@sperlingxx
Collaborator Author

build

@sperlingxx
Collaborator Author

build

Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx
Collaborator Author

build

@sperlingxx sperlingxx requested a review from abellina June 6, 2022 03:23
@sameerz sameerz modified the milestones: May 23 - Jun 3, Jun 6 - Jun 17 Jun 8, 2022
@sperlingxx
Collaborator Author

Hi @andygrove, could you help review this PR? Thanks!

@sperlingxx
Collaborator Author

build

@GaryShen2008 GaryShen2008 merged commit eb5dbb2 into NVIDIA:branch-22.08 Jun 20, 2022
Successfully merging this pull request may close these issues.

[FEA] audit [SPARK-34436][SQL] DPP support LIKE ANY/ALL expression
4 participants