
Fix issue with parquet partitioned reads #741

Merged
Merged 1 commit into NVIDIA:branch-0.2 on Sep 11, 2020

Conversation

@revans2 (Collaborator) commented Sep 11, 2020

This fixes #718

It also fixes a few other things I found while in the code.

It acquires the GPU Semaphore in cases where we are returning a batch, even if it is just row counts.

It updates the tests to use the new small file config. I noticed this when checking whether my new test would fail without the fix in place: it failed in all 4 cases instead of only the half I expected.

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
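The semaphore change described above can be sketched as follows. This is a minimal illustration based only on the PR description, not the actual patch: the names `GpuSemaphore.acquireIfNecessary`, `readBatch`, and `buildBatch` are assumptions about the spark-rapids code, not verified signatures.

```scala
import org.apache.spark.TaskContext
import org.apache.spark.sql.vectorized.ColumnarBatch

// Hypothetical sketch: acquire the GPU semaphore before emitting ANY batch,
// including batches that carry only a row count and no columnar data.
def readBatch(): Option[ColumnarBatch] = {
  if (isDone) {
    None
  } else {
    // Before this fix, the semaphore was (per the description) only taken when
    // real columnar data was produced; taking it for row-count-only batches as
    // well gates downstream GPU work consistently in every case.
    GpuSemaphore.acquireIfNecessary(TaskContext.get())
    Some(buildBatch())
  }
}
```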
@revans2 revans2 added the bug Something isn't working label Sep 11, 2020
@revans2 revans2 added this to the Aug 31 - Sep 11 milestone Sep 11, 2020
@revans2 revans2 self-assigned this Sep 11, 2020
@revans2 (Collaborator, Author) commented Sep 11, 2020

build

```scala
if (partitionSchema.nonEmpty) {
    .createPartitionValues(partitionValues, partitionSchema)
  withResource(partitionScalars) { scalars =>
    ColumnarPartitionReaderWithPartitionValues.addPartitionValues(cb, scalars)
```
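For readers unfamiliar with the `withResource` idiom in the diff context above, here is a self-contained sketch. The generic helper below is an illustration of the pattern; spark-rapids defines its own equivalent, whose exact signature is not shown here.

```scala
// Minimal illustration of the withResource pattern: run a function against a
// resource and guarantee close() is called, even if the function throws.
def withResource[R <: AutoCloseable, T](resource: R)(body: R => T): T = {
  try {
    body(resource)
  } finally {
    resource.close()
  }
}

// Usage shape (hypothetical): partition-value scalars are GPU-side resources
// that must be released once they have been appended to the batch.
// withResource(partitionScalars) { scalars =>
//   ColumnarPartitionReaderWithPartitionValues.addPartitionValues(cb, scalars)
// }
```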
@revans2 (Collaborator, Author) commented on the diff:

Technically this change is not needed. I originally started it to grab the semaphore only when there were partition values, but then I thought better of it and decided to grab it in all cases. I left this in because I think it makes the code more readable, even though it works without the change.

If others disagree, I am happy to remove it.


revans2 commented Sep 11, 2020

The Python UDF window tests are failing here too now. I am not sure how they passed for the original PR. Because this is a P1, I am going to merge this as-is, and then we can decide separately how to fix those tests.

@revans2 revans2 merged commit da4c122 into NVIDIA:branch-0.2 Sep 11, 2020
@revans2 revans2 deleted the special_case_part_read branch September 11, 2020 16:43
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
[auto-merge] bot-auto-merge-branch-22.12 to branch-23.02 [skip ci] [bot]
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

[BUG] GpuBroadcastHashJoinExec ArrayIndexOutOfBoundsException
3 participants