Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't acquire the semaphore for empty input while scanning #4476

Merged
merged 3 commits into from
Jan 11, 2022

Conversation

abellina
Copy link
Collaborator

@abellina abellina commented Jan 7, 2022

Signed-off-by: Alessandro Bellina abellina@nvidia.com

This PR removes instances where we acquired the GpuSemaphore within the scan code, specifically those scans that are PartitionReaders. The change is to stop acquiring the semaphore here, because these cases will not be consumed downstream, given how the PartitionReader is used.

PartitionReader provides a next() method that returns false when there isn't work left. The PartitionReaderIterator wraps these readers, as far as I can tell, and it exposes the iterator interface (hasNext() returns false if the PartitionReader returned a false in next()). This allows us to stop requiring us to acquire the semaphore in the next() method, when there isn't any work left in the reader.

The PR also adds an acquire in the aggregate where we are producing rows out of nothing. This should be a noop, and likely just a miss that it wasn't there before (protected by upstream nodes doing the acquire for the agg).

The change has modest gains in NDS so far. I want to run a few more times, but so far I see: 10 queries that are 1.20x+ faster, 39 queries that are 1.10x+, and most queries (92) are above 1. I don't see a lot of regression here, except for q94 with 2 seconds, but q94 is falling back to the CPU and can be unpredictable. In other words, it seems to be mostly good, but I want to run a few more tests.

@revans2 wanted comments explaining why we are not acquiring the semaphore. I wasn't sure where to put them, and they are the same each time. I could see us moving them to the PartitionReaderIterator or somewhere else more appropriate.

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
@abellina
Copy link
Collaborator Author

abellina commented Jan 7, 2022

hmm looks like the build is failing on not being able to create the docker container.

@sameerz sameerz added the performance A performance related task/issue label Jan 8, 2022
@sameerz sameerz added this to the Jan 10 - Jan 28 milestone Jan 8, 2022
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me other than copyright nits.

jlowe
jlowe previously approved these changes Jan 10, 2022
@jlowe
Copy link
Member

jlowe commented Jan 10, 2022

build

@abellina abellina marked this pull request as ready for review January 10, 2022 17:45
@abellina
Copy link
Collaborator Author

I ran another iteration of NDS, with similar results. Most queries are slightly faster, with some more significant gains in those single digit queries like q2 q53 q55 where we are gaining up to 3 seconds.

Overall I see ~42 seconds over all of NDS at 2TB.

@abellina abellina changed the title Don't acquire the semaphore for empty batches while scanning Don't acquire the semaphore for empty input while scanning Jan 11, 2022
@jlowe
Copy link
Member

jlowe commented Jan 11, 2022

build

Copy link
Collaborator

@jbrennan333 jbrennan333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 this looks good to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance A performance related task/issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants