
Avoid readers acquiring GPU on next batch query if not first batch #2870

Merged · 3 commits merged into NVIDIA:branch-21.08 on Jul 7, 2021

Conversation

@jlowe (Member) commented on Jul 6, 2021

The readers all had logic that would unconditionally try to acquire the GPU semaphore in their hasNext() or equivalent predicate method to cover the case where a downstream exec would generate GPU data even when there were no batches from the reader (e.g., reduction-only aggregations produce a single row even with no input). This unconditional acquisition was great for covering that case, but it can also lead to poor performance when the following sequence occurs (sketched in code after the list):

  • First and only batch flows through task iterators and is eventually partitioned by shuffle logic
  • Shuffle logic releases GPU semaphore
  • Iterator chain is queried to see if there's another batch
  • Reader iterator at end of chain always grabs the semaphore even when there are no more batches
  • Task proceeds to wait for the semaphore and then exits without further work, preventing the current CPU core from working on the next task
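
The pattern at fault looks roughly like the following. This is a minimal, self-contained sketch rather than the actual spark-rapids code: GpuSemaphoreStub, Batch, and UnconditionalAcquireIterator are hypothetical stand-ins for the real GpuSemaphore, ColumnarBatch, and reader iterator classes.

```scala
// Hypothetical stand-in for the GPU semaphore: idempotent acquire/release for one task.
object GpuSemaphoreStub {
  private val sem = new java.util.concurrent.Semaphore(1)
  private var held = false
  def acquireIfNecessary(): Unit = synchronized {
    if (!held) { sem.acquire(); held = true } // simplified: real code tracks per-task state
  }
  def release(): Unit = synchronized {
    if (held) { sem.release(); held = false }
  }
}

// Hypothetical stand-in for a columnar batch.
final case class Batch(numRows: Int)

// The problematic shape: hasNext grabs the semaphore on every call, including the
// final call that returns false, so the task blocks on the GPU just to learn it is done.
class UnconditionalAcquireIterator(batches: Iterator[Batch]) extends Iterator[Batch] {
  override def hasNext: Boolean = {
    GpuSemaphoreStub.acquireIfNecessary()
    batches.hasNext
  }
  override def next(): Batch = batches.next()
}
```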

This behavior can be seen in the following Nsight profile screenshot, where the second GPU acquisition is unnecessary:
[Nsight Systems profile screenshot: the second GPU semaphore acquisition after the final batch is unnecessary]

This PR changes the logic so the reader iterator only acquires the GPU semaphore when it is queried for the very first batch. Once the iterator has produced any batch, the empty-input case no longer applies and the semaphore does not need to be acquired to cover it.
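
A sketch of the fixed shape, reusing the hypothetical GpuSemaphoreStub and Batch stand-ins from the sketch above; the real reader iterators differ in detail, but the first-batch-only gating is the idea.

```scala
// Fixed shape: only the query for the very first batch acquires the semaphore,
// so the final hasNext() that returns false no longer blocks on the GPU.
class FirstBatchOnlyAcquireIterator(batches: Iterator[Batch]) extends Iterator[Batch] {
  private var isFirstBatch = true

  override def hasNext: Boolean = {
    if (isFirstBatch) {
      // Acquire only while answering the first query, so a downstream exec that
      // produces output from empty input (e.g. a reduction-only aggregation)
      // still holds the GPU when it runs.
      GpuSemaphoreStub.acquireIfNecessary()
    }
    batches.hasNext
  }

  override def next(): Batch = {
    isFirstBatch = false
    batches.next()
  }
}
```

Flipping the flag in next() rather than hasNext() keeps repeated hasNext() calls before any batch has been produced covered for the empty-input case.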

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
jlowe added the performance label (A performance related task/issue) on Jul 6, 2021
jlowe self-assigned this on Jul 6, 2021
@jlowe (Member, Author) commented on Jul 6, 2021

build

revans2 previously approved these changes on Jul 7, 2021
abellina previously approved these changes on Jul 7, 2021
@abellina (Collaborator) commented on Jul 7, 2021

@jlowe looks like some conflicts with GpuOrcScan

jlowe dismissed stale reviews from abellina and revans2 via 8ca06a6 on July 7, 2021 16:42
@jlowe (Member, Author) commented on Jul 7, 2021

build

jlowe merged commit 6342a8e into NVIDIA:branch-21.08 on Jul 7, 2021
jlowe deleted the reader-semaphore-perf-fix branch on September 10, 2021 15:39