Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterator to make it easier to work with a window of blocks in the RAPIDS shuffle #934

Merged

Conversation

abellina
Copy link
Collaborator

@abellina abellina commented Oct 13, 2020

Signed-off-by: Alessandro Bellina abellina@nvidia.com

This introduces an iterator that will be used in subsequent commits for the rapids shuffle. Adding it separate from them to reduce the size of PRs. If reviewers want me to put up a bigger PR with the rest of this I can, but I am trying to keep it reviewable.

The iterator, given a set of blocks will return on next() ranges of blocks that pack a window (of bytes)

  1. If you have blocks smaller than a window, this puts them back to back within such window (spark's overpartition case)
  2. If you have blocks larger than a window, this works to split ranges given the window length (multiple calls to next).

The main advantage of the approach supported by this iterator is that we utilize bounce buffers more efficiently, and also reduce the amount of calls to the underlying protocol.

@abellina abellina added performance A performance related task/issue shuffle things that impact the shuffle plugin labels Oct 13, 2020
@abellina abellina changed the title Adds iterator to make it easier to work with ranges of blocks Adds iterator to make it easier to work with a window of blocks in the rapids shuffle Oct 13, 2020
@revans2
Copy link
Collaborator

revans2 commented Oct 13, 2020

The code looks fine, I just don't like to add in code if it is not actively being used.

@abellina
Copy link
Collaborator Author

@revans2 that's ok, I am fine closing this and trying to fit it in with another PR that targets mostly the receive/send states in the rapids shuffle. It's so far a pretty big diff 1.5K lines, so I can try reducing it.

@revans2
Copy link
Collaborator

revans2 commented Oct 13, 2020

I am fine with putting this in if I know that there will be something coming shortly. So if you could put up a follow on PR that depends on this, then I would feel better about it, because then it is clear that you are breaking it up and that there will not be dead code for long.

@abellina abellina changed the title Adds iterator to make it easier to work with a window of blocks in the rapids shuffle [DRAFT] Adds iterator to make it easier to work with a window of blocks in the rapids shuffle Oct 13, 2020
@abellina abellina changed the title [DRAFT] Adds iterator to make it easier to work with a window of blocks in the rapids shuffle [WIP] Adds iterator to make it easier to work with a window of blocks in the rapids shuffle Oct 13, 2020
@abellina
Copy link
Collaborator Author

@revans2 marked it as WIP, as I put up the rest. It is going to take me a couple of days to get that second PR up likely.

@abellina abellina force-pushed the shuffle/bounce_buffer_window_iterator branch from b247bec to dfff723 Compare October 19, 2020 19:28
@abellina abellina changed the title [WIP] Adds iterator to make it easier to work with a window of blocks in the rapids shuffle Adds iterator to make it easier to work with a window of blocks in the rapids shuffle Oct 19, 2020
@abellina
Copy link
Collaborator Author

@jlowe @revans2 this is one is ready to review. I have my bigger PR ready to go as well on top of this, so we'll have that one up shortly after/if this one gets merged.

@abellina
Copy link
Collaborator Author

@revans2 your comments should (hopefully) be addressed in: 8bd887a. Just let me know.

@jlowe jlowe changed the title Adds iterator to make it easier to work with a window of blocks in the rapids shuffle Iterator to make it easier to work with a window of blocks in the RAPIDS shuffle Oct 19, 2020
jlowe
jlowe previously approved these changes Oct 20, 2020
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just a minor nit.

@abellina
Copy link
Collaborator Author

build

jlowe
jlowe previously approved these changes Oct 20, 2020
@abellina
Copy link
Collaborator Author

build

jlowe
jlowe previously approved these changes Oct 20, 2020
revans2
revans2 previously approved these changes Oct 20, 2020
@abellina
Copy link
Collaborator Author

abellina commented Oct 20, 2020

I believe blossom-ci is stuck, something about a volume having issues. I can try and kill the job and start CI over.

@abellina
Copy link
Collaborator Author

build

1 similar comment
@abellina
Copy link
Collaborator Author

build

@abellina abellina dismissed stale reviews from revans2 and jlowe via 8ba3160 October 20, 2020 22:47
@abellina abellina force-pushed the shuffle/bounce_buffer_window_iterator branch from 3422a3f to 8ba3160 Compare October 20, 2020 22:47
@abellina
Copy link
Collaborator Author

Rebased to pick up CI fix for 3.1.0

@abellina
Copy link
Collaborator Author

build

@abellina abellina merged commit 65b739b into NVIDIA:branch-0.3 Oct 21, 2020
@abellina abellina deleted the shuffle/bounce_buffer_window_iterator branch October 21, 2020 02:27
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020
…IDS shuffle (NVIDIA#934)

* Adds iterator to make it easier to work with ranges of blocks

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
…IDS shuffle (NVIDIA#934)

* Adds iterator to make it easier to work with ranges of blocks

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
…IDS shuffle (NVIDIA#934)

* Adds iterator to make it easier to work with ranges of blocks

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#934)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance A performance related task/issue shuffle things that impact the shuffle plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants