Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleans up some of the redundant code in proxy/internal RAPIDS Shuffle Managers [databricks] #6030

Merged

Conversation

abellina
Copy link
Collaborator

@abellina abellina commented Jul 19, 2022

Signed-off-by: Alessandro Bellina abellina@nvidia.com

Closes #3624.

I am looking to cleanup some of the proxy/base/internal classes in RapidsShuffleManager as per the issue @gerashegalov raised a while back. As it is now, each shim has 3 things:

  • RapidsShuffleManager (what the user sets, which subclasses ProxyRapidsShuffleInternalManager)
  • ProxyRapidsShuffleInternalManager (which subclasses from ProxyRapidsShuffleInternalManagerBase)
  • RapidsShuffleInternalManager (which subclasses from RapidsShuffleInternalManagerBase)

And essentially the RapidsShuffleManager (aka the proxy) loads the internal manager later in time. We have different versions of this so we get one specific to each shim, for binary compatibility reasons.

This change mainly moves getReader back to the RapidsShuffleInternalManagerBase so we can remove some redundant code (this path was the same everywhere I saw). It also renames VisibleShuffleManager to RapidsShuffleManagerLike, but this is not something I feel too strongly about. @gerashegalov could you take a look to see if you have other recommendations that can be done here?

There's more that can be done, for example ShuffleManager is consistent for all the spark versions we support (well not sure about the vendor specific ones as much), so we could in theory have a single RapidsShuffleManager that really just subclasses from ProxyRapidsShuffleInternalManagerBase directly and also implements ShuffleManager... but that seems like a bigger change.

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gerashegalov gerashegalov added shuffle things that impact the shuffle plugin task Work required that improves the product but is not user facing labels Jul 20, 2022
@gerashegalov gerashegalov marked this pull request as ready for review July 20, 2022 02:57
@gerashegalov gerashegalov changed the title Cleans up some of the redundant code in proxy/internal RAPIDS Shuffle Managers Cleans up some of the redundant code in proxy/internal RAPIDS Shuffle Managers [databricks] Jul 20, 2022
@gerashegalov
Copy link
Collaborator

build

@jlowe jlowe added this to the Jul 11 - Jul 22 milestone Jul 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
shuffle things that impact the shuffle plugin task Work required that improves the product but is not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Refactor RapidsShuffle managers classes
3 participants