Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary intermediate classes in the Shuffle Manager class hierarchy #9456

Closed
gerashegalov opened this issue Oct 17, 2023 · 0 comments · Fixed by #9444
Closed

Unnecessary intermediate classes in the Shuffle Manager class hierarchy #9456

gerashegalov opened this issue Oct 17, 2023 · 0 comments · Fixed by #9444
Assignees

Comments

@gerashegalov
Copy link
Collaborator

Spark RAPIDS plugin provides a shuffle manager implementation to improve concurrency on CPU, and to enable a GPU backed shuffle using UCX.

ShuffleManager traits in Spark are generally not compatible across versions and it used to be impossible to satisfy different versions of this API using the same source over a version range 3.0.x, now. Eventually we were able to remove the shims and initiate the cleanup such as #6030

At this point we ended up with a few more intermediate classes that override nothing in the class hierarchy.

They are maintained per shim and contribute to maintenance burden and confusion.

spark311.RapidsShuffleManager (just to add spark311 to the package name) :> spark311.ProxyRapidsShuffleIntenalManager (no code) :> shared ProxyRapidsShuffleIntenalManagerBase
spark311.RapidsShuffleInternalManager (instantiated via proxy, no code) :> spark311.(no code) :> shared RapidsShuffleInternalManagerBase

Thus spark311.ProxyRapidsShuffleIntenalManager can be dropped by having spark311.RapidsShuffleManager inherit ProxyRapidsShuffleIntenalManagerBase directly

spark311.RapidsShuffleInternalManager can be dropped by having the shared ProxyRapidsShuffleIntenalManagerBase instantiate the shared RapidsShuffleInternalManagerBase

@gerashegalov gerashegalov self-assigned this Oct 17, 2023
gerashegalov added a commit that referenced this issue Oct 24, 2023
…9444)

Closes #9456
Contributes to #6565 

- additionally remove unsued pom properties
- remove unnecessary unshimmed*txt entries missed in #9506 
- get shuffle manager definition to the point of an easy template, probably can be a shimplify feature

```scala
/*** spark-rapids-shim-json-lines
{"spark": "${buldver}"}
spark-rapids-shim-json-lines ***/
package com.nvidia.spark.rapids.spark${buldver}

import org.apache.spark.SparkConf
import org.apache.spark.shuffle.rapids.ProxyRapidsShuffleInternalManagerBase

/** A shuffle manager optimized for the RAPIDS Plugin for Apache Spark. */
sealed class RapidsShuffleManager(
    conf: SparkConf,
    isDriver: Boolean
) extends ProxyRapidsShuffleInternalManagerBase(conf, isDriver)
```
 
Signed-off-by: Gera Shegalov <gera@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant