
Fix 330 build error and add 322 shims layer [databricks] #4447

Merged
merged 9 commits into from
Jan 4, 2022

Conversation


@res-life res-life commented Dec 31, 2021

This fixes #4438, #4432 and #4443

Co-authored-by: Firestarman firestarmanllc@gmail.com
Co-authored-by: Kuhu Shukla kuhus@nvidia.com

The build is broken for buildver=330 after https://issues.apache.org/jira/browse/SPARK-37705, and it also impacted the build for the Spark 3.2.2 snapshot, so this PR fixes the two issues together.

Chong Gao and others added 5 commits December 30, 2021 11:53
Signed-off-by: Chong Gao <res_life@163.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Chong Gao <res_life@163.com>
Signed-off-by: Chong Gao <res_life@163.com>
Signed-off-by: Chong Gao <res_life@163.com>

res-life commented Dec 31, 2021

The Spark 3.2.1 snapshot stopped updating and there was no 3.2.1 release; the snapshot has since moved to 3.2.2.
Let's keep the 3.2.1 snapshot shims and create 3.2.2 snapshot shims.
Later, if we confirm the 3.2.1 snapshot is no longer needed, we can remove it.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@res-life res-life changed the title Add 322 shims layer Fix 330 build error and add 322 shims layer Dec 31, 2021
@firestarman firestarman changed the title Fix 330 build error and add 322 shims layer Fix 330 build error and add 322 shims layer [databricks] Dec 31, 2021
@firestarman

build

@res-life

This PR is also about it. @kuhushukla Can you help to review?

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman

build

@kuhushukla

The changes prima facie look fine. We could probably name the 322+ and 320+ folders more precisely so it is clear what they really mean. At this point we have three such folders, including 330+.

Signed-off-by: Chong Gao <res_life@163.com>
@kuhushukla

> This PR is also about it. @kuhushukla Can you help to review?

It would be nice if you could combine the two PRs and add a co-author.

@res-life

build

gerashegalov previously approved these changes Dec 31, 2021

@gerashegalov left a comment

This looks good to me. I have mostly nits and some ideas for minimizing the patch, but it's OK to address them in a follow-on.

@@ -109,7 +108,8 @@ trait SparkShims {
      pushDownStartWith: Boolean,
      pushDownInFilterThreshold: Int,
      caseSensitive: Boolean,
-     datetimeRebaseMode: LegacyBehaviorPolicy.Value): ParquetFilters
+     lookupFileMeta: String => String,
+     modeByConfig: String): ParquetFilters
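For context, the changed shim hook could be sketched like this. This is a minimal, self-contained approximation with a placeholder `ParquetFilters` trait and an assumed body; the real trait lives in the plugin and references Spark's types.

```scala
// Sketch only: placeholder standing in for Spark's ParquetFilters.
trait ParquetFilters

object SparkShimsSketch {
  // After the change, callers pass a file-metadata lookup function and the
  // config-derived rebase mode as a String, instead of a pre-resolved
  // LegacyBehaviorPolicy value.
  def getParquetFilters(
      pushDownStartWith: Boolean,
      pushDownInFilterThreshold: Int,
      caseSensitive: Boolean,
      lookupFileMeta: String => String,
      modeByConfig: String): ParquetFilters =
    new ParquetFilters {}
}
```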

keep dateTimeRebase prefix: dateTimeRebaseModeByConfig

nit: maybe dateTimeRebaseModeFromConf ?


Yes, I totally agree these names are too generic. Either that, or we need to add Scaladoc to explain what they do.


override def otherCopyArgs: Seq[AnyRef] = cpuLeftKeys :: cpuRightKeys :: Nil

override def requiredChildDistribution: Seq[Distribution] =
  HashClusteredDistribution(cpuLeftKeys) :: HashClusteredDistribution(cpuRightKeys) :: Nil

nit:

override def requiredChildDistribution: Seq[Distribution] =
    Seq(HashClusteredDistribution(cpuLeftKeys), HashClusteredDistribution(cpuRightKeys))
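The two spellings are equivalent at runtime; `Seq(a, b)` is just easier to scan than the cons-style `a :: b :: Nil`. A quick self-contained check (illustrative element values only):

```scala
object SeqStyles {
  // Both expressions build the same two-element List.
  val viaCons: Seq[String] = "cpuLeftKeys" :: "cpuRightKeys" :: Nil
  val viaSeq: Seq[String] = Seq("cpuLeftKeys", "cpuRightKeys")
}
```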

buildSide match {
case BuildRight => GpuBuildRight
case BuildLeft => GpuBuildLeft
case _ => throw new Exception("unknown buildSide Type")

nit:

case unknownBuildSide => throw new IllegalArgumentException(s"unknown buildSide Type: $unknownBuildSide")
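A minimal sketch of the suggested pattern, using hypothetical stand-in types (the real BuildSide and GpuBuildSide types live in Spark and the plugin): binding the unmatched value makes the error message self-describing.

```scala
// Hypothetical stand-ins, for illustration only.
trait BuildSide
case object BuildLeft extends BuildSide
case object BuildRight extends BuildSide

trait GpuBuildSide
case object GpuBuildLeft extends GpuBuildSide
case object GpuBuildRight extends GpuBuildSide

object BuildSideConverter {
  def toGpu(buildSide: BuildSide): GpuBuildSide = buildSide match {
    case BuildRight => GpuBuildRight
    case BuildLeft => GpuBuildLeft
    // Binding the value (instead of using `_`) puts the offending
    // input into the exception message.
    case unknownBuildSide =>
      throw new IllegalArgumentException(s"unknown buildSide type: $unknownBuildSide")
  }
}
```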

condition,
isSkewJoin = isSkewJoin) {

override def otherCopyArgs: Seq[AnyRef] = cpuLeftKeys :: cpuRightKeys :: Nil

nit: cpuLeftKeys ++ cpuRightKeys ?

override def otherCopyArgs: Seq[AnyRef] = cpuLeftKeys :: cpuRightKeys :: Nil

override def requiredChildDistribution: Seq[Distribution] =
  HashClusteredDistribution(cpuLeftKeys) :: HashClusteredDistribution(cpuRightKeys) :: Nil

nit: Seq(HashClusteredDistribution(cpuLeftKeys), HashClusteredDistribution(cpuRightKeys)) ?

SQLConf.PARQUET_INT96_REBASE_MODE_IN_READ.key
override def int96ParquetRebaseWriteKey: String =
SQLConf.PARQUET_INT96_REBASE_MODE_IN_WRITE.key
override def hasSeparateINT96RebaseConf: Boolean = true

Many members of this trait can be factored out into another mix-in trait that is common beyond 322.
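The factoring could look roughly like this. This is a sketch with an assumed trait name and hard-coded key strings standing in for the SQLConf constants; the real shims reference SQLConf directly.

```scala
// Hypothetical mix-in holding the INT96 rebase members shared by several
// shim versions, so the per-version traits keep only what actually differs.
trait Int96RebaseConfShims {
  // Key strings are illustrative stand-ins for SQLConf constants.
  def int96ParquetRebaseReadKey: String = "spark.sql.parquet.int96RebaseModeInRead"
  def int96ParquetRebaseWriteKey: String = "spark.sql.parquet.int96RebaseModeInWrite"
  def hasSeparateINT96RebaseConf: Boolean = true
}

// A version-specific shim then just mixes it in.
object Spark322ShimsSketch extends Int96RebaseConfShims
```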

}
}

// override def createShuffleSpec(distribution: ClusteredDistribution): ShuffleSpec =

remove the commented-out code


override def otherCopyArgs: Seq[AnyRef] = cpuLeftKeys :: cpuRightKeys :: Nil

override def requiredChildDistribution: Seq[Distribution] =
  ClusteredDistribution(cpuLeftKeys) :: ClusteredDistribution(cpuRightKeys) :: Nil

nit: Seq(ClusteredDistribution(cpuLeftKeys), ClusteredDistribution(cpuRightKeys))

kuhushukla previously approved these changes Dec 31, 2021
@sameerz sameerz added the build Related to CI / CD or cleanly building label Jan 1, 2022
@sameerz sameerz added this to the Dec 13 - Jan 7 milestone Jan 1, 2022

sameerz commented Jan 1, 2022

build

revans2 previously approved these changes Jan 3, 2022

@revans2 left a comment

Overall I just have nits. I would like to at least see a follow-on issue to address the names of the new parameters for the Parquet filter shim layer, and something to look into the TODO comment, even if it ends up being that we just delete the comment.


}

override def getDateFormatter(): DateFormatter = {
// TODO verify

Is this TODO done? I see the same thing in the original Spark32XShims.scala, so I am fine if this is a follow-on issue. We really don't want a TODO like this in our code.

Signed-off-by: Chong Gao <res_life@163.com>

res-life commented Jan 4, 2022

build

@firestarman

LGTM

@GaryShen2008 left a comment

Approved; the last commit contains only some nit changes, and the previous commits have already been approved. Let's merge it.

@GaryShen2008 GaryShen2008 merged commit 9281caa into NVIDIA:branch-22.02 Jan 4, 2022
@kuhushukla

@res-life, can you add the follow-on issues here and assign them to yourself if you are working on them? We can also assign them to someone else if you would prefer.

import org.apache.spark.unsafe.types.CalendarInterval

/**
* Shim base class that can be compiled with every supported 3.2.x

I think the comment here is wrong, since this only applies to 3.2.2+.

/**
* Shim base class that can be compiled with every supported 3.2.x
*/
trait Spark322PlusShims extends SparkShims with RebaseShims with Logging {

This class appears to be a lot of copied code for what is really a single-line change:

<       .datetimeRebaseMode(lookupFileMeta, dateTimeRebaseModeFromConf)
---
>       .datetimeRebaseSpec(lookupFileMeta, dateTimeRebaseModeFromConf)

It feels like much of this could be shared with Spark320until322Shims. For instance, can't we put all the common code back into Spark32XShims (in 320+) and then put just the single function that differs into Spark320Shims, Spark321Shims, and Spark322Shims? Or alternatively, put just that single function in Spark322PlusShims and Spark320until322Shims, both of which extend Spark32XShims. Let me know if I'm missing something.
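The layering proposed above could be sketched like this, with hypothetical simplified members standing in for the real shim code (the actual divergence is the datetimeRebaseMode vs datetimeRebaseSpec call):

```scala
// Sketch of the proposed hierarchy: common code in one base trait, with only
// the single divergent piece overridden per version range.
trait Spark32XShimsSketch {
  // ... shared shim code would live here ...
  def datetimeRebaseCallName: String // the one point of divergence
}

trait Spark320until322ShimsSketch extends Spark32XShimsSketch {
  override def datetimeRebaseCallName: String = "datetimeRebaseMode"
}

trait Spark322PlusShimsSketch extends Spark32XShimsSketch {
  override def datetimeRebaseCallName: String = "datetimeRebaseSpec"
}
```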

@res-life res-life mentioned this pull request Jan 6, 2022
@res-life res-life deleted the add-322-shims branch April 16, 2022 00:08
Labels
build Related to CI / CD or cleanly building
8 participants