Fix 330 build error and add 322 shims layer [databricks] #4447
Conversation
Signed-off-by: Chong Gao <res_life@163.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
The Spark 3.2.1 snapshot stopped updating and there is no 3.2.1 release; it has now changed to the 3.2.2 snapshot.
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
build
This PR also addresses it. @kuhushukla can you help review?
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
build
The changes prima facie look fine. We could probably name the 322+ and 320+ folders more precisely so it is not confusing what they really mean. At this point we have three such folders, including 330+.
Signed-off-by: Chong Gao <res_life@163.com>
It would be nice if you could combine the two PRs and add a co-author.
build
This looks good to me. I have mostly nits and some ideas for minimizing the patch, but it's OK to address them in a follow-on.
@@ -109,7 +108,8 @@ trait SparkShims {
       pushDownStartWith: Boolean,
       pushDownInFilterThreshold: Int,
       caseSensitive: Boolean,
-      datetimeRebaseMode: LegacyBehaviorPolicy.Value): ParquetFilters
+      lookupFileMeta: String => String,
+      modeByConfig: String): ParquetFilters
Keep the dateTimeRebase prefix: dateTimeRebaseModeByConfig.
nit: maybe dateTimeRebaseModeFromConf?
Yes, I totally agree these names are too generic. Either that, or we need to add Scaladocs to explain what they do.
override def otherCopyArgs: Seq[AnyRef] = cpuLeftKeys :: cpuRightKeys :: Nil

override def requiredChildDistribution: Seq[Distribution] =
  HashClusteredDistribution(cpuLeftKeys) :: HashClusteredDistribution(cpuRightKeys) :: Nil
nit:
override def requiredChildDistribution: Seq[Distribution] =
Seq(HashClusteredDistribution(cpuLeftKeys), HashClusteredDistribution(cpuRightKeys))
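For reference, the two spellings in this nit build equal sequences, so the change is purely stylistic. A small stand-alone sketch (`Dist` is a stand-in for `HashClusteredDistribution`, not the real Spark type):

```scala
// Illustration only: cons-style construction and Seq(...) are equivalent.
case class Dist(keys: Seq[String]) // stand-in for HashClusteredDistribution

object ConsVsSeqDemo extends App {
  val left  = Dist(Seq("cpuLeftKey"))
  val right = Dist(Seq("cpuRightKey"))

  // Prepending to Nil with :: builds the same two-element sequence as Seq(...)
  val consStyle: Seq[Dist] = left :: right :: Nil
  val seqStyle:  Seq[Dist] = Seq(left, right)

  assert(consStyle == seqStyle)
}
```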
buildSide match {
  case BuildRight => GpuBuildRight
  case BuildLeft => GpuBuildLeft
  case _ => throw new Exception("unknown buildSide Type")
nit:
case unknownBuildSide => throw new IllegalArgumentException(s"unknown buildSide Type: $unknownBuildSide")
    condition,
    isSkewJoin = isSkewJoin) {

  override def otherCopyArgs: Seq[AnyRef] = cpuLeftKeys :: cpuRightKeys :: Nil
nit: cpuLeftKeys ++ cpuRightKeys?
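Worth noting that the two forms differ in shape, so this nit is not purely cosmetic: with `Seq`-typed keys, `::` yields a two-element sequence of sequences, while `++` flattens into one sequence of keys. Which one `copy()` actually expects depends on the case-class constructor, so this is illustration only:

```scala
object ConsVsConcatDemo extends App {
  val cpuLeftKeys:  Seq[String] = Seq("a", "b") // stand-ins for expressions
  val cpuRightKeys: Seq[String] = Seq("c")

  // Two-element Seq whose elements are the key sequences themselves.
  val nested: Seq[AnyRef] = cpuLeftKeys :: cpuRightKeys :: Nil
  // Single flattened Seq containing all keys.
  val flat: Seq[String] = cpuLeftKeys ++ cpuRightKeys

  assert(nested.length == 2)
  assert(flat == Seq("a", "b", "c"))
}
```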
override def otherCopyArgs: Seq[AnyRef] = cpuLeftKeys :: cpuRightKeys :: Nil

override def requiredChildDistribution: Seq[Distribution] =
  HashClusteredDistribution(cpuLeftKeys) :: HashClusteredDistribution(cpuRightKeys) :: Nil
nit: Seq(HashClusteredDistribution(cpuLeftKeys), HashClusteredDistribution(cpuRightKeys))?
  SQLConf.PARQUET_INT96_REBASE_MODE_IN_READ.key
override def int96ParquetRebaseWriteKey: String =
  SQLConf.PARQUET_INT96_REBASE_MODE_IN_WRITE.key
override def hasSeparateINT96RebaseConf: Boolean = true
Many members of this trait could be factored out into another mixin trait that is common beyond 3.2.2.
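A schematic of that kind of factoring (trait names are illustrative, not the plugin's actual ones; the key strings mirror the SQLConf entries but are written out literally to keep the sketch self-contained):

```scala
// Members shared by every shim from 3.2.2 onward move into one mixin;
// per-version shims only add what actually differs.
trait Int96RebaseShims {
  def int96ParquetRebaseReadKey: String
  def int96ParquetRebaseWriteKey: String
  def hasSeparateINT96RebaseConf: Boolean
}

// Hypothetical common mixin holding the shared overrides.
trait CommonInt96RebaseShims extends Int96RebaseShims {
  override def int96ParquetRebaseReadKey: String =
    "spark.sql.parquet.int96RebaseModeInRead"
  override def int96ParquetRebaseWriteKey: String =
    "spark.sql.parquet.int96RebaseModeInWrite"
  override def hasSeparateINT96RebaseConf: Boolean = true
}

// A version shim then just mixes the common trait in.
object Spark322ShimsSketch extends CommonInt96RebaseShims
```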
  }
}

// override def createShuffleSpec(distribution: ClusteredDistribution): ShuffleSpec =
Remove the commented-out code.
override def otherCopyArgs: Seq[AnyRef] = cpuLeftKeys :: cpuRightKeys :: Nil

override def requiredChildDistribution: Seq[Distribution] =
  ClusteredDistribution(cpuLeftKeys) :: ClusteredDistribution(cpuRightKeys) :: Nil
nit: Seq(ClusteredDistribution(cpuLeftKeys), ClusteredDistribution(cpuRightKeys))
build
Overall I just have nits. I would at least like to see a follow-on issue to address the names of the new parameters for the Parquet filter shim layer, and something to look into the TODO comment, even if it ends up that we just delete the comment.
}

override def getDateFormatter(): DateFormatter = {
  // TODO verify
Is this TODO done? I see the same thing in the original Spark32XShims.scala, so I am fine with this being a follow-on issue. We really don't want a TODO like this in our code.
ca270e7
build
LGTM
Approved the last commit, which contains some nit changes.
The previous commits had already been approved, so let's merge it.
@res-life can you add the follow-on issues here and assign them to yourself if you are working on them? We can also assign them to someone else if you would prefer.
import org.apache.spark.unsafe.types.CalendarInterval

/**
 * Shim base class that can be compiled with every supported 3.2.x
I think the comment here is wrong, since this only covers 3.2.2+.
/**
 * Shim base class that can be compiled with every supported 3.2.x
 */
trait Spark322PlusShims extends SparkShims with RebaseShims with Logging {
This class appears to be a lot of copied code for what is really a single-line change:

<   .datetimeRebaseMode(lookupFileMeta, dateTimeRebaseModeFromConf)
---
>   .datetimeRebaseSpec(lookupFileMeta, dateTimeRebaseModeFromConf)

It feels like much of this could be common with Spark320until322Shims. For instance, can't we put all the common code back into Spark32XShims (in 320+) and then put just the single differing function into Spark320Shims, Spark321Shims, and Spark322Shims? Or alternatively, put that single function into Spark322PlusShims and Spark320until322Shims, each of which extends Spark32XShims. Let me know if I'm missing something.
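The suggested structure can be sketched schematically; everything here is illustrative (the `datetimeRebaseCall` method is a stand-in for the real shim method whose Spark API changed, and the trait names carry a `Sketch` suffix to make that clear):

```scala
// Shared 3.2.x shim code lives in the base trait; each version range
// overrides only the one method whose underlying Spark API differs.
trait Spark32XShimsSketch {
  // ...all the common 3.2.x shim code would live here...
  def datetimeRebaseCall: String = "shared"
}

trait Spark320until322ShimsSketch extends Spark32XShimsSketch {
  // Spark < 3.2.2 calls datetimeRebaseMode(...)
  override def datetimeRebaseCall: String = "datetimeRebaseMode"
}

trait Spark322PlusShimsSketch extends Spark32XShimsSketch {
  // Spark >= 3.2.2 calls datetimeRebaseSpec(...)
  override def datetimeRebaseCall: String = "datetimeRebaseSpec"
}
```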
This fixes #4438, #4432, and #4443.
Co-authored-by: Firestarman firestarmanllc@gmail.com
Co-authored-by: Kuhu Shukla kuhus@nvidia.com
Compilation is broken for buildver=330 after https://issues.apache.org/jira/browse/SPARK-37705, and that change also impacted the build for the Spark 3.2.2 snapshot. So this PR fixes the two issues together.