
Fix divide-by-zero in GpuAverage with ansi mode #2130

Merged: 4 commits into NVIDIA:branch-0.5 on Apr 16, 2021

Conversation

abellina (Collaborator) opened this pull request:

Fixes: #2078

Commit: Irrespective of ansi enable, do not fail with divide-by-zero in GpuAverage
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
abellina changed the title from "Irrespective of ansi enable, do not fail with divide-by-zero in GpuAverage" to "Fix divide-by-zero in GpuAverage with ansi mode" on Apr 14, 2021
andygrove previously approved these changes Apr 14, 2021

andygrove (Contributor) left a comment:

LGTM

@@ -330,7 +333,8 @@ object GpuDivideUtil {
 }

 // This is for doubles and floats...
-case class GpuDivide(left: Expression, right: Expression) extends GpuDivModLike {
+case class GpuDivide(left: Expression, right: Expression,
+    override val failOnErrorOverride: Option[Boolean] = None) extends GpuDivModLike {
andygrove (Contributor) commented on this hunk:

The change looks good, but I was curious why we are using a different pattern from Spark, which just has a plain boolean argument with a default value, rather than using an option. Was this necessary because of the way we're using the shim layer?

abellina (Collaborator, Author) replied:
Yes, I tried other ways but couldn't think of something cleaner.

gerashegalov (Collaborator) commented:
I don't think we need the 3-value logic of Option[Boolean].

I think we can do it almost like Spark.

Let us undo the change to DivModLike, just make failOnError non-lazy, and define:

case class GpuDivide(
  left: Expression, right: Expression,
  override val failOnError: Boolean = ShimLoader.getSparkShims.shouldFailDivByZero()
) extends GpuDivModLike {

jlowe previously approved these changes Apr 14, 2021
revans2 previously approved these changes Apr 14, 2021

abellina (Collaborator, Author) commented:

build

sameerz added the bug ("Something isn't working") label Apr 14, 2021
sameerz added this to the Apr 12 - Apr 23 milestone Apr 14, 2021
 override lazy val evaluateExpression: GpuExpression = GpuDivide(
   GpuCast(cudfSum, DoubleType),
-  GpuCast(cudfCount, DoubleType))
+  GpuCast(cudfCount, DoubleType), Some(false))
gerashegalov (Collaborator) commented on this hunk:

nit: best practice is Option(false), but I think we can get away with a simple Boolean.
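A quick aside on the nit, as an illustration only (not code from this PR): Option(x) is the null-safe factory, whereas Some(x) wraps its argument unconditionally; for a literal like false the two are equivalent, which is also why a plain Boolean works just as well here.

// Option(x) returns None for a null argument; Some(x) wraps whatever it is given.
val fromNull: Option[String] = Option(null)  // None
val wrapped: Option[String] = Some(null)     // Some(null), which usually hides a bug
val literal: Option[Boolean] = Option(false) // Some(false); identical to Some(false) here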


@@ -269,7 +269,10 @@ object GpuDivModLike {
}

trait GpuDivModLike extends CudfBinaryArithmetic {
  lazy val failOnError: Boolean = ShimLoader.getSparkShims.shouldFailDivByZero()
  val failOnErrorOverride: Option[Boolean] = None

gerashegalov (Collaborator) commented on this hunk:

Let us try without failOnErrorOverride; just make failOnError non-lazy so we can override it.

revans2 (Collaborator) replied:

You can override a lazy val:

override lazy val failOnError: Boolean = failOnErrorOverride.getOrElse(GpuDivModLike.failOnError)

But I am fine with keeping this as is.

gerashegalov (Collaborator) replied:

Yes, I just meant you can't use lazy as a parameter.
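A tiny, self-contained illustration of both points above (the names are made up, not from the plugin): a lazy val can be overridden by another lazy val, but the lazy modifier is not legal on a constructor parameter.

trait DivLike {
  // Lazily evaluated default.
  lazy val failOnError: Boolean = true
}

// Overriding a lazy val with another lazy val compiles fine.
case class QuietDiv() extends DivLike {
  override lazy val failOnError: Boolean = false
}

// This does not compile: constructor parameters cannot be declared lazy.
// case class ParamDiv(lazy val failOnError: Boolean) extends DivLike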

abellina (Collaborator, Author) commented:
@gerashegalov we don't have ANSI semantics in several of the operations. In the DivModLike ops alone, the failOnError flag is given (in Spark) to Divide, IntegralDivide and Remainder, each of which defaults it to SQLConf.get.ansiEnabled. In our case, Pmod falls under the DivModLike category as well.

Doing a quick search in Spark, I am finding that:

  • Divide: failOnError is passed explicitly only from Average, as per this PR.
  • IntegralDivide: failOnError is in the case class signature but never overridden.
  • Remainder: failOnError is in the case class signature but never overridden.
  • Pmod: failOnError is in the case class signature but never overridden.

So the only class that really needs it is GpuDivide, and it seems better (and less error prone) to me to keep the call to the shim in a single place, GpuDivModLike, with the possibility of overriding it, since only GpuDivide requires that.

That said, if we want to match Spark more closely, I can look into adding failOnError to all of the GpuDivModLikes. I'm also interested to know whether we need an issue to track ANSI work for the plugin in general, as it seems that many places have forks for this.

abellina dismissed stale reviews from revans2, jlowe, and andygrove via e0e89a8 on April 15, 2021 15:57
gerashegalov (Collaborator) commented:
@abellina this is the change I am suggesting, in a nutshell: https://github.com/abellina/spark-rapids/compare/agg/fix_ansi_avg...gerashegalov:agg/fix_ansi_avg?expand=1

@revans2 if we make failOnError non-lazy, then we don't need an override inside the GpuDivide case class body; we can do it just as a param.

abellina (Collaborator, Author) replied:
@gerashegalov updated PR to incorporate your suggestion.
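For readers following along, a rough, self-contained sketch of the shape the thread converges on; the stand-in names (ExpressionStub, DivModLikeStub, ShimStub, GpuDivideSketch) are mine, not the plugin's, and the real types come from Spark and spark-rapids. The gist is that failOnError becomes a plain Boolean parameter defaulting to the shim's per-Spark-version answer, and GpuAverage's final divide passes false, so a zero count yields a null average rather than the ArithmeticException reported in #2078, even with spark.sql.ansi.enabled=true.

// Stand-ins so the sketch compiles on its own; the plugin uses Spark's Expression and
// its own CudfBinaryArithmetic and ShimLoader.
trait ExpressionStub
trait DivModLikeStub { val failOnError: Boolean }
object ShimStub { def shouldFailDivByZero(): Boolean = false /* per-Spark-version in reality */ }

// failOnError is an ordinary (non-lazy) parameter with the shim-provided default,
// so callers such as GpuAverage can simply pass failOnError = false for the sum/count divide.
case class GpuDivideSketch(
    left: ExpressionStub,
    right: ExpressionStub,
    failOnError: Boolean = ShimStub.shouldFailDivByZero())
  extends DivModLikeStub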

gerashegalov (Collaborator) left a comment:

LGTM

abellina (Collaborator, Author) commented:
build

revans2 merged commit d589867 into NVIDIA:branch-0.5 on Apr 16, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
Labels: bug (Something isn't working)

Successfully merging this pull request may close the following issue:

[BUG] java.lang.ArithmeticException: divide by zero when spark.sql.ansi.enabled=true (#2078)