
Fix divide-by-zero in GpuAverage with ansi mode #2130

Merged: 4 commits into NVIDIA:branch-0.5 on Apr 16, 2021

Conversation

abellina (Collaborator) opened this pull request:

Fixes: #2078

Commit: Irrespective of ansi enable, do not fail with divide-by-zero in GpuAverage
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
abellina changed the title from "Irrespective of ansi enable, do not fail with divide-by-zero in GpuAverage" to "Fix divide-by-zero in GpuAverage with ansi mode" on Apr 14, 2021
andygrove previously approved these changes Apr 14, 2021

andygrove (Contributor) left a comment:

LGTM

@@ -330,7 +333,8 @@ object GpuDivideUtil {
 }

 // This is for doubles and floats...
-case class GpuDivide(left: Expression, right: Expression) extends GpuDivModLike {
+case class GpuDivide(left: Expression, right: Expression,
+    override val failOnErrorOverride: Option[Boolean] = None) extends GpuDivModLike {
andygrove (Contributor) commented on this hunk:

The change looks good, but I was curious why we are using a different pattern from Spark, which just has a plain boolean argument with a default value, rather than using an option. Was this necessary because of the way we're using the shim layer?

abellina (Collaborator, Author) replied:
Yes, I tried other ways but couldn't think of something cleaner.

gerashegalov (Collaborator) commented:
I don't think we need the 3-value logic of Option[Boolean].

I think we can do it almost like Spark.

Let us undo the change to DivModLike, just make failOnError non-lazy, and define:

case class GpuDivide(
  left: Expression, right: Expression,
  override val failOnError: Boolean = ShimLoader.getSparkShims.shouldFailDivByZero()
) extends GpuDivModLike {

jlowe previously approved these changes Apr 14, 2021
revans2 previously approved these changes Apr 14, 2021

abellina (Collaborator, Author) commented:

build

sameerz added the bug ("Something isn't working") label Apr 14, 2021
sameerz added this to the Apr 12 - Apr 23 milestone Apr 14, 2021
 override lazy val evaluateExpression: GpuExpression = GpuDivide(
   GpuCast(cudfSum, DoubleType),
-  GpuCast(cudfCount, DoubleType))
+  GpuCast(cudfCount, DoubleType), Some(false))
gerashegalov (Collaborator) commented on this hunk:

nit: best practice is Option(false), but I think we can get away with a simple Boolean.
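A quick aside on the nit, as an illustration only (not code from this PR): Option(x) is the null-safe factory, whereas Some(x) wraps its argument unconditionally; for a literal like false the two are equivalent, which is also why a plain Boolean works just as well here.

// Option(x) returns None for a null argument; Some(x) wraps whatever it is given.
val fromNull: Option[String] = Option(null)  // None
val wrapped: Option[String] = Some(null)     // Some(null), which usually hides a bug
val literal: Option[Boolean] = Option(false) // Some(false); identical to Some(false) here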


@@ -269,7 +269,10 @@ object GpuDivModLike {
}

trait GpuDivModLike extends CudfBinaryArithmetic {
  lazy val failOnError: Boolean = ShimLoader.getSparkShims.shouldFailDivByZero()
  val failOnErrorOverride: Option[Boolean] = None

gerashegalov (Collaborator) commented on this hunk:

Let us try without failOnErrorOverride; just make failOnError non-lazy so we can override it.

revans2 (Collaborator) replied:

You can override a lazy val:

override lazy val failOnError: Boolean = failOnErrorOverride.getOrElse(GpuDivModLike.failOnError)

But I am fine with keeping this as is.

gerashegalov (Collaborator) replied:

Yes, I just meant you can't use lazy as a parameter.
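A tiny, self-contained illustration of both points above (the names are made up, not from the plugin): a lazy val can be overridden by another lazy val, but the lazy modifier is not legal on a constructor parameter.

trait DivLike {
  // Lazily evaluated default.
  lazy val failOnError: Boolean = true
}

// Overriding a lazy val with another lazy val compiles fine.
case class QuietDiv() extends DivLike {
  override lazy val failOnError: Boolean = false
}

// This does not compile: constructor parameters cannot be declared lazy.
// case class ParamDiv(lazy val failOnError: Boolean) extends DivLike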

abellina (Collaborator, Author) commented:
@gerashegalov we don't have ANSI semantics in several of the operations. In the DivModLike ops alone, the failOnError flag is given (in Spark) to Divide, IntegralDivide and Remainder, each of which defaults it to SQLConf.get.ansiEnabled. In our case, Pmod falls under the DivModLike category as well.

Doing a quick search in Spark, I am finding that:

  • Divide: failOnError is passed explicitly only from Average, as per this PR.
  • IntegralDivide: failOnError is in the case class signature but never overridden.
  • Remainder: failOnError is in the case class signature but never overridden.
  • Pmod: failOnError is in the case class signature but never overridden.

So the only class that really needs it is GpuDivide, and it seems better (and less error prone) to me to keep the call to the shim in a single place, GpuDivModLike, with the possibility of overriding it, since only GpuDivide requires that.

That said, if we want to match Spark more closely, I can look into adding failOnError to all of the GpuDivModLikes. I'm also interested to know whether we need an issue to track ANSI work for the plugin in general, as it seems that many places have forks for this.

abellina dismissed stale reviews from revans2, jlowe, and andygrove via e0e89a8 on April 15, 2021 15:57
gerashegalov (Collaborator) commented:
@abellina this is the change I am suggesting, in a nutshell: https://github.com/abellina/spark-rapids/compare/agg/fix_ansi_avg...gerashegalov:agg/fix_ansi_avg?expand=1

@revans2 if we make failOnError non-lazy, then we don't need an override inside the GpuDivide case class body; we can do it just as a param.

abellina (Collaborator, Author) replied:
@gerashegalov updated PR to incorporate your suggestion.
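For readers following along, a rough, self-contained sketch of the shape the thread converges on; the stand-in names (ExpressionStub, DivModLikeStub, ShimStub, GpuDivideSketch) are mine, not the plugin's, and the real types come from Spark and spark-rapids. The gist is that failOnError becomes a plain Boolean parameter defaulting to the shim's per-Spark-version answer, and GpuAverage's final divide passes false, so a zero count yields a null average rather than the ArithmeticException reported in #2078, even with spark.sql.ansi.enabled=true.

// Stand-ins so the sketch compiles on its own; the plugin uses Spark's Expression and
// its own CudfBinaryArithmetic and ShimLoader.
trait ExpressionStub
trait DivModLikeStub { val failOnError: Boolean }
object ShimStub { def shouldFailDivByZero(): Boolean = false /* per-Spark-version in reality */ }

// failOnError is an ordinary (non-lazy) parameter with the shim-provided default,
// so callers such as GpuAverage can simply pass failOnError = false for the sum/count divide.
case class GpuDivideSketch(
    left: ExpressionStub,
    right: ExpressionStub,
    failOnError: Boolean = ShimStub.shouldFailDivByZero())
  extends DivModLikeStub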

gerashegalov (Collaborator) left a comment:

LGTM

abellina (Collaborator, Author) commented:
build

revans2 merged commit d589867 into NVIDIA:branch-0.5 on Apr 16, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
Labels: bug (Something isn't working)

Successfully merging this pull request may close the following issue:

[BUG] java.lang.ArithmeticException: divide by zero when spark.sql.ansi.enabled=true (#2078)