From c24f92b685f5edf1615bc6ff72cc203dfece9cb4 Mon Sep 17 00:00:00 2001
From: "Robert (Bobby) Evans"
Date: Mon, 23 Nov 2020 08:53:53 -0600
Subject: [PATCH] Updated documentation for distinct count compatibility

Signed-off-by: Robert (Bobby) Evans
---
 docs/compatibility.md | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/docs/compatibility.md b/docs/compatibility.md
index 347dce36e16..0b1f19a45b6 100644
--- a/docs/compatibility.md
+++ b/docs/compatibility.md
@@ -48,23 +48,35 @@ floating point aggregations are off by default but can be enabled with the confi
 
 Additionally, some aggregations on floating point columns that contain `NaN` can produce
 incorrect results. More details on this behavior can be found
-[here](https://github.com/NVIDIA/spark-rapids/issues/87)
+[here](https://github.com/NVIDIA/spark-rapids/issues/87),
+[here](https://github.com/NVIDIA/spark-rapids/issues/837),
 and in this cudf [feature request](https://github.com/rapidsai/cudf/issues/4753).
 If it is known with certainty that the floating point columns do not contain `NaN`, set
 [`spark.rapids.sql.hasNans`](configs.md#sql.hasNans) to `false` to run GPU enabled aggregations on
 them.
 
+In the case of a distinct count on `NaN` values, the
+[issue](https://github.com/NVIDIA/spark-rapids/issues/837) only shows up if the data contains
+different `NaN` values. There are several different binary values that are all considered to be
+`NaN` by floating point. The plugin treats all of these as the same value, whereas Spark treats
+them all as different values. Because this is considered to be rare, we do not disable distinct
+count for floating point values even if [`spark.rapids.sql.hasNans`](configs.md#sql.hasNans) is `true`.
+
 ### `0.0` vs `-0.0`
 
-Floating point allows zero to be encoded as `0.0` and `-0.0`, but the standard says that
+Floating point allows zero to be encoded as `0.0` and `-0.0`, but the IEEE standard says that
 they should be interpreted as the same. Most databases normalize these values to always be
 `0.0`. Spark does this in some cases but not all as is documented
 [here](https://issues.apache.org/jira/browse/SPARK-32110). The underlying implementation of this
 plugin treats them as the same for essentially all processing. This can result in some differences
 with Spark for operations like [sorting](https://github.com/NVIDIA/spark-rapids/issues/84),
+[distinct count](https://github.com/NVIDIA/spark-rapids/issues/837),
 [joins, and comparisons](https://github.com/NVIDIA/spark-rapids/issues/294).
 
+We do not disable operations that produce different results due to `-0.0` in the data because
+it is considered to be a rare occurrence.
+
 ## Unicode
 
 Spark delegates Unicode operations to the underlying JVM. Each version of Java complies with a
@@ -407,4 +419,4 @@ When translating UDFs to Catalyst expressions, the supported UDF functions are l
 | | Array.empty[Float] |
 | | Array.empty[Double] |
 | | Array.empty[String] |
-| Method call | Only if the method being called<br/>1. consists of operations supported by the UDF compiler, and<br/>2. is one of the folllowing:<br/>&nbsp;&nbsp;• a final method, or<br/>&nbsp;&nbsp;• a method in a final class, or<br/>&nbsp;&nbsp;• a method in a final object |
+| Method call | Only if the method being called<br/>1. consists of operations supported by the UDF compiler, and<br/>2. is one of the following:<br/>&nbsp;&nbsp;• a final method, or<br/>&nbsp;&nbsp;• a method in a final class, or<br/>&nbsp;&nbsp;• a method in a final object |
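
As a usage sketch of the `spark.rapids.sql.hasNans` setting described in the hunk above (a hypothetical `spark-shell` session; the config key and its meaning come from the patched text, everything else is illustrative):

```scala
// Safe only if the floating point columns are known to contain no NaN values;
// with that promise the plugin can run the affected aggregations on the GPU.
spark.conf.set("spark.rapids.sql.hasNans", "false")
```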
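To see why multiple `NaN` encodings matter for a distinct count, here is a minimal `spark-shell` sketch. It is illustrative only: the variable names are made up, and the expected divergence between CPU and GPU restates the behavior described in the added paragraph rather than demonstrating a guaranteed result.

```scala
import java.lang.{Double => JDouble}
import org.apache.spark.sql.functions.countDistinct
import spark.implicits._

// Two different bit patterns that both decode to NaN.
val quietNan   = JDouble.longBitsToDouble(0x7ff8000000000000L)
val payloadNan = JDouble.longBitsToDouble(0x7ff8000000000001L)

val df = Seq(quietNan, payloadNan, 1.0).toDF("v")

// Per the paragraph above, Spark on the CPU treats the two NaN encodings as
// distinct values, while the plugin normalizes them and counts them as one.
df.select(countDistinct($"v")).show()
```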
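The `0.0` vs `-0.0` difference can be probed the same way; again an illustrative `spark-shell` sketch, with the possible divergence taken from the text above rather than guaranteed by the code:

```scala
import spark.implicits._

// Equal under IEEE comparison rules, but different bit patterns.
assert(0.0 == -0.0)
assert(java.lang.Double.doubleToRawLongBits(0.0) !=
  java.lang.Double.doubleToRawLongBits(-0.0))

// Operations such as sorting, distinct count, joins, and comparisons over
// this column may disagree between Spark's CPU path, which distinguishes the
// two encodings in some cases, and the plugin, which always normalizes them.
val df = Seq(0.0, -0.0).toDF("v")
df.distinct().count()
```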
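For the `Method call` row in the table, here is a sketch of a UDF that fits those rules (the object, method, and session setup are hypothetical, not from the patch):

```scala
import org.apache.spark.sql.functions.udf
import spark.implicits._

// Scala objects are final, so this satisfies "a method in a final object",
// and the body uses only arithmetic the UDF compiler supports.
object Conversions {
  def cToF(c: Double): Double = c * 9.0 / 5.0 + 32.0
}

val cToF = udf((c: Double) => Conversions.cToF(c))
Seq(0.0, 100.0).toDF("c").select(cToF($"c").as("f")).show()
```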