
Fix a few minor things with scale test #9200

Merged: 2 commits into NVIDIA:branch-23.10 on Sep 12, 2023

Conversation

@revans2 (Collaborator) commented Sep 7, 2023

This fixes #9198

It also

  • Increases the types of columns that can be used for min/max operations.
  • Adds a human-readable name for every output column, so comparing results will be simpler.
  • Adjusts the ratio of SUM vs MIN/MAX columns to be even instead of random.
  • Removes duplication in the query names: previously a name was stored both in the map and in the TestQuery; now it lives only in the TestQuery, and the map is built from that.
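The even SUM vs MIN/MAX split mentioned above could be illustrated with a minimal sketch (the column names are hypothetical and this is not the PR's actual code; it only shows an alternating split instead of a random one):

```scala
// Hypothetical illustration: alternate columns between SUM and MIN/MAX
// instead of assigning each column a category at random.
val columns = Seq("a", "b", "c", "d", "e")

// partition by index parity: even positions go to SUM, odd to MIN/MAX,
// so the two groups differ in size by at most one.
val (sumCols, minMaxCols) = columns.zipWithIndex.partition {
  case (_, i) => i % 2 == 0
}

// sumCols.map(_._1)    == Seq("a", "c", "e")
// minMaxCols.map(_._1) == Seq("b", "d")
```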

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
@revans2 (Collaborator, Author) commented Sep 7, 2023

build

@razajafri (Collaborator) left a comment:

I'm not familiar with this code. I've made one pass and will go over it one more time in a bit.

@@ -73,6 +72,32 @@ class QuerySpecs(config: Config, spark: SparkSession) {
numericColumns
}

/**
* To get columns in a dataframe that work with min/max.
* Now numeric columns are limited to [byte, int, long, decimal, string]
The doc comment is missing TimestampType and DateType.

Comment on lines 90 to 97
    df.dtypes.filter {
      case (_, dataType) =>
        dataType == "ByteType" || dataType == "IntegerType" || dataType == "LongType" ||
          dataType.startsWith("DecimalType") || dataType == "StringType" ||
          dataType == "TimestampType" || dataType == "DateType"
    }.map {
      case (columnName, _) => columnName
    }
nit: consider pattern matching for readability

    val decimalTypePrefixRegex = "^DecimalType.*".r
    df.dtypes.collect {
      case (columnName, "ByteType") => columnName
      case (columnName, "IntegerType") => columnName
      ...
      case (columnName, decimalTypePrefixRegex) => columnName
    }
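Filling in that suggestion as a runnable sketch (the `dtypes` sample below is made up for illustration; in Spark, `df.dtypes` yields `(columnName, dataType.toString)` pairs like these):

```scala
// Stand-in for df.dtypes: (columnName, dataType.toString) pairs.
val dtypes = Seq(
  ("id", "LongType"),
  ("name", "StringType"),
  ("price", "DecimalType(10,2)"),
  ("flags", "ArrayType(IntegerType,true)")
)

// collect keeps only columns whose type supports min/max, using
// alternative patterns for the exact-match types and a guard for
// the parameterized DecimalType string.
val minMaxColumns = dtypes.collect {
  case (columnName, "ByteType" | "IntegerType" | "LongType" | "StringType"
                  | "TimestampType" | "DateType") => columnName
  case (columnName, dt) if dt.startsWith("DecimalType") => columnName
}
// minMaxColumns == Seq("id", "name", "price")
```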

@revans2 (Collaborator, Author) commented Sep 8, 2023

build

@gerashegalov (Collaborator) left a comment:

LGTM

Comment on lines +60 to +64
    df.schema.map { field =>
      (field.name, field.dataType)
    }.collect {
      case (columnName, ByteType | IntegerType | LongType | _: DecimalType) =>
        columnName

No need to fix; just a remark that even with this improvement the map is not really necessary (unless it causes issues with different Spark versions) if we collect with

case StructField(columnName, ByteType | IntegerType | LongType | _: DecimalType, _, _) => columnName
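Sketching out that remark in full (this assumes Spark is on the classpath, and the schema below is made up for illustration; `StructType` is a `Seq[StructField]`, so `collect` can match `StructField` directly without the intermediate tuple map):

```scala
import org.apache.spark.sql.types._

// A hand-built schema, so the pattern match can be shown without a
// running SparkSession.
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType),
  StructField("price", DecimalType(10, 2)),
  StructField("flags", ArrayType(IntegerType))
))

// Collect column names straight off the StructFields; the last two
// wildcards ignore the nullable flag and metadata.
val minMaxColumns = schema.collect {
  case StructField(columnName,
      ByteType | IntegerType | LongType | _: DecimalType, _, _) =>
    columnName
}
// minMaxColumns here would be Seq("id", "price")
```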

@revans2 (Collaborator, Author) commented Sep 8, 2023

@wjxiz1992 could you take a look too?

@wjxiz1992 (Collaborator) left a comment:

This LGTM.

@revans2 revans2 merged commit 1ccdf89 into NVIDIA:branch-23.10 Sep 12, 2023
28 checks passed
@revans2 revans2 deleted the add_query_number branch September 12, 2023 13:22
Successfully merging this pull request may close these issues.

[FEA] Add query number into the query description not just the iteration number.