Fix a few minor things with scale test #9200
Conversation
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
build
Not familiar with this code. I have made one pass; will go over it one more time in a bit.
@@ -73,6 +72,32 @@ class QuerySpecs(config: Config, spark: SparkSession) {
    numericColumns
  }

  /**
   * To get columns in a dataframe that work with min/max.
   * Now numeric columns are limited to [byte, int, long, decimal, string]
docs missing TimestampType and DateType
df.dtypes.filter {
  case (_, dataType) =>
    dataType == "ByteType" || dataType == "IntegerType" || dataType == "LongType" ||
      dataType.startsWith("DecimalType") || dataType == "StringType" ||
      dataType == "TimestampType" || dataType == "DateType"
}.map {
  case (columnName, _) => columnName
}
nit: consider pattern matching for readability
val decimalTypePrefixRegex = "^DecimalType.*".r
df.dtypes.collect {
case (columnName, "ByteType") => columnName
case (columnName, "IntegerType") => columnName
...
  case (columnName, decimalTypePrefixRegex()) => columnName
}
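One caveat with the suggested approach: in a Scala pattern, a `Regex` value must be applied as an extractor (`decimalTypePrefixRegex()`); a bare lowercase name in pattern position would simply bind a fresh variable and match everything. A minimal, Spark-free sketch of the idea (the column names and type strings below are made up for illustration, standing in for `df.dtypes`):

```scala
// Stand-in for df.dtypes: (columnName, dataType rendered as a string) pairs.
val dtypes = Seq(
  ("id", "LongType"),
  ("name", "StringType"),
  ("price", "DecimalType(10,2)"),
  ("flags", "ArrayType(IntegerType,true)")
)

// Matches any type string that starts with "DecimalType".
val decimalTypePrefixRegex = "^DecimalType.*".r

val minMaxColumns = dtypes.collect {
  case (columnName, "ByteType")    => columnName
  case (columnName, "IntegerType") => columnName
  case (columnName, "LongType")    => columnName
  case (columnName, "StringType")  => columnName
  // Note the parentheses: decimalTypePrefixRegex() invokes the regex
  // extractor; without them the name would just bind the string.
  case (columnName, decimalTypePrefixRegex()) => columnName
}

println(minMaxColumns)  // → List(id, name, price)
```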
build
LGTM
df.schema.map { field =>
  (field.name, field.dataType)
}.collect {
  case (columnName, ByteType | IntegerType | LongType | _: DecimalType) =>
    columnName
}
No need to fix; just a remark that even with this improvement, the `map` is not really necessary (unless it causes issues with different Spark versions) if we collect with
case StructField(columnName, ByteType | IntegerType | LongType | _: DecimalType, _, _) => columnName
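To illustrate why the `map` step can be dropped: `collect` can destructure `StructField` directly. The sketch below uses local stand-in types that mirror the shape of Spark's `StructField(name, dataType, nullable, metadata)` so it runs without a Spark dependency; all names and the sample schema are illustrative only:

```scala
// Local stand-ins mirroring the shape of Spark's type hierarchy,
// so the pattern can be shown without a Spark dependency.
sealed trait DataType
case object ByteType extends DataType
case object IntegerType extends DataType
case object LongType extends DataType
case object StringType extends DataType
case class DecimalType(precision: Int, scale: Int) extends DataType
case class StructField(name: String, dataType: DataType,
                       nullable: Boolean = true,
                       metadata: Map[String, String] = Map.empty)

val schema = Seq(
  StructField("id", LongType),
  StructField("price", DecimalType(10, 2)),
  StructField("label", StringType)
)

// Collect directly on the fields: no intermediate map to tuples needed.
// The alternatives (|) are legal here because no variable is bound
// inside the alternative itself.
val minMaxColumns = schema.collect {
  case StructField(columnName, ByteType | IntegerType | LongType | _: DecimalType, _, _) =>
    columnName
}

println(minMaxColumns)  // → List(id, price)
```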
@wjxiz1992 could you take a look too?
This LGTM.
This fixes #9198
It also