-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32002][SQL]Support ExtractValue from nested ArrayStruct #28860
Conversation
Test build #124217 has finished for PR 28860 at commit
|
Test build #124220 has finished for PR 28860 at commit
|
Test build #124252 has finished for PR 28860 at commit
|
retest this please |
Test build #124268 has finished for PR 28860 at commit
|
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
Outdated
Show resolved
Hide resolved
Seems this way will change the schema, maybe we should add a new class to get multidimensional array form GetArrayStruct |
Co-authored-by: Ruslan Dautkhanov <Tagar@users.noreply.github.com>
Test build #124314 has finished for PR 28860 at commit
|
retest this please |
cc @viirya too FYI |
@@ -59,6 +59,23 @@ object ExtractValue { | |||
GetArrayStructFields(child, fields(ordinal).copy(name = fieldName), | |||
ordinal, fields.length, containsNull) | |||
|
|||
case (ExtractNestedArray(StructType(fields), containsNull, containsNullSeq), | |||
NonNullLiteral(v, StringType)) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
case (ExtractNestedArray(
StructType(fields), containsNull, containsNullSeq), NonNullLiteral(v, StringType)) =>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also update the documentation and table above.
@@ -95,6 +112,50 @@ object ExtractValue { | |||
|
|||
trait ExtractValue extends Expression | |||
|
|||
object ExtractNestedArray { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add documentation here.
@@ -95,6 +112,50 @@ object ExtractValue { | |||
|
|||
trait ExtractValue extends Expression | |||
|
|||
object ExtractNestedArray { | |||
|
|||
type ReturnType = Option[(DataType, Boolean, Seq[Boolean])] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add some comments for what this type means.
|
||
def extractArrayStruct(expr: Expression): ReturnType = { | ||
expr match { | ||
case gas @ GetArrayStructFields(child, _, _, _, _) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid arguments matching. This is actually an anti pattern - https://github.com/databricks/scala-style-guide#pattern-matching
expr match { | ||
case gas @ GetArrayStructFields(child, _, _, _, _) => | ||
extractArrayStruct(child) match { | ||
case Some((e, deep)) => Some(e, deep + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deep
-> depth
?
extractArrayType(dataType) | ||
} | ||
|
||
def extractArrayType(dataType: DataType): ReturnType = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we combine this and unapply
?
|
||
checkAnswer(sql( | ||
""" | ||
|SELECT a.b.c FROM nest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add deeper cases? Also, you can simplify the test cases, for example, as below:
val df = spark.range(10).select(array(struct(array(struct("id")).alias("col1"))).alias("col0"))
df.selectExpr("col0.col1.id")
@@ -95,6 +112,50 @@ object ExtractValue { | |||
|
|||
trait ExtractValue extends Expression | |||
|
|||
object ExtractNestedArray { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also name it something like ExtractNestedArrayType
Test build #124544 has finished for PR 28860 at commit
|
org.apache.spark.sql.execution.ProjectionOverSchema.scala. A Scala extractor that projects an expression over a given schema. I think this code also needs to have matching rules added to it. To support nested ArrayStruct. Otherwise, ParquetSchemaPruning won't work either. |
Yea, right, current way change the schema and won't work well for all case, I am thinking about add a new class to handle nested ArrayStruct type |
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
For data
nest.json
run with
will got error
This is because after resolve
a.b
, it constructGetArrayStructField
froma#
result with type
ArrayType(ArrayType(StructType))
, then it can't be extract through currentExtractValue
method.Support extract
StructField
form nested Array-Struct combinationWhy are the changes needed?
Support more use case
Does this PR introduce any user-facing change?
Yes, users will be able to select from nested array and structs.
How was this patch tested?
Added UT