[VL] Do not fallback write files if output columns contain Spark internal metadata #4661
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/oap-project/gluten/issues Then could you also rename the commit message and pull request title in the following format?
Run Gluten Clickhouse CI
cc @JkSelf, thank you
Good catch. LGTM. Thanks.
@@ -234,6 +237,15 @@ class GlutenInsertSuite extends InsertSuite with GlutenSQLTestsBaseTrait {
}
}
}

testGluten("Do not fallback write files if output columns contain Spark internal metadata") {
@ulysses-you Which INTERNAL_METADATA_KEYS does this suite contain? Can we add some comments here?
For this test, it's __autoGeneratedAlias.
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
What changes were proposed in this pull request?
If the metadata in an attribute is leaked by Spark itself, we should not make file writes fall back. This PR cleans up Spark internal metadata manually. See apache/spark#40776.
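As a rough illustration of the idea (not the code in this PR), the cleanup can be sketched in plain Scala with a `Map` standing in for Spark's column metadata. The names `Column`, `internalMetadataKeys`, `removeInternalMetadata`, and `shouldFallback` are hypothetical, and the rule that any remaining non-internal metadata still triggers a fallback is an assumption. Only the `__autoGeneratedAlias` key is taken from this thread.

```scala
// Sketch only: a plain Map models column metadata; no Spark classes are used.
object InternalMetadataCleanup {
  // Hypothetical stand-in for Spark's list of internal keys.
  // "__autoGeneratedAlias" is the only key named in this PR thread.
  val internalMetadataKeys: Set[String] = Set("__autoGeneratedAlias")

  // Hypothetical simplified output column: a name plus key/value metadata.
  final case class Column(name: String, metadata: Map[String, String])

  // Drop keys that Spark itself leaked; keep everything else untouched.
  def removeInternalMetadata(col: Column): Column =
    col.copy(metadata = col.metadata.filter { case (k, _) => !internalMetadataKeys.contains(k) })

  // Assumption: a write falls back only if metadata remains after cleanup.
  def shouldFallback(cols: Seq[Column]): Boolean =
    cols.map(removeInternalMetadata).exists(_.metadata.nonEmpty)

  def main(args: Array[String]): Unit = {
    val leaked = Column("c1", Map("__autoGeneratedAlias" -> "true"))
    val userTagged = Column("c2", Map("comment" -> "user note"))

    // Internal-only metadata is stripped, so no fallback is needed.
    assert(removeInternalMetadata(leaked).metadata.isEmpty)
    assert(!shouldFallback(Seq(leaked)))

    // Non-internal metadata survives the cleanup and still forces a fallback.
    assert(shouldFallback(Seq(leaked, userTagged)))
    println("ok")
  }
}
```

The point of the sketch is that the fallback decision is made after the internal keys are removed, so a column that only carries leaked metadata no longer blocks the native write.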
How was this patch tested?
Added tests.