-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26164][SQL][FOLLOWUP] WriteTaskStatsTracker should know which file the row is written to #32459
Conversation
cc @c21 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Kubernetes integration test starting |
Kubernetes integration test status failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Test build #138216 has finished for PR 32459 at commit
|
@@ -151,7 +151,7 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration) | |||
} | |||
} | |||
|
|||
override def newRow(row: InternalRow): Unit = { | |||
override def newRow(filePath: String, row: InternalRow): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From Apache Side codebase, this is no-op unused parameter and there is no test coverage in this PR. Do you think we can have a sample custom WriteTaskStatsTracker
test case to prevent a future regression, @cloud-fan ?
To not break some custom WriteTaskStatsTracker implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for adding a test suite, @cloud-fan .
Kubernetes integration test unable to build dist. exiting with code: 1 |
Test build #138233 has finished for PR 32459 at commit
|
thanks for review, merging to master! |
What changes were proposed in this pull request?
This is a follow-up of #32198
Before #32198, in
WriteTaskStatsTracker.newRow
, we know that the row is written to the current file. After #32198 , we no longer know this connection.This PR adds the file path parameter in
WriteTaskStatsTracker.newRow
to bring back the connection.Why are the changes needed?
To not break some custom
WriteTaskStatsTracker
implementations.Does this PR introduce any user-facing change?
no
How was this patch tested?
N/A