Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffle services for merging partitions #30164
[SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffle services for merging partitions #30164
Changes from 6 commits
68fc364
2688df2
0423970
eb93fe1
f2f61e2
3a6219f
ca44d03
2172f65
1e76824
3e03109
cbb66eb
a1c8831
047ad0c
2d6d266
2b0c073
9ba4dfb
5127d8b
e320ac0
a2d85ef
affa8a0
46f5670
2467a61
050a5ae
1714829
1ba7668
5ce2934
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about only adding the merger location when we know there're no active executors on it. Since we'd prefer active executor locations first now, I think it's almost a redundant copy of the active executor locations normally.
The idea is we could add it when we find there're no active executors after
removeExecutor
and remove it when there're new executors register on the same host. This's definitely helpful for static resource allocation, although I'm a little bit hesitant about the possible overhead for dynamic resource allocation.WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not a copy, but a materialized view of candidate hosts where external shuffle has been configured for the current application. It becomes a copy only when there is only 1 executor per host.
The cardinality of this map is, btw, low in comparison to total executors - given multi tenancy and given
maxRetainedMergerLocations
threshold.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I look forward to the day when the second condition will be disabled :-)
It will be relevant for both k8s and spark streaming !
+CC @dongjoon-hyun you might be interested in this in future.