-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] pending requests should be cancelled when requesting tasks complete #1178
Comments
So there are four categories of shuffle block requests we could be cancelling: Client requests:
Server requests:
The lowest hanging fruit are client request that are waiting for bounce buffers: No resources have been acquired, and also no risk of a server having pending sends. So I think this part is doable today. This could cover a very large number of messages as well. Just thinking in terms of bounce buffer lengths.. we could have several bounce buffer lengths not ready to be received in the queue. For the rest of the situations I think we need that out-of-band message. The number of requests in these situations is going to be those that fit in the bounce buffers we have. The server is at a disadvantage because it doesn't know when tasks completed, so it does have
|
@abellina will this go into 21.08 or should we move it out of the release? |
Out of the original categories we handle (1) on task complete. This was introduced here #2553. This was the "low hanging fruit" as identified in a later comment. The rest of the categories are not handled yet. Changes for active messages in 21.08 help with this, since the receive side is not actively receiving, instead it has an active message waiting which receives dynamically. We are also now using a single id for the So the main task pending is to issue a |
Closing as won't fix for now |
…IDIA#1178) Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Given: #1177, a better approach is to cancel pending requests for any task that has completed.
This means we need:
The text was updated successfully, but these errors were encountered: