Fix a hanging issue when processing empty data. #841
Merged
revans2 previously approved these changes on Sep 24, 2020.
Great work finding this.
The output iterator waits on the batch queue when calling `hasNext`, and is supposed to be woken up when the Python runner inserts something into the batch queue. But that insertion never happens if the input data is empty, so it hangs forever. The solution is to have the Python runner always wake up the output iterator after it finishes writing the data, by calling the newly added API `finish()`. Signed-off-by: Firestarman <firestarmanllc@gmail.com>
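The fix can be illustrated with a simplified Python model; the real change lives in spark-rapids itself and is not reproduced here. This sketch assumes a condition-variable-guarded list in place of the real batch queue. `BatchQueue`, `add`, `has_next`, `take`, `run_writer`, and `process` are hypothetical names; only `finish()` comes from the PR. The reader blocks until a batch arrives or the writer calls `finish()`, and the writer calls `finish()` in a `finally` block so the reader is woken even when the input is empty.

```python
import threading


class BatchQueue:
    """Simplified model of a queue shared by a writer thread (standing in for
    the Python runner) and a reader (standing in for the output iterator).
    All names except finish() are hypothetical."""

    def __init__(self):
        self._cond = threading.Condition()
        self._batches = []
        self._finished = False

    def add(self, batch):
        # Writer side: publish a batch and wake any waiting reader.
        with self._cond:
            self._batches.append(batch)
            self._cond.notify_all()

    def finish(self):
        # Writer side: mark the end of input and wake any waiting reader.
        # This is what prevents the hang when no batch was ever added.
        with self._cond:
            self._finished = True
            self._cond.notify_all()

    def has_next(self, timeout=None):
        # Reader side: block until a batch is queued or the writer finished.
        with self._cond:
            ready = self._cond.wait_for(
                lambda: bool(self._batches) or self._finished, timeout=timeout)
            if not ready:
                raise TimeoutError("still waiting for the writer")
            return bool(self._batches)

    def take(self):
        # Reader side: remove and return the oldest queued batch.
        with self._cond:
            return self._batches.pop(0)


def run_writer(q, input_batches):
    try:
        for batch in input_batches:
            q.add(batch)
    finally:
        # The fix: always call finish(), even if input_batches was empty.
        q.finish()


def process(input_batches, timeout=1.0):
    # Run the writer on its own thread and drain the queue on this one.
    q = BatchQueue()
    writer = threading.Thread(target=run_writer, args=(q, input_batches))
    writer.start()
    results = []
    while q.has_next(timeout):
        results.append(q.take())
    writer.join()
    return results
```

With the `finally: q.finish()` in place, `process([])` returns an empty list instead of blocking. Without it, `has_next()` on a queue that never receives a batch would wait until the timeout, or forever with `timeout=None`, which mirrors the reported hang.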
The 'small_data' is small enough that some tasks get no data when running. For now, this is only tested for the Scalar type, which implements the columnar pipeline. Signed-off-by: Firestarman <firestarmanllc@gmail.com>
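The test relies on the fact that spreading a tiny dataset over many tasks leaves some tasks with no rows. The sketch below assumes an even, contiguous slicing scheme for illustration (how the real test data is partitioned may differ). It uses a hypothetical three-row stand-in for 'small_data' and a hypothetical `plus_one` function in place of a scalar UDF, and it runs each partition under a timeout so that a hang would surface as a failure rather than a stuck test.

```python
import threading


def slice_rows(rows, num_slices):
    """Split rows into num_slices contiguous slices (an assumed even-slicing
    scheme). With fewer rows than slices, some slices are empty."""
    n = len(rows)
    return [rows[(i * n) // num_slices:((i + 1) * n) // num_slices]
            for i in range(num_slices)]


def run_with_timeout(fn, arg, timeout):
    """Run fn(arg) on a daemon thread and raise if it does not finish in time,
    which is how a hang on an empty partition would show up in a test."""
    result = {}

    def target():
        result["value"] = fn(arg)

    t = threading.Thread(target=target, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        raise TimeoutError("task did not finish; possible hang on empty input")
    return result["value"]


def plus_one(batch):
    # Hypothetical stand-in for a scalar UDF applied to one task's rows.
    return [x + 1 for x in batch]


small_data = [10, 20, 30]  # hypothetical stand-in for the test's 'small_data'
partitions = slice_rows(small_data, 8)
outputs = [run_with_timeout(plus_one, p, timeout=1.0) for p in partitions]
```

Here `partitions` is `[[], [], [10], [], [], [20], [], [30]]`: five of the eight simulated tasks receive no rows, so they exercise exactly the empty-input path that used to hang.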
firestarman force-pushed the fix-hang-issue branch from 2b334cd to fa7d545 on September 25, 2020 01:17.
firestarman changed the title from "[WIP] Fix a hanging issue when processing empty data." to "Fix a hanging issue when processing empty data." on Sep 25, 2020.
@revans2 Added the test for it. Could you take another look?
revans2 approved these changes on Sep 25, 2020.
NvTimLiu pushed a commit to NvTimLiu/spark-rapids that referenced this pull request on Oct 16, 2020.
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request on Nov 20, 2020.
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request on Jun 9, 2021.
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request on Jun 9, 2021.
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request on Nov 30, 2023.
…IDIA#841) Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com> Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
The output iterator waits on the batch queue when calling `hasNext`, and is supposed to be woken up when the Python runner inserts something into the batch queue. But that insertion never happens if the input data is empty, so it hangs forever.

The solution is to have the Python runner always wake up the output iterator after it finishes writing the data, by calling the newly added API `finish()`.

Also added the test for it. The 'small_data' is small enough that some tasks get no data when running.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>