Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JNI] Adds retryCount to RmmEventHandler.onAllocFailure #11940

Merged

Conversation

abellina
Copy link
Contributor

@abellina abellina commented Oct 18, 2022

This adds the method boolean onAllocFailure(long sizeRequested, int retryCount) to RmmEventHandler, to help handling code keep track of the number of times an allocation failure has been retried.

With this code callers can perform extra logic that depends on whether the callback was due to a brand new allocation failure, or one that has failed in the past and is being retried.

This will be used here: NVIDIA/spark-rapids#6768

@abellina abellina added Java Affects Java cuDF API. Spark Functionality that helps Spark RAPIDS labels Oct 18, 2022
@abellina abellina requested a review from a team as a code owner October 18, 2022 16:01
@abellina abellina added non-breaking Non-breaking change feature request New feature or request labels Oct 18, 2022
Comment on lines 29 to 31
// this should not be called since it was the previous interface,
// and it was abstract before.
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it should not be called, we may throw an exception instead, so we can know that there was something wrongly called and reached here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can make this throw. Note that this is going away very soon, it is mainly to give us a few days while we update the rest of the code. The end result is that we'll remove these default implementations, and callers must use the non-deprecated code.

// newer code should override this implementation of `onAllocFailure` to handle
// the `isRetry` flag. Otherwise, we call the prior implementation to not
// break existing code.
return onAllocFailure(sizeRequested);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to just call the function above (which returns false all the time)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have code that implemented onAllocFailure (all existing code), no it just calls that impl. So this should be clearer given an exception in the deprecated code perhaps.

@abellina abellina changed the title [JNI] Adds isRetry to RmmEventHandler.onAllocFailure [JNI] Adds retryCount to RmmEventHandler.onAllocFailure Oct 18, 2022
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment nit but otherwise lgtm.

java/src/main/native/src/RmmJni.cpp Outdated Show resolved Hide resolved
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
@codecov
Copy link

codecov bot commented Oct 18, 2022

Codecov Report

Base: 87.40% // Head: 86.87% // Decreases project coverage by -0.53% ⚠️

Coverage data is based on head (cad4430) compared to base (f72c4ce).
Patch coverage: 88.48% of modified lines in pull request are covered.

❗ Current head cad4430 differs from pull request most recent head 40866b7. Consider uploading reports for the commit 40866b7 to get more accurate results

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #11940      +/-   ##
================================================
- Coverage         87.40%   86.87%   -0.54%     
================================================
  Files               133      133              
  Lines             21833    21905      +72     
================================================
- Hits              19084    19030      -54     
- Misses             2749     2875     +126     
Impacted Files Coverage Δ
python/cudf/cudf/core/udf/__init__.py 50.00% <ø> (ø)
python/cudf/cudf/io/orc.py 92.94% <ø> (-0.09%) ⬇️
python/cudf/cudf/testing/dataset_generator.py 72.83% <ø> (-0.42%) ⬇️
...thon/dask_cudf/dask_cudf/tests/test_distributed.py 18.86% <ø> (+4.94%) ⬆️
python/cudf/cudf/core/_base_index.py 82.20% <43.75%> (-3.35%) ⬇️
python/cudf/cudf/io/text.py 91.66% <66.66%> (-8.34%) ⬇️
python/strings_udf/strings_udf/__init__.py 84.31% <76.00%> (-12.57%) ⬇️
python/cudf/cudf/core/index.py 92.96% <95.71%> (+0.33%) ⬆️
python/cudf/cudf/__init__.py 90.69% <100.00%> (ø)
python/cudf/cudf/_lib/strings/__init__.py 100.00% <100.00%> (ø)
... and 19 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@abellina
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 6ca2ceb into rapidsai:branch-22.12 Oct 18, 2022
@abellina abellina deleted the oom/add_retry_count_in_rmm_jni branch October 18, 2022 22:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request Java Affects Java cuDF API. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants