Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework Scalar imports #10791

Merged
merged 4 commits into from
May 10, 2022

Conversation

brandon-b-miller
Copy link
Contributor

This PR changes the way things are imported in our scalar code such that it's less prone to circular import issues. For instance before this we can't make NA the default value of a kwarg for, say, anything in column.py.

@brandon-b-miller brandon-b-miller added 3 - Ready for Review Ready for review by team Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 4, 2022
@brandon-b-miller brandon-b-miller requested a review from a team as a code owner May 4, 2022 19:57
@codecov
Copy link

codecov bot commented May 4, 2022

Codecov Report

Merging #10791 (6aa2de2) into branch-22.06 (027c34a) will decrease coverage by 0.11%.
The diff coverage is 94.11%.

@@               Coverage Diff                @@
##           branch-22.06   #10791      +/-   ##
================================================
- Coverage         86.40%   86.29%   -0.12%     
================================================
  Files               143      144       +1     
  Lines             22444    22491      +47     
================================================
+ Hits              19393    19408      +15     
- Misses             3051     3083      +32     
Impacted Files Coverage Δ
python/cudf/cudf/core/scalar.py 89.01% <90.90%> (-0.31%) ⬇️
python/cudf/cudf/__init__.py 90.69% <100.00%> (+0.22%) ⬆️
python/cudf/cudf/core/missing.py 100.00% <100.00%> (ø)
python/cudf/cudf/core/udf/typing.py 97.54% <100.00%> (ø)
...thon/dask_cudf/dask_cudf/tests/test_distributed.py 18.86% <0.00%> (-67.93%) ⬇️
python/cudf/cudf/core/indexed_frame.py 91.70% <0.00%> (ø)
python/cudf/cudf/testing/_utils.py 94.05% <0.00%> (+0.06%) ⬆️
python/cudf/cudf/core/dataframe.py 93.77% <0.00%> (+0.08%) ⬆️
python/cudf/cudf/core/reshape.py 90.11% <0.00%> (+0.11%) ⬆️
python/cudf/cudf/core/column/string.py 89.21% <0.00%> (+0.12%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6128e0d...6aa2de2. Read the comment docs.

Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One request that can be addressed in a follow-up, otherwise LGTM.

isinstance(other, np.generic) or other.is_valid
)
if not valid:
return Scalar(None, dtype=out_dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize it's not in scope, so I'd be perfectly happy with a follow-up, but this seems like an important case to have tested somewhere. It's basically for any binary op between two scalars where one is NA, right? I'm surprised that's not tested, I guess because we only test column + scalar?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Approving but would also like to see tests for this at some point.)

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I'd leave a review summary but then I was like NA.

python/cudf/cudf/core/missing.py Outdated Show resolved Hide resolved
python/cudf/cudf/__init__.py Outdated Show resolved Hide resolved
from cudf.core.dtypes import ListDtype, StructDtype
from cudf.core.index import BaseIndex
from cudf.core.missing import NA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since cudf is already imported, do we want to use cudf.NA rather than import this name? I see quite a few files that use cudf.NA rather than importing NA separately. I don't think any other files use this convention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really great question. We use cudf.NA almost everywhere, but should we? In some ways it feels like it might fit the "onion model" a bit better if scalar.py didn't mention the top level cudf namespace anywhere.

I think there's probably plenty of places I have used cudf.NA throughout the codebase as a convenience since those places already have cudf imported, but I realize now that it has the negative consequence of further entrenching the need for import cudf in those places, which is part of what I think gives rise to these import issues.

Is this way off base? I am happy to put in a PR that addresses this for NA across the codebase as a follow up if that's what we think should happen, and just imports from missing.

Copy link
Contributor

@vyasr vyasr May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the long run I would love to get rid of import cudf almost everywhere. It is a code smell that almost always (there are always exceptions) indicates circular dependencies that should not exist. Unfortunately a lot of those circular dependencies are baked deeply into our code right now and are hard to excise. If you could get rid of these on a case-by-case basis (in this instance, for cudf.NA), I think that would be good for us in the long run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @vyasr I think that lines up with my understanding. I am going to leave this as-is for now and look into making the change everywhere. I'll leave this thread open though and let @bdice resolve if this conclusion seems satisfactory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing import cudf sounds fine to me! Could we make an issue to document the plan before resolving this? (I would do it but I’m on mobile at the moment.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raised #10820 regarding this. I'll tackle this for NA first after this PR is merged, after which it should be importable from missing. From there I'll start probing what it takes to address this more generally.

python/cudf/cudf/core/missing.py Outdated Show resolved Hide resolved
brandon-b-miller and others added 3 commits May 9, 2022 11:39
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving — would like to see a test added (in a later PR, perhaps) and an issue filed for the plan to remove internal uses of import cudf.

isinstance(other, np.generic) or other.is_valid
)
if not valid:
return Scalar(None, dtype=out_dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Approving but would also like to see tests for this at some point.)

@brandon-b-miller
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 19c5bad into rapidsai:branch-22.06 May 10, 2022
rapids-bot bot pushed a commit that referenced this pull request May 10, 2022
…0821)

This PR changes cuDF so `NA` isn't used around the codebase from the top level `cudf` namespace and rather is imported directly from `missing`. This is part of #10820 and comes as a follow up to #10791 (comment)

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10821
rapids-bot bot pushed a commit that referenced this pull request May 11, 2022
Add tests for binaryops between two null scalars as per #10791 (comment)

This is technically `1307` tests but they only take about 5 seconds to run.

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10828
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants