Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add clear indication of non-GPU accelerated parameters in read_json docstring #11825

Merged

Conversation

GregoryKimball
Copy link
Contributor

@GregoryKimball GregoryKimball commented Sep 29, 2022

Description

This PR moves the "pandas engine only" arguments to the end of the optional argument list of the docstring.

This is the way an admonition will look like:
Screen Shot 2022-10-11 at 12 06 50 PM

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@GregoryKimball GregoryKimball requested a review from a team as a code owner September 29, 2022 18:11
@github-actions github-actions bot added the Python Affects Python cuDF API. label Sep 29, 2022
@GregoryKimball GregoryKimball added doc Documentation non-breaking Non-breaking change labels Sep 29, 2022
@GregoryKimball
Copy link
Contributor Author

Maybe these arguments should be marked (pandas engine only)?

typ : type of object to recover (series or frame), default 'frame'
    With cudf engine, only frame output is supported.
encoding : str, default is 'utf-8'
    The encoding to use to decode py3 bytes.
    With cudf engine, only utf-8 is supported.

What functionality exists here for the cudf engines?

compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
    For on-the-fly decompression of on-disk data. If 'infer', then use
    gzip, bz2, zip or xz if path_or_buf is a string ending in
    '.gz', '.bz2', '.zip', or 'xz', respectively, and no decompression
    otherwise. If using 'zip', the ZIP file must contain only one data
    file to be read in. Set to None for no decompression.

@GregoryKimball GregoryKimball changed the base branch from branch-22.10 to branch-22.12 September 29, 2022 21:06
@codecov
Copy link

codecov bot commented Sep 30, 2022

Codecov Report

Base: 87.40% // Head: 88.11% // Increases project coverage by +0.70% 🎉

Coverage data is based on head (20818dd) compared to base (f72c4ce).
Patch coverage: 86.52% of modified lines in pull request are covered.

❗ Current head 20818dd differs from pull request most recent head 43dc303. Consider uploading reports for the commit 43dc303 to get more accurate results

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #11825      +/-   ##
================================================
+ Coverage         87.40%   88.11%   +0.70%     
================================================
  Files               133      133              
  Lines             21833    21881      +48     
================================================
+ Hits              19084    19280     +196     
+ Misses             2749     2601     -148     
Impacted Files Coverage Δ
python/cudf/cudf/core/udf/__init__.py 97.05% <ø> (+47.05%) ⬆️
python/cudf/cudf/io/orc.py 92.94% <ø> (-0.09%) ⬇️
...thon/dask_cudf/dask_cudf/tests/test_distributed.py 18.86% <ø> (+4.94%) ⬆️
python/cudf/cudf/core/_base_index.py 82.20% <43.75%> (-3.35%) ⬇️
python/cudf/cudf/io/text.py 91.66% <66.66%> (-8.34%) ⬇️
python/strings_udf/strings_udf/__init__.py 84.31% <76.00%> (-12.57%) ⬇️
python/cudf/cudf/core/index.py 92.91% <95.16%> (+0.28%) ⬆️
python/cudf/cudf/__init__.py 90.69% <100.00%> (ø)
python/cudf/cudf/core/column/categorical.py 89.34% <100.00%> (ø)
python/cudf/cudf/core/groupby/groupby.py 91.51% <100.00%> (+0.29%) ⬆️
... and 13 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions for clarification, no need to take as is though.

python/cudf/cudf/utils/ioutils.py Outdated Show resolved Hide resolved
python/cudf/cudf/utils/ioutils.py Outdated Show resolved Hide resolved
python/cudf/cudf/utils/ioutils.py Outdated Show resolved Hide resolved
Comment on lines 483 to 486
The first number is the offset in bytes, the second number is the range
size in bytes. Set the size to zero to read all data after the offset
location. Reads the row that starts before or at the end of the range,
even if it ends after the end of the range.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pair of `(offset, length)` specifying a subrange of the file to be read, in bytes.
To read from `offset` to the end of the file, set `length=0`. Reads the starting
before or at the end of the range even if it ends past the end of the range.

What does "at the end of the range" mean? I guess the byte range specifies a semi-open interval [offset, offset+length) does that mean if a row starts at offset + length - 1 then we read the entire row?

Aside: this is not a very ergonomic way of specifying "read from this offset to the end of the file". Could we accept either an offset int or an (offset, length) pair?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too needs a bit of re-work like you said. I'll try to address this in #11780

Comment on lines 497 to 499
For on-the-fly decompression of on-disk data. If 'infer', then use
gzip, bz2, zip or xz if path_or_buf is a string ending in
'.gz', '.bz2', '.zip', or 'xz', respectively, and no decompression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not use https://pypi.org/project/python-magic/ and just detect the appropriate decompression scheme?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we dynamically populate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that if infer is provided one could detect the actual file type not from the extension (brittle) but using magic (reasonably robust).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just read the email with this GH notification without the context of the thread. It sounded super viable.

python/cudf/cudf/utils/ioutils.py Show resolved Hide resolved
@galipremsagar galipremsagar changed the title Reorder read_json docstring and highlight pandas args Add clear indication of non-GPU accelerated parameters in read_json docstring Oct 11, 2022
python/cudf/cudf/utils/ioutils.py Outdated Show resolved Hide resolved
python/cudf/cudf/utils/ioutils.py Outdated Show resolved Hide resolved
python/cudf/cudf/utils/ioutils.py Outdated Show resolved Hide resolved
python/cudf/cudf/utils/ioutils.py Outdated Show resolved Hide resolved
python/cudf/cudf/utils/ioutils.py Outdated Show resolved Hide resolved
python/cudf/cudf/utils/ioutils.py Outdated Show resolved Hide resolved
python/cudf/cudf/utils/ioutils.py Outdated Show resolved Hide resolved
python/cudf/cudf/utils/ioutils.py Outdated Show resolved Hide resolved
python/cudf/cudf/utils/ioutils.py Outdated Show resolved Hide resolved
python/cudf/cudf/utils/ioutils.py Outdated Show resolved Hide resolved
@galipremsagar galipremsagar added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Oct 13, 2022
@galipremsagar
Copy link
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit c824fee into rapidsai:branch-22.12 Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge doc Documentation non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants