[REVIEW] Add docstring for dask_cudf.read_csv #8355

Merged on May 26, 2021 (6 commits). Showing changes from 4 commits.
46 changes: 46 additions & 0 deletions python/dask_cudf/dask_cudf/io/csv.py
@@ -16,6 +16,52 @@


def read_csv(path, chunksize="256 MiB", **kwargs):
    """
    Read CSV files into a dask_cudf.DataFrame.

    This API parallelizes the ``cudf.read_csv`` function in the following ways:

    It supports loading many files at once using globstrings:

    >>> import dask_cudf
    >>> df = dask_cudf.read_csv("myfiles.*.csv")

    In some cases it can break up large files:

    >>> df = dask_cudf.read_csv("largefile.csv", chunksize="256 MiB")

    It can read CSV files from external resources (e.g. S3, HTTP, FTP):

    >>> df = dask_cudf.read_csv("s3://bucket/myfiles.*.csv")
    >>> df = dask_cudf.read_csv("https://www.mycloud.com/sample.csv")

    Internally ``dask_cudf.read_csv`` uses ``cudf.read_csv`` and supports
    many of the same keyword arguments with the same performance guarantees.
    See the docstring for ``cudf.read_csv()`` for more information on available
    keyword arguments.

    Parameters
    ----------
    path : str, path object, or file-like object
        Either a path to a file (a str, pathlib.Path, or
        py._path.local.LocalPath), a URL (including http, ftp, and S3
        locations), or any object with a read() method (such as a file
        handle returned by the builtin open() or a StringIO).
    chunksize : int or str, default "256 MiB"
        The target task partition size.
    **kwargs : dict
        Passthrough keyword arguments that are sent to ``cudf.read_csv``.

    Examples
    --------
    >>> import dask_cudf
    >>> ddf = dask_cudf.read_csv('sample.csv', usecols=['a', 'b'])
    >>> ddf.compute()
       a      b
    0  1     hi
    1  2  hello
    2  3     ai
    """
    if "://" in str(path):
        func = make_reader(cudf.read_csv, "read_csv", "CSV")
        return func(path, blocksize=chunksize, **kwargs)
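For readers following the diff, here is a minimal, GPU-free sketch of the two ideas the docstring and the visible code rely on: the `"://"` check that routes URL-style paths to the `make_reader` branch, and the notion of splitting one large file into chunks of roughly `chunksize` bytes. The helper names `is_remote` and `byte_ranges` are hypothetical and do not appear in the PR, and the range-splitting logic is an assumption about how a target partition size could map onto a file, not a copy of what dask_cudf does internally.

```python
def is_remote(path):
    """Same test as in the diff: a '://' in the path marks it as a URL."""
    return "://" in str(path)


def byte_ranges(file_size, chunk_bytes):
    """Hypothetical helper: split file_size bytes into (offset, length) pairs.

    Each pair covers at most chunk_bytes bytes; the last one may be shorter.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return [
        (start, min(chunk_bytes, file_size - start))
        for start in range(0, file_size, chunk_bytes)
    ]


if __name__ == "__main__":
    print(is_remote("s3://bucket/myfiles.*.csv"))  # True
    print(is_remote("largefile.csv"))              # False
    print(byte_ranges(600, 256))  # [(0, 256), (256, 256), (512, 88)]
```

Under these assumptions, a 600-byte file with a 256-byte target yields three partitions, which is the intuition behind "The target task partition size" in the parameter description.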