[RELEASE] cudf v22.12 #12200
Commits on Sep 23, 2022
-
Merge pull request #11757 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit ba9c43c
-
Merge pull request #11758 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 7376f1f
Commits on Sep 24, 2022
-
Merge pull request #11763 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 59847c1
Commits on Sep 26, 2022
-
Merge pull request #11767 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 41474af
-
Merge pull request #11773 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 5fb657d
-
Merge pull request #11774 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 2b94483
-
Merge pull request #11775 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 7d40b30
-
Merge pull request #11776 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit a1cbb02
-
Merge pull request #11777 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit aa2ef0e
Commits on Sep 27, 2022
-
Merge pull request #11781 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit cc6f237
-
Merge pull request #11782 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit cc97584
-
Merge pull request #11784 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 1d7af9e
-
Merge pull request #11786 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit b8ab576
Commits on Sep 28, 2022
-
Commit 54480f3
-
Commit f72c4ce
-
Merge pull request #11801 from davidwendt/branch-22.12-merge-22.10
Merge branch-22.10 into branch-22.12
Commit 017d85f
-
Merge pull request #11805 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 479514e
-
Merge pull request #11806 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 5cf7fdf
-
Merge pull request #11809 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 97353fc
-
Merge pull request #11810 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 69a031c
Commits on Sep 29, 2022
-
Merge pull request #11820 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 90afe92
-
Fix compile warning from CUDF_FUNC_RANGE in a member function (#11798)
Compile warning was introduced in #11652 in `bgzip_data_chunk_source.cu`. The warning can be seen here: https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-cpu-cuda-build/CUDA=11.5/12417/consoleFull (search for `177-D`)
```
/cudf/cpp/src/io/text/bgzip_data_chunk_source.cu(362): warning #177-D: variable "nvtx3_range__" was declared but never referenced
```
The `nvtx3_range__` variable is part of the `CUDF_FUNC_RANGE()` macro. The warning is incorrect and likely a compiler bug. The workaround in this PR is to add `[[maybe_unused]]` to the variable declaration. I was not able to create a small reproducer to file a compiler bug report.
Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Tobias Ribizel (https://github.com/upsj) - MithunR (https://github.com/mythrocks) URL: #11798
Commit ec4cdd8
-
Merge pull request #11821 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 87d0387
-
Merge pull request #11823 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 0ecbaa1
-
Merge pull request #11829 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 59ce915
-
Merge pull request #11830 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit c8c9027
-
Merge pull request #11831 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 3c9f9cf
-
Merge pull request #11832 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit cb81ebc
Commits on Sep 30, 2022
-
Merge pull request #11839 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 71167d7
Commits on Oct 3, 2022
-
Merge pull request #11851 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 8df6dbf
-
Merge pull request #11852 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 5000e94
-
Remove `cudf_io` namespace alias (#11827)
Some cuIO tests and benchmarks declare a `cudf_io` alias for `cudf::io`. This saves a single character, so it is considered to be of very low utility. This PR removes all occurrences of the alias. It also removes a couple of builder calls where the option was being set to its default value. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) URL: #11827
Commit 0b28d34
Commits on Oct 4, 2022
-
Test/remove thrust vector usage (#11813)
This PR removes usage of `thrust::device_vector` from almost all of our tests. Since the construction of a device vector is not stream-ordered, we should be using `rmm::device_uvector` instead wherever possible. There is one remaining use of `thrust::device_vector`, but that is in a test explicitly verifying that `device_vector` can convert implicitly to a `device_span`, so it's worth keeping that there.
I am working on automated tooling to detect any usage of stream 0 in tests as part of a push to prioritize stream-safety in libcudf, and this PR is a prerequisite to adding such tooling to our CI pipeline, since at that point any test using stream 0 would fail. Since there is at least one test where I anticipate stream 0 will always be used (the one described above), I should be able to add specific tests to an allowlist as needed.
It's an open question whether the added complexity required by the changes in this PR is a worthwhile tradeoff to be able to programmatically detect stream 0 usage. If reviewers feel that the additional complexity is too high, we can revert some (or all) of these changes and I can just plan for allowing stream 0 usage in all of the necessary tests. This PR demonstrates how we would go about removing it if we choose to do so, though.
Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Tobias Ribizel (https://github.com/upsj) - Nghia Truong (https://github.com/ttnghia) URL: #11813
Commit ba0febe
-
Use conda-forge's `pyorc` (#11855)
This PR switches the `pyorc` install from a `pip` wheel to a `conda` package. xref: #7085 (comment) Authors: - https://github.com/jakirkham Approvers: - Jordan Jacobelli (https://github.com/Ethyling) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #11855
Commit 5e42c2d
-
Update cudf JNI version to 22.12.0-SNAPSHOT (#11764)
Update JNI version to 22.12.0-SNAPSHOT Authors: - Peixin (https://github.com/pxLi) Approvers: - Nghia Truong (https://github.com/ttnghia) - Robert (Bobby) Evans (https://github.com/revans2) URL: #11764
Commit 7d173c9
-
Remove unused includes for table/row_operators (#11857)
After reviewing usages of the "legacy" row operators, several of the includes are no longer needed. Authors: - Gregory Kimball (https://github.com/GregoryKimball) Approvers: - David Wendt (https://github.com/davidwendt) - Nghia Truong (https://github.com/ttnghia) URL: #11857
Commit 0fb4d76
Commits on Oct 5, 2022
-
Merge pull request #11866 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit 0d38a78
-
JNI Avoid NPE for reading host binary data (#11865)
This avoids a potential null pointer exception when trying to read byte data from an empty column Authors: - Robert (Bobby) Evans (https://github.com/revans2) Approvers: - Nghia Truong (https://github.com/ttnghia) URL: #11865
Commit 001aede
-
Unpin `dask` and `distributed` for development (#11859)
This PR relaxes the pinnings of `dask` and `distributed` for `22.12` development. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Joseph (https://github.com/jolorunyomi) - https://github.com/jakirkham URL: #11859
Commit 6d18543
-
Parquet reader: bug fix for a num_rows/skip_rows corner case, w/optimization for nested preprocessing (#11752)
Fixes an issue where using user bounds with parquet files containing both nested and non-nested types could result in incorrect row counts for the non-nested columns. Originally reported by @etseidl. The nature of the fix also implements a long-standing desired optimization: when running the preprocess step for nested types, ignore pages for non-nested hierarchies. This can result in significant speedups for files containing only a few nested columns. <s>The tests added for this PR seem to tease a bug in the parquet writer into happening (#11748) so I will leave this as a draft until that issue is resolved.</s> Authors: - https://github.com/nvdbaranec Approvers: - Yunsong Wang (https://github.com/PointKernel) - Nghia Truong (https://github.com/ttnghia) - Mike Wilson (https://github.com/hyperbolic2346) URL: #11752
Commit 4525474
Commits on Oct 6, 2022
-
Fix RangeIndex unary operators. (#11868)
These operators rely on a method that was renamed in #11272 and are also out of sync with the rest of the `RangeIndex` design now that the `__getattr__` overload has been removed (#10538). Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #11868
Commit 029b1db
-
Fix make_column_from_scalar for all-null strings column (#11807)
Fixes the `cudf::make_column_from_scalar` for an invalid `cudf::string_scalar` to return a column with children. Some libcudf APIs will not work with a strings column with no children. This condition would be rare enough that additional logic for checking no children in these places would be a performance and maintenance issue. This also greatly simplifies the `make_column_from_scalar` specialization logic for strings. Closes #11756 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Bradley Dice (https://github.com/bdice) URL: #11807
Commit e323f0a
-
Fix decimal benchmark input data generation (#11863)
Closes #11850. Fixes decimal benchmark input data generation. The generated data alternated between two values because `device_uvector<T>` stores both a value and a scale for each element. The scale is fixed for a column, so when this data was copied to a `cudf::column`, the column's contents alternated between values and the scale. The fix is to use `device_storage_type_t<T>` instead of `T`. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) URL: #11863
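The alternation described above can be reproduced with a small byte-level sketch. It assumes, purely for illustration, that each decimal element is laid out as a 32-bit value followed by a 32-bit scale; the variable names and sample numbers are hypothetical and this is not the benchmark's code.

```python
import struct

scale = -2                    # one fixed scale for the whole column
values = [101, 202, 303]      # hypothetical unscaled decimal values

# Buffer of full elements: every element carries (value, scale).
element_buffer = b"".join(struct.pack("<ii", v, scale) for v in values)

# Reading that buffer as plain 32-bit storage interleaves values with the scale.
misread = list(struct.unpack(f"<{len(element_buffer) // 4}i", element_buffer))

# Buffer of storage-only integers: what the column data actually expects.
storage_buffer = b"".join(struct.pack("<i", v) for v in values)
correct = list(struct.unpack(f"<{len(storage_buffer) // 4}i", storage_buffer))

print(misread)   # values interleaved with the scale
print(correct)   # values only
```

Generating the storage integers directly avoids the interleaving, which is the role `device_storage_type_t<T>` plays in the fix.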
Commit 1ef722d
-
part1: Simplify BaseIndex to an abstract class (#10389)
This PR is in response to @vyasr's comment, as a partial fix for PR #9593: `BaseIndex` should be reduced as closely as possible to an abstract class. While there is a subset of APIs that truly make sense for all types of index objects, in almost all cases the optimal implementation for `RangeIndex` (and `MultiIndex`, for that matter) is very different from the implementation for `GenericIndex`. In addition, this change reduces cognitive load for developers by simplifying the inheritance hierarchy. Authors: - Sheilah Kirui (https://github.com/skirui-source) Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Vyas Ramasubramani (https://github.com/vyasr) URL: #10389
Commit e20eb94
Commits on Oct 7, 2022
-
Merge pull request #11876 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit eb0e4b6
-
Add BGZIP reader to python `read_text` (#11802)
Adds the missing integration, plus some tests. I decided to extend the `read_text` interface rather than add a new one. For details on the bgzip format, see #11652 Authors: - Tobias Ribizel (https://github.com/upsj) Approvers: - Ashwin Srinath (https://github.com/shwina) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #11802
Commit 4c4acd5
Commits on Oct 8, 2022
-
Merge pull request #11881 from rapidsai/branch-22.10
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Commit fc5b675
Commits on Oct 10, 2022
-
Add BGZIP multibyte_split benchmark (#11723)
This refactors #11652 to extract the BGZIP IO and adds another `source_type` to the `multibyte_split` benchmark, creating a compressed file using `zlib`. A quick benchmark shows performance results around 2.5x slower than reading from a device buffer at around 1:5 compression ratio.
### [0] Tesla T4
| source_type | delim_size | delim_percent | size_approx | byte_range_percent | Time | Peak Memory Usage | Encoded file size |
|-------------|------------|---------------|-------------------|--------------------|------------|-------------------|-------------------|
| bgzip | 1 | 1 | 2^30 = 1073741824 | 100 | 507.479 ms | 4.022 GiB | 1006.638 MiB |
| file | 1 | 1 | 2^30 = 1073741824 | 100 | 339.860 ms | 3.947 GiB | 1006.638 MiB |
| device | 1 | 1 | 2^30 = 1073741824 | 100 | 201.556 ms | 3.947 GiB | 1006.638 MiB |
Authors: - Tobias Ribizel (https://github.com/upsj) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Vukasin Milovanovic (https://github.com/vuule) - Bradley Dice (https://github.com/bdice) - Jordan Jacobelli (https://github.com/Ethyling) URL: #11723
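As a quick check of the "around 2.5x" figure, the timings reported in the table above can be divided by the device-buffer time. The dictionary below only restates those three measurements; the variable names are illustrative.

```python
# Times in milliseconds, copied from the Tesla T4 results in the commit message.
times_ms = {"bgzip": 507.479, "file": 339.860, "device": 201.556}

# Slowdown of each source type relative to reading from a device buffer.
slowdown = {name: t / times_ms["device"] for name, t in times_ms.items()}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.2f}x")
```

The BGZIP source comes out at roughly 2.5 times the device-buffer time, and the plain file source at roughly 1.7 times.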
Commit 4eb9c6c
-
Fix pre-commit copyright check (#11860)
This PR improves the copyright check script to handle cases where the ancestor `branch-*` does not have an upstream set. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) - Jake Awe (https://github.com/AyodeAwe) URL: #11860
Commit 586907b
-
Remove "experimental" warning for struct columns in ORC reader and writer (#11880)
Closes #11484 Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Bradley Dice (https://github.com/bdice) - https://github.com/nvdbaranec URL: #11880
Commit 5b51591
Commits on Oct 11, 2022
-
ArrowIPCTableWriter writes an empty batch in the case of an empty table. (#11883)
closes #11882
Updated the `ArrowIPCTableWriter` to write an empty batch explicitly in the case of an empty table, because the Arrow IPC writer writes no batches in this case, leading to the error below when calling `pyarrow.Table.from_batches` without specifying a schema.
```
E File "pyarrow/table.pxi", line 1609, in pyarrow.lib.Table.from_batches
E ValueError: Must pass schema, or at least one RecordBatch
```
Signed-off-by: Liangcai Li <firestarmanllc@gmail.com>
Authors: - Liangcai Li (https://github.com/firestarman) Approvers: - Nghia Truong (https://github.com/ttnghia) URL: #11883
Commit 26f3e76
-
Conform "bench_isin" to match generator column names (#11549)
The version of `bench_isin` merged in #11125 used key and column names of the format `f"key{i}"` rather than the format `f"{string.ascii_lowercase[i]}"` as is used in the dataframe generator. As a result the `isin` benchmark using a dictionary argument short-circuits with no matching keys, and the `isin` benchmark using a dataframe argument finds no matches. This PR also adjusts the `isin` arguments from `range(1000)` to `range(50)` to better match the input dataframe cardinality of 100. With `range(1000)`, every element matches but with `range(50)` only 50% of the elements match. Authors: - Gregory Kimball (https://github.com/GregoryKimball) Approvers: - Bradley Dice (https://github.com/bdice) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #11549
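The two effects described above can be sketched in plain Python, without cuDF. The column of 10,000 values with cardinality 100 and the helper `match_fraction` are hypothetical stand-ins for the benchmark's dataframe and for `isin`; only the naming formats and the `range(1000)` / `range(50)` arguments come from the commit message.

```python
import string

def match_fraction(values, candidates):
    """Fraction of `values` found in `candidates` (a stand-in for `isin`)."""
    lookup = set(candidates)
    return sum(value in lookup for value in values) / len(values)

# Hypothetical column with cardinality 100, like the generated dataframe.
column = [i % 100 for i in range(10_000)]

old_fraction = match_fraction(column, range(1000))  # every element matches
new_fraction = match_fraction(column, range(50))    # half of the elements match

# Column names as produced by the dataframe generator versus the old benchmark.
generator_names = {string.ascii_lowercase[i] for i in range(5)}
benchmark_names = {f"key{i}" for i in range(5)}
shared_names = generator_names & benchmark_names    # empty: no keys line up

print(old_fraction, new_fraction, shared_names)
```

With no shared names, a dictionary or dataframe argument has nothing to compare against, which is why the old benchmark short-circuited.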
Commit 566b3d1
-
Use public APIs in STREAM_COMPACTION_NVBENCH (#11892)
Use `state.set_cuda_stream` to set the stream for the nvbench benchmark. Then run `state.exec` on the public API instead of the detail API, e.g. `cudf::distinct` instead of `cudf::detail::distinct`. Authors: - Gregory Kimball (https://github.com/GregoryKimball) Approvers: - Nghia Truong (https://github.com/ttnghia) - Karthikeyan (https://github.com/karthikeyann) URL: #11892
Commit 9ba6142
-
Error on `ListColumn` or any new unsupported column in `cudf.Index` (#11902)
This PR raises a `NotImplementedError` for `ListColumn` or any new column that isn't supported by `cudf.Index` yet. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Ashwin Srinath (https://github.com/shwina) URL: #11902
Commit a921f5d
-
Add coverage for string UDF tests. (#11891)
Many PRs are currently showing Codecov patch status check failures that appear to be the result of not uploading coverage reports for the string UDF tests. This PR should enable the missing coverage and ensure that we are actually measuring coverage of these code paths. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - https://github.com/brandon-b-miller - Jake Awe (https://github.com/AyodeAwe) URL: #11891
Commit 7032cc3
-
Adds the `GroupBy.ngroup()` method. Closes #11848 Authors: - Ashwin Srinath (https://github.com/shwina) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #11871
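For readers unfamiliar with `ngroup`, here is a minimal pure-Python sketch of the numbering it produces. It assumes the pandas semantics, where groups are numbered from 0 in sorted key order by default and each row receives the number of its group; the function below is a hypothetical illustration and not the cuDF implementation.

```python
def ngroup(keys):
    """Return, for each row, the number of the group its key belongs to."""
    # Groups are numbered in sorted key order (assumed default behaviour).
    group_number = {key: i for i, key in enumerate(sorted(set(keys)))}
    return [group_number[key] for key in keys]

keys = ["b", "a", "b", "c", "a"]
numbers = ngroup(keys)
print(numbers)
```

Rows that share a key share a number, so the result has the same length as the input rather than one entry per group.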
Commit 387192c
Commits on Oct 12, 2022
-
Change expect_strings_empty into expect_column_empty libcudf test utility (#11873)
Moves the `cudf::test::expect_strings_empty` utility from `cpp/tests/strings` to more generic function `cudf::test::expect_column_empty` Reference #11734 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Vukasin Milovanovic (https://github.com/vuule) - Tobias Ribizel (https://github.com/upsj) URL: #11873
Commit ccbd852
-
Relax `codecov` threshold diff (#11899)
This PR relaxes the `codecov` threshold, which will allow CI checks to pass (though the check is optional for merging). Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Jake Awe (https://github.com/AyodeAwe) URL: #11899
Commit 75a6973
-
Fix memcheck error in TypeInference.Timestamp gtest (#11905)
Fixes an error in the `TypeInference.Timestamp` gtest where the `size` parameter was incorrect. This error was found by the nightly builds and could be recreated using
```
compute-sanitizer --tool memcheck gtests/TYPE_INFERENCE_TEST --gtest_filter=TypeInference.Timestamp --rmm_mode=cuda
```
Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Nghia Truong (https://github.com/ttnghia) URL: #11905
Commit 8b5ab23
-
Fix memcheck error in get_dremel_data (#11903)
Fixes logic that applies offsets to nested column children to not write past the end of the offsets vector. This error was found by the nightly builds and could be recreated using
```
compute-sanitizer --tool memcheck gtests/PARQUET_TEST --gtest_filter=ParquetReaderTest.NestedByteArray --rmm_mode=cuda
```
Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Karthikeyan (https://github.com/karthikeyann) - Vukasin Milovanovic (https://github.com/vuule) - Tobias Ribizel (https://github.com/upsj) URL: #11903
Commit 3226859
Commits on Oct 13, 2022
-
Add thrust output iterator fix (1805) to thrust.patch (#11900)
Adds fix from NVIDIA/thrust#1805 to libcudf's `thrust.patch` Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Mark Harris (https://github.com/harrism) - Vyas Ramasubramani (https://github.com/vyasr) URL: #11900
Commit 0ca68c7
-
Fix segmented-sort to ignore indices outside the offsets (#11888)
Fixes `cudf::segmented_sorted_order` to ignore indices outside the specified offsets values. The segmented-sort function in general sorts subsets of the input using a column of offsets (integers) to identify the position of each segment. Here is an example:
```
input = { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }
offsets1 = { 0, 3, 7, 10 }
```
There are 3 segments to sort: `[0,3)`, `[3,7)`, and `[7,10)`
Segment 1 sorts to `{ 7, 8, 9 }`
Segment 2 sorts to `{ 3, 4, 5, 6 }`
Segment 3 sorts to `{ 0, 1, 2 }`
The segmented-sort result is `{ 7, 8, 9, 3, 4, 5, 6, 0, 1, 2 }`
If the offsets do not fully cover all the input, the segmented sort should ignore any elements outside of the offsets.
```
input = { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }
offsets2 = { 3, 7 }
```
Here there is only 1 segment to sort: `[3,7) => { 3, 4, 5, 6 }`
The segmented-sort result is `{ 9, 8, 7, 3, 4, 5, 6, 2, 1, 0 }`
The values before the first offset and after the last offset should be left unchanged. The gtests have been corrected to expect this behavior. Also, the `SegmentedReductionTestUntyped.PartialSegmentReduction` gtest was improved to include offset gaps at the beginning and at the end to verify consistent behavior there as well. Found while working on #11729
Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - MithunR (https://github.com/mythrocks) - Mark Harris (https://github.com/harrism) URL: #11888
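The behavior described above can be sketched in a few lines of plain Python. This is only an illustration of the expected results: the helper `segmented_sort` is hypothetical, it sorts values directly as the example does, whereas `cudf::segmented_sorted_order` returns the ordering as indices.

```python
def segmented_sort(values, offsets):
    """Sort each segment [offsets[i], offsets[i+1]) and leave the rest untouched."""
    result = list(values)
    for begin, end in zip(offsets, offsets[1:]):
        result[begin:end] = sorted(result[begin:end])
    return result

data = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

full_cover = segmented_sort(data, [0, 3, 7, 10])  # three segments cover the input
partial_cover = segmented_sort(data, [3, 7])      # only [3, 7) is sorted

print(full_cover)
print(partial_cover)
```

Elements before the first offset and after the last offset are never touched, because no `(begin, end)` pair includes them.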
Commit 678946b
-
Fix an issue reading struct-of-list types in Parquet. (#11910)
Fixes NVIDIA/spark-rapids#6718. A bug was recently introduced in #11752 where an insufficient check for whether an input column contained repetition information could cause incorrect results for column hierarchies with structs at the root. Authors: - https://github.com/nvdbaranec Approvers: - Jim Brennan (https://github.com/jbrennan333) - Nghia Truong (https://github.com/ttnghia) - Mike Wilson (https://github.com/hyperbolic2346) URL: #11910
Commit fb0922f
-
Fixes Unsupported column type error due to empty list columns in Nested JSON reader (#11897)
Fixes the `Unsupported column type` error raised during cudf column creation in the nested JSON reader when a list column is empty. During JSON tree creation, an empty list column does not get a `device_json_column` child because it does not have any rows or a type. This PR fixes the issue by creating an empty column as the element child column. The list column still retains its null and empty-list information. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Mike Wilson (https://github.com/hyperbolic2346) - Vyas Ramasubramani (https://github.com/vyasr) URL: #11897
Commit 662f309
-
Add clear indication of non-GPU accelerated parameters in read_json d…
…ocstring (#11825) This PR moves the "pandas engine only" arguments to the end of the optional argument list of the docstring. This is the way an `admonition` will look like: <img width="592" alt="Screen Shot 2022-10-11 at 12 06 50 PM" src="https://user-images.githubusercontent.com/11664259/195161106-71a1ec40-7e1b-4297-b6d9-67ff3a5aacc7.png"> Authors: - Gregory Kimball (https://github.com/GregoryKimball) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Lawrence Mitchell (https://github.com/wence-) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #11825
Gregory Kimball authored Oct 13, 2022
SHA: c824fee
Commits on Oct 14, 2022
-
Reduce memory usage in nested JSON parser - tree generation (#11864)
Reduces memory usage by 53% in the nested JSON parser tree generation algorithm. 1 GB JSON takes 8.469 GiB instead of 16.957 GiB. All values below are for 1 GB JSON text input. This PR employs the following optimisations to reduce memory usage: - Modified to generate parent node ids from nodes instead of tokens. (16.957 GiB -> 10.957 GiB) - Reordered node_range, node_categories generation to the end. (10.957 GiB -> 9.774 GiB) - Scope limited token_levels (9.774 GiB -> 9.403 GiB) - Used CUB sort instead of `thrust::stable_sort_by_key` (9.403 GiB -> 8.487 GiB) - Used `cub::DoubleBuffer` which eliminates copy of order. (8.487 GiB -> 7.97 GiB) The peak memory of tree generation is reduced by 53%, while parsing bandwidth remains the same (1.6 GB/s on GV100 for 1 GB JSON). Since `get_stack_context` in the JSON parser now has the highest memory usage (8.469 GiB), peak memory is no longer influenced by the JSON tree generation step. Overall peak memory is now 50% of that of the earlier code. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Tobias Ribizel (https://github.com/upsj) - Bradley Dice (https://github.com/bdice) URL: #11864
SHA: e91d7d9
-
Fix local offset handling in bgzip reader (#11918)
We accidentally checked the local offset against the compressed, not the uncompressed size. The new test failed prior to fixing the behavior. Authors: - Tobias Ribizel (https://github.com/upsj) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: #11918
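To illustrate the kind of check described above, here is a minimal Python sketch; it is not the libcudf implementation. It relies on the BGZF convention that a virtual offset stores the compressed block offset in its upper 48 bits and the local (within-block) offset in its lower 16 bits. The function names and example sizes are hypothetical, and treating the range as half-open is an assumption of this sketch.

```python
from typing import Tuple


def split_virtual_offset(virtual_offset: int) -> Tuple[int, int]:
    """Split a BGZF virtual offset into (compressed block offset, local offset)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


def local_offset_in_bounds(local_offset: int, uncompressed_size: int) -> bool:
    """Validate a local offset against the block it points into."""
    # The local offset indexes the *decompressed* block contents, so it has to
    # be compared with the uncompressed size, not the compressed size.
    return 0 <= local_offset < uncompressed_size


# Hypothetical block: 3000 compressed bytes that inflate to 65280 bytes.
block_offset, local_offset = split_virtual_offset((1024 << 16) | 5000)
is_valid = local_offset_in_bounds(local_offset, uncompressed_size=65280)
```

With these example sizes, comparing the local offset 5000 against the compressed size 3000 would wrongly reject a position that is valid in the decompressed data.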
SHA: 8a31e26
-
Add libcudf strings examples (#11849)
Creates example for calling libcudf APIs for strings processing. This also includes examples of building custom kernels for modifying libcudf strings columns. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Robert Maynard (https://github.com/robertmaynard) - Mark Sadang (https://github.com/msadang) - https://github.com/nvdbaranec URL: #11849
SHA: 7598253
-
Fix cudf::stable_sorted_order for NaN and -NaN in FLOAT64 columns (#11874)
Fixes bug in `cudf::stable_sorted_order` when `-NaN` and `NaN` are in a FLOAT64 (double) column. The code was fixed by refactoring common code with `cudf::sorted_order`. This uses thrust sort functions to help align the behavior and keep results consistent. New gtests were added to check for this case. Some test files were also updated per issue #11734. The new tests are at the bottom of `sort_test.cpp` and `stable_sort_tests.cpp`. This was found while working on #11729. The sorted-order functions are reused for many of the libcudf sort functions, so this will help with the work in #11729. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: #11874
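The property at stake can be sketched in plain Python. This is an illustration only, not the libcudf code: it assumes that every NaN, whatever its sign bit, should compare equal to every other NaN and sort after all ordinary values, and that ties keep their input order. The function name mirrors the libcudf one purely for readability.

```python
import math


def stable_sorted_order(values):
    """Return the indices that sort `values` ascending, with all NaNs last."""
    def key(i):
        v = values[i]
        # Every NaN (positive or negative sign bit) maps to the same key, so
        # Python's stable sort keeps NaN and -NaN in their original order.
        return (1, 0.0) if math.isnan(v) else (0, v)

    return sorted(range(len(values)), key=key)


data = [float("nan"), 2.0, -float("nan"), 1.0, 2.0]
order = stable_sorted_order(data)  # [3, 1, 4, 0, 2]
```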
SHA: c265c58
Commits on Oct 15, 2022
-
Handle `multibyte_split` byte_range out-of-bounds offsets on host (#11885)
In order to unify the interface for a future combined handling of byte ranges between read_csv and read_text, this PR replaces the `cutoff_offset` with a plain integer again, and handles finding the first out-of-bounds offset on the host side instead. Authors: - Tobias Ribizel (https://github.com/upsj) Approvers: - Mike Wilson (https://github.com/hyperbolic2346) - Bradley Dice (https://github.com/bdice) URL: #11885
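As a rough host-side analogue of that lookup, the sketch below finds the first row offset that falls outside a byte range using a binary search over sorted offsets. The names `first_out_of_bounds`, `row_offsets` and `byte_range_end` are hypothetical, and treating the byte range as half-open (an offset equal to the end is already out of bounds) is an assumption of this sketch.

```python
from bisect import bisect_left


def first_out_of_bounds(row_offsets, byte_range_end):
    """Index of the first row offset at or beyond the end of the byte range."""
    # `row_offsets` must be sorted ascending; bisect_left returns the position
    # of the first element that is >= byte_range_end.
    return bisect_left(row_offsets, byte_range_end)


offsets = [0, 12, 25, 40, 58]
cutoff = first_out_of_bounds(offsets, 30)  # rows starting at 0, 12 and 25 are kept
kept = offsets[:cutoff]
```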
SHA: 9f8b936
Commits on Oct 17, 2022
-
Add `nanosecond` & `microsecond` to `DatetimeProperties` (#11911)
This PR: - [x] Implemented `extract_milli_second`, `extract_micro_second` and `extract_nano_second` in libcudf. - [x] Added `nanosecond` and `microsecond` in `DatetimeProperties` & `DatetimeIndex`. - [x] Updated docs - [x] Added & modified tests Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - David Wendt (https://github.com/davidwendt) - Matthew Roeschke (https://github.com/mroeschke) - Nghia Truong (https://github.com/ttnghia) - MithunR (https://github.com/mythrocks) - https://github.com/nvdbaranec - Bradley Dice (https://github.com/bdice) URL: #11911
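The arithmetic behind splitting a timestamp into sub-second fields can be sketched without any GPU library. The helper below is hypothetical and only shows one way to decompose a nanosecond-resolution timestamp into separate millisecond, microsecond and nanosecond components, each in the range 0 to 999. How the Python-level properties combine these fields is not stated above; pandas, for comparison, reports `microsecond` as a single 0 to 999999 value.

```python
def split_subsecond(ns_since_epoch: int):
    """Split a nanosecond timestamp into (millisecond, microsecond, nanosecond)."""
    # Keep only the fractional-second part of the timestamp.
    subsecond_ns = ns_since_epoch % 1_000_000_000
    millisecond, rest = divmod(subsecond_ns, 1_000_000)
    microsecond, nanosecond = divmod(rest, 1_000)
    return millisecond, microsecond, nanosecond


fields = split_subsecond(1_665_964_800_123_456_789)  # (123, 456, 789)
```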
SHA: edc058f
Commits on Oct 18, 2022
-
Fix documentation referring to removed as_gpu_matrix method. (#11937)
This fixes outdated documentation that refers to the `as_gpu_matrix` method, which was removed. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #11937
SHA: afa16b4
-
Add `.str.find_multiple` API (#11928)
Resolves: #10126 This PR adds `.str.find_multiple` API. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Bradley Dice (https://github.com/bdice) URL: #11928
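For readers unfamiliar with the operation, the following pure-Python sketch approximates what a "find multiple" string search computes: for each input string, the position of the first occurrence of each target, with -1 when a target is absent. This is an assumption-laden stand-in that runs on Python lists, not the cudf implementation, and the exact return type of the cudf API is not described above.

```python
def find_multiple(strings, targets):
    """For each string, list the first position of every target (-1 if absent)."""
    # str.find returns -1 when the substring does not occur.
    return [[s.find(t) for t in targets] for s in strings]


positions = find_multiple(["abcdef", "xyz"], ["cd", "z"])  # [[2, -1], [-1, 2]]
```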
SHA: a926c52
-
Pin mimesis version in setup.py. (#11906)
The dependency pinning for `mimesis` in cudf's `setup.py` didn't match the conda environment. It was missing a pinning to `<4.1` from #8745. However, based on the conversation in #8551, this pinning of `<4.1` was only chosen because 4.1.0 wasn't yet available on conda-forge. Since the current version of mimesis is now 6.1.1, this PR updates the mimesis pinning to `>=4.1` and uses `generate_string` instead of `schoice`. I tested this locally with mimesis 6.1.1 and mimesis 4.1.0 and both passed tests. Merge this PR concurrently with rapidsai/integration#547. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - AJ Schmidt (https://github.com/ajschmidt8) URL: #11906
SHA: cea10ca
-
Removing int8 column option from parquet byte_array writing (#11539)
As suggested in #11526 and captured in issue #11536 the usage of both INT8 and UINT8 as supported types for byte_arrays is unnecessary and adds complexity to the code. This change removes INT8 as an option and only allows UINT8 columns to be written out as byte_arrays. ~~This matches with cudf string columns which contain an INT8 column for data.~~ closes #11536 Authors: - Mike Wilson (https://github.com/hyperbolic2346) Approvers: - Tobias Ribizel (https://github.com/upsj) - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) - MithunR (https://github.com/mythrocks) - Bradley Dice (https://github.com/bdice) URL: #11539
SHA: 1effe19
-
Initial draft of policies and guidelines for libcudf usage. (#11853)
This PR adds a section to the developer documentation about various libcudf design decisions that affect users. These policies are important for us to document and communicate consistently. I am not sure what the best place for this information is, but I think the developer docs are a good place to start since until we address #11481 we don't have a great way to publish any non-API user-facing libcudf documentation. I've created this draft PR to solicit feedback from other libcudf devs about other policies that we should be documenting in a similar manner. Once everyone is happy with the contents, I would suggest that we merge this into the dev docs for now and then revisit a better place once we've tackled #11481. Partly addresses #5505, #1781. Resolves #4511. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Jake Hemstad (https://github.com/jrhemstad) - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) URL: #11853
SHA: 5d57159
-
Update flake8 to 5.0.4 and use flake8-force to check Cython. (#11736)
Resolves #11684, required for eventually supporting Python 3.10 (which requires flake8 >= 4.0.0). flake8 >= 4.0.0, however, does not support parsing Cython code, even with rule exclusions. This necessitates the flake8-force plugin, which was designed (by a cupy developer) for forcing flake8 to check Cython code with a limited set of rules. Per this comment (#11684 (comment)), this PR removes duplicate pinnings between pre-commit configuration and the developer conda environment. Developers should use pre-commit for style checks consistent with the CI environment. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Lawrence Mitchell (https://github.com/wence-) - AJ Schmidt (https://github.com/ajschmidt8) - GALI PREM SAGAR (https://github.com/galipremsagar) - Vyas Ramasubramani (https://github.com/vyasr) URL: #11736
SHA: 425fb02
-
Adds retryCount to RmmEventHandler.onAllocFailure (#11940)
This adds the method `boolean onAllocFailure(long sizeRequested, int retryCount)` to `RmmEventHandler`, to help handling code keep track of the number of times an allocation failure has been retried. With this code callers can perform extra logic that depends on whether the callback was due to a brand new allocation failure, or one that has failed in the past and is being retried. This will be used here: NVIDIA/spark-rapids#6768 Authors: - Alessandro Bellina (https://github.com/abellina) Approvers: - Jason Lowe (https://github.com/jlowe) - Nghia Truong (https://github.com/ttnghia) URL: #11940
SHA: 6ca2ceb
Commits on Oct 19, 2022
-
Refactor pad/zfill functions for reuse with strings udf (#11914)
Refactors the main device code used for `cudf::strings::pad` and `cudf::strings::zfill` for reuse in strings UDF pad and zfill functions. No new functions or features have been added, updated, or removed. The detail functions have mainly just been moved to the new file `cpp/include/cudf/strings/detail/pad_impl.cuh`. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Tobias Ribizel (https://github.com/upsj) URL: #11914
SHA: 08e4ec2
-
Fix some gtests incorrectly coded in namespace cudf::test (part I) (#11917)
Fixes a few simple gtests that may not get touched in the course of other PRs. This removes the `using namespace cudf::test` or similar declaration from gtests where it is improperly used. No code logic has changed, just variable declarations and function calls. Reference #11734 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Tobias Ribizel (https://github.com/upsj) - Vyas Ramasubramani (https://github.com/vyasr) URL: #11917
SHA: 08ffecc
Commits on Oct 20, 2022
-
Enable backend dispatching for Dask-DataFrame creation (#11920)
This PR depends on dask/dask#9475 (**Now Merged**) After dask#9475, external libraries are now able to implement (and expose) their own `DataFrameBackendEntrypoint` definitions to specify custom creation functions for DataFrame collections. This PR introduces the `CudfBackendEntrypoint` class to create `dask_cudf.DataFrame` collections using the `dask.dataframe` API. By installing `dask_cudf` with this entrypoint definition in place, you get the following behavior in `dask.dataframe`:
```python
import dask.dataframe as dd
import dask

# Tell Dask that you want to create DataFrame collections
# with the "cudf" backend (for supported creation functions).
# This can also be used in a context, or set in a yaml file
dask.config.set({"dataframe.backend": "cudf"})

ddf = dd.from_dict({"a": range(10)}, npartitions=2)
type(ddf)  # dask_cudf.core.DataFrame
```
Note that the code snippet above does not require an explicit import of `cudf` or `dask_cudf`. The following creation functions will support backend dispatching after dask#9475: - `from_dict` - `read_parquet` - `read_json` - `read_orc` - `read_csv` - `read_hdf` See also: dask/design-docs#1 Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #11920
SHA: 416d4d5
-
Remove validation that requires introspection (#11938)
This PR removes optional validation for some APIs. Performing these validations requires data introspection, which we do not want. This PR resolves #5505. Authors: - Vyas Ramasubramani (https://github.com/vyasr) - David Wendt (https://github.com/davidwendt) Approvers: - Mark Harris (https://github.com/harrism) - GALI PREM SAGAR (https://github.com/galipremsagar) - Matthew Roeschke (https://github.com/mroeschke) - David Wendt (https://github.com/davidwendt) - Nghia Truong (https://github.com/ttnghia) - Jason Lowe (https://github.com/jlowe) URL: #11938
SHA: ff41841
-
Tell jitify_preprocess where to search for libnvrtc (#11787)
On machines with multiple CUDA Toolkits installed it is possible to have a mismatch between the version of `nvcc` used to compile code and the version of `libnvrtc` used for the JIT code. This generally occurs when `LD_LIBRARY_PATH` points to a different version of the CUDA Toolkit. We now explicitly specify which toolkit library directory to search when processing JIT code during libcudf compilation. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #11787
SHA: 536ddd0
-
Fix writing of Parquet files with many fragments (#11869)
This PR fixes an error that can occur when very small page sizes are used when writing Parquet files. #11551 changed from fixed 5000 row page fragments to a scaled value based on the requested max page size. For small page sizes, the number of fragments to process can exceed 64k. The number of fragments is used as the `y` dimension when calling `gpuInitPageFragments`, and when it exceeds 64k the kernel fails to launch, ultimately leading to an invalid memory access. Authors: - Ed Seidl (https://github.com/etseidl) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: #11869
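The failure mode can be reproduced with simple arithmetic. CUDA limits the `y` dimension of a launch grid to 65535 blocks, so using the fragment count directly as that dimension breaks once small fragments push the count past the limit. The sketch below is illustrative only; the helper names and the example row counts are hypothetical and do not come from the cudf source.

```python
MAX_GRID_DIM_Y = 65_535  # CUDA limit on gridDim.y


def num_fragments(num_rows: int, fragment_size: int) -> int:
    """Number of fragments needed to cover `num_rows` (ceiling division)."""
    return -(-num_rows // fragment_size)


def fits_in_grid_y(num_rows: int, fragment_size: int) -> bool:
    """True if the fragment count can be used directly as a grid y dimension."""
    return num_fragments(num_rows, fragment_size) <= MAX_GRID_DIM_Y


fixed = fits_in_grid_y(1_000_000, 5000)  # 200 fragments: launches fine
tiny = fits_in_grid_y(1_000_000, 10)     # 100000 fragments: exceeds the limit
```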
SHA: 98185fe
-
Default to equal NaNs in make_collect_set_aggregation. (#11621)
Partially resolves #11329. This helps to align our default behaviors for null and NaN equality across APIs, specifically for `make_collect_set_aggregation` in this PR. All functions should default to treating null values as equal to one another and NaN values as equal to one another. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - David Wendt (https://github.com/davidwendt) - Nghia Truong (https://github.com/ttnghia) URL: #11621
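The effect of the default can be seen in a small pure-Python model of a "collect set" operation. This is a sketch under stated assumptions, not the libcudf aggregation: the `collect_set` function and its `nans_equal` flag are hypothetical names, and `None` stands in for a null value.

```python
import math


def _is_nan(v):
    return isinstance(v, float) and math.isnan(v)


def collect_set(values, nans_equal=True):
    """Collect distinct values, preserving first-seen order."""
    result = []
    for v in values:
        if _is_nan(v):
            # NaN != NaN under IEEE 754, so NaN equality has to be opted into.
            duplicate = nans_equal and any(_is_nan(r) for r in result)
        else:
            # None == None is True in Python, so nulls compare equal here.
            duplicate = any(r == v for r in result)
        if not duplicate:
            result.append(v)
    return result


nan = float("nan")
equal = collect_set([1.0, nan, None, nan, None, 1.0])                      # 3 entries
unequal = collect_set([1.0, nan, None, nan, None, 1.0], nans_equal=False)  # 4 entries
```

With NaNs treated as equal, the result keeps one NaN and one null; with NaNs treated as unequal, every NaN survives as its own entry.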
SHA: ee9ffd0
-
Rename libcudf++ to libcudf. (#11953)
For consistency across our documentation, this PR renames `libcudf++` to `libcudf`. Authors: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - David Wendt (https://github.com/davidwendt) - GALI PREM SAGAR (https://github.com/galipremsagar) - Vyas Ramasubramani (https://github.com/vyasr) - Nghia Truong (https://github.com/ttnghia) URL: #11953
SHA: 5803015
Commits on Oct 21, 2022
-
Update Unit Testing in libcudf guidelines to code tests outside the cudf::test namespace (#11959)
Update text to include coding tests outside the `cudf` or the `cudf::test` namespace. Realized our test guidelines needed to be updated while working on #11734. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) - Nghia Truong (https://github.com/ttnghia) URL: #11959
SHA: b9ba9e3
-
Add tests ensuring that cudf's default stream is always used (#11875)
This PR ensures that cudf's default stream is properly passed to all kernel launches so that nothing implicitly runs on the CUDA default stream. It adds a small library that is built during the tests and overloads CUDA functions to throw an exception when usage of the default stream is detected. It also fixes all remaining usage of anything other than cudf's default stream (I fixed most of the issues in previous PRs, but I found a few others when finalizing this one). Resolves #11929 Resolves #11942 ### Important notes for reviewers: - **The changeset is deceptively large.** The vast majority of the changes are just a global find-and-replace of `cudf::get_default_stream()` for `cudf::default_stream_value`, as well as a few smaller fixes such as missing `CUDF_TEST_PROGRAM_MAIN` in a couple of tests and usage of `rmm::cuda_stream_default`. The meaningful changes are: - The new default stream getter/setter in `default_stream.[hpp|cpp]` - The addition of `cpp/tests/utilities/identify_stream_usage` - The changes to the base testing fixture in `cpp/include/cudf_test/base_fixture.hpp` to inject the custom stream. - The changes to CI in `ci/gpu/build.sh` to build and use the new library. - This PR is a breaking change because it moves the default stream into the detail namespace. Going forward the default stream may only be accessed using the public accessor `cudf::get_default_stream()`. I have added a corresponding setter, but it is also in the detail namespace since I do not want to publicly support changing the default stream yet, only for the purpose of testing. Reviewers, please leave comments if you disagree with those choices. - I have made getting and setting the default stream thread-safe, but there is still only a single stream. In multi-threaded applications we may want to support a stream per thread so that users could manually achieve PTDS with more fine-tuned control. Is this worthwhile? 
Even if it is, I'm inclined to wait for a subsequent PR to implement this unless someone feels strongly otherwise. - I'm currently only overloading `cudaLaunchKernel`. I can add overloads for other functions as well, but I didn't want to go through the effort of overloading every possible API. If reviewers have a minimal set that they'd like to see overloaded, let me know. [I've included links to all the relevant pages of the CUDA runtime API in the identify_stream_usage.cu file](https://github.com/rapidsai/cudf/pull/11875/files#diff-0b2762207c27c080acd2114475c7a1c06377a7c18c4e9c3de60ecbdc82a4dc61R99) if someone wants to look through them. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Jason Lowe (https://github.com/jlowe) - Bradley Dice (https://github.com/bdice) - Sevag H (https://github.com/sevagh) - https://github.com/brandon-b-miller - Jake Hemstad (https://github.com/jrhemstad) - David Wendt (https://github.com/davidwendt) URL: #11875
SHA: dec8bde
-
Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960)
There is almost never a good reason to pass arguments as `unique_ptr<T> const&`. Since those arguments cannot be modified, the only use case is accessing the underlying pointer, at which point the function better communicates its intent by accepting the underlying pointer/reference as an argument instead and is also more flexible as a result. Resolves #10393 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) URL: #11960
SHA: 9c06330
-
Fix maximum page size estimate in Parquet writer (#11962)
Closes #11916 cuda memcheck reports an OOB write in one of the tests. The root cause is an underallocated buffer for encoded pages. This PR fixes the computation of the maximum size of data pages (RLE encoded) when dictionary encoding is used. Other changes: Refactored max RLE page size computation to avoid code repetition. Use actual dictionary index width instead of (outdated) worst case. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - David Wendt (https://github.com/davidwendt) - Bradley Dice (https://github.com/bdice) URL: #11962
SHA: 7940b5b
-
add V2 page header support to parquet reader (#11778)
Adds support for reading parquet files with V2 page headers. Fixes #11686 ~~Submitting as draft for now because I'm not sure how to do unit tests for this. libcudf cannot produce files with V2 headers, so I would need to either add files to a data directory somewhere, or add raw binary of some parquet files to parquet_test.cpp. Given the comment on the `DecimalRead` test, neither seems attractive. Suggestions are welcome. Perhaps use python to test?~~ Authors: - Ed Seidl (https://github.com/etseidl) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Mike Wilson (https://github.com/hyperbolic2346) - Matthew Roeschke (https://github.com/mroeschke) URL: #11778
SHA: f1ab5e9
-
Default to equal NaNs in make_merge_sets_aggregation. (#11952)
Partially resolves #11329. This helps to align our default behaviors for null and NaN equality across APIs, specifically for `make_merge_sets_aggregation` in this PR. All functions should default to treating null values as equal to one another and NaN values as equal to one another. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vyas Ramasubramani (https://github.com/vyasr) - David Wendt (https://github.com/davidwendt) URL: #11952
SHA: 5c2150e
Commits on Oct 24, 2022
-
Switch over to rapids-cmake patches for thrust (#11921)
Now that rapids-cmake supports custom patches we can move cudf over to rapids-cmake for Thrust. This removes the need for custom install rules in cudf for Thrust, as rapids-cmake does that for us. This also separates out all Thrust patches so that we can better track upstream approval and remove as needed. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) URL: #11921
SHA: 5a190b9
-
Fix lists and structs gtests coded in namespace cudf::test (#11956)
Fixes structs and lists gtests source files coded in namespace `cudf::test` These are the only 2 problem files for this in `cpp/tests/structs` and `cpp/tests/lists` and so will make those two directories complete. No function or test has changed just the source code reworked per namespaces. Reference #11734 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Nghia Truong (https://github.com/ttnghia) URL: #11956
SHA: 4c0f2fd
-
Use gather-based strings factory in cudf::strings::strip (#11954)
Simplifies the `cudf::strings::strip` function to use the `cudf::make_strings_column` that accepts an iterator of pairs. This factory has a highly tuned gather implementation for building a strings column from a vector (iterator) of strings in device memory. This was inspired by the review and work in #11946. This also gives a small improvement in the performance of small columns of large strings and even more improvement in large columns of large-ish strings for strip. No function has changed; only the internal implementation has been simplified. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Tobias Ribizel (https://github.com/upsj) URL: #11954
SHA: c806b10
-
Add gpu memory watermark apis to JNI (#11950)
This PR addresses #11949. We are adding methods to get the current memory usage watermarks at the whole process level and adding a "scoped" maximum, where the user can reset the initial value, run cuDF functions, and then call the API to get what happened since the reset. For the scoped maximum, the `getScopedMaximumOutstanding` could have somewhat surprising results. If the scoped maximum is reset to 0 for example, and we only see frees for allocations done before the reset, we are going to see that the scoped maximum returned is 0. This is because our memory usage is literally negative in this scenario. The APIs here assume that the caller process is using a single thread to call into the GPU (for Spark it would be 1 concurrent task). Note I assume `Rmm.initialize` has been called, otherwise this doesn't track allocations done before that. Authors: - Alessandro Bellina (https://github.com/abellina) Approvers: - Jim Brennan (https://github.com/jbrennan333) - Jason Lowe (https://github.com/jlowe) URL: #11950
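The "surprising" scoped-maximum case described above is easier to see in a toy model. The class below is a hypothetical single-threaded tracker written for illustration; it is not the JNI or RMM implementation, and every name in it is an assumption.

```python
class WatermarkTracker:
    """Toy single-threaded model of total and scoped memory watermarks."""

    def __init__(self):
        self.outstanding = 0         # bytes currently allocated
        self.maximum = 0             # process-wide high-water mark
        self.scoped_outstanding = 0  # usage relative to the last scoped reset
        self.scoped_maximum = 0      # high-water mark since the last scoped reset

    def allocate(self, size):
        self.outstanding += size
        self.scoped_outstanding += size
        self.maximum = max(self.maximum, self.outstanding)
        self.scoped_maximum = max(self.scoped_maximum, self.scoped_outstanding)

    def free(self, size):
        # Frees can only lower usage, so neither maximum changes here.
        self.outstanding -= size
        self.scoped_outstanding -= size

    def reset_scoped_maximum(self, initial=0):
        self.scoped_outstanding = initial
        self.scoped_maximum = initial


tracker = WatermarkTracker()
tracker.allocate(100)           # allocation made before the scope starts
tracker.reset_scoped_maximum()  # start a new scope at 0
tracker.free(100)               # only a free happens inside the scope
```

After this sequence the scoped usage is -100 while the scoped maximum stays at 0, which matches the behavior the commit message warns about; the process-wide maximum remains 100.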
SHA: 1e93af8
-
Add dtype docs pages and docstrings for `cudf` specific dtypes (#11974)
Resolves #11605 This PR: - [x] Creates docs page entries for `cudf.CategoricalDtype`, `cudf.ListDtype`, `cudf.StructDtype`, `cudf.Decimal32Dtype`, `cudf.Decimal64Dtype`, `cudf.Decimal128Dtype`. - [x] Updates docstrings in all of the public APIs of the above dtypes. - [x] Links them in the `data-types.md` page where all supported dtypes are listed as a table. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #11974
SHA: 11918ae
Commits on Oct 25, 2022
-
Replace most of preprocessor usage in nvcomp adapter with `constexpr` (#11980)
C++17's "constexpr if" provides the same functionality as the `#if` directive, as used in the nvcomp adapter. This PR replaces macros with `constexpr` variables and uses them as conditions in "constexpr if" statements. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Nghia Truong (https://github.com/ttnghia) - Tobias Ribizel (https://github.com/upsj) URL: #11980
SHA: 2ee41d0
-
Add pool memory resource to libcudf basic example (#11966)
Adds the pool memory resource to the libcudf basic example. Also adds README.md to the strings example and makes some minor fixes to the documentation. Closes #11870 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Jake Hemstad (https://github.com/jrhemstad) - Elias Stehle (https://github.com/elstehle) URL: #11966
SHA: dc5924c
-
Add missing noexcepts to column_in_metadata methods (#11973)
These functions cannot throw exceptions. Resolved #11399 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: #11973
SHA: 2d89f43
-
Replace default_stream_value with get_default_stream in docs. (#11985)
Brings the docs in line with the new way of getting the default stream in libcudf. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Tobias Ribizel (https://github.com/upsj) - Nghia Truong (https://github.com/ttnghia)
SHA: 285cb9e
-
Ensure better compiler cache results between cudf cal-ver branches (#11835)
By passing the CUDF_VERSION compile definition only to the single source that needs it, we can remove compiler cache misses when switching between branches with different cal-ver values. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: #11835
SHA: a37f27b
-
This PR removes the stale issue labeler workflow Authors: - Ray Douglass (https://github.com/raydouglass) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: #11995
SHA: ffd130a
-
Minor cleanup of root CMakeLists.txt for better organization (#11988)
Clean up some minor issues in the root cudf CMakeLists.txt. Make the search for `CUDA_SANITIZER` occur only when we are building tests, as that doesn't need to be done for production builds. Move the gdb pretty print script logic to a separate region to better document what it is for. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Bradley Dice (https://github.com/bdice) URL: #11988
Commit 6a5c77b
-
Move protobuf compilation to CMake (#11986)
We currently compile a proto file into a Python file inside setup.py by overriding a certain setuptools (scikit-build) stage (`build_ext`). However, depending on the exact means by which we are building the package (specifically, in the case of building wheels) we may occasionally bypass that stage. Putting this logic into CMake guarantees that it is always run. Authors: - Vyas Ramasubramani (https://github.com/vyasr) - Paul Taylor (https://github.com/trxcllnt) Approvers: - Bradley Dice (https://github.com/bdice) - Lawrence Mitchell (https://github.com/wence-) URL: #11986
Commit 5bfc9a4
-
Use rapids-cmake for google benchmark. (#11997)
This PR centralizes handling of google benchmark during the build process by requesting it from rapids-cmake. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #11997
Commit 6b9c026
-
Switch to DISABLE_DEPRECATION_WARNINGS to match other RAPIDS projects (#11989)
Use the term `DISABLE_DEPRECATION_WARNINGS` so that we match other RAPIDS projects (rapidsai/cuml#4946); the plural form also makes more sense in general. Authors: - Robert Maynard (https://github.com/robertmaynard) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) URL: #11989
Commit b7d0115
Commits on Oct 26, 2022
-
Add inplace arithmetic operators to `MaskedType` (#11987)
Closes #11887. After merging, we will support syntax like `a += b` inside UDFs used through `DataFrame.apply` and `Series.apply`. Authors: - https://github.com/brandon-b-miller Approvers: - Bradley Dice (https://github.com/bdice) URL: #11987
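As a rough illustration of what `a += b` means for nullable values, here is a pure-Python sketch. `Masked` and `row_udf` are hypothetical stand-ins invented for this example, not cudf's `MaskedType` or its UDF machinery, and the rule that a null operand makes the result null is an assumption about the intended semantics.

```python
from dataclasses import dataclass


@dataclass
class Masked:
    """Hypothetical nullable scalar: a value plus a validity flag."""

    value: float
    valid: bool = True

    def __iadd__(self, other):
        # Accept either another Masked or a plain number on the right-hand side.
        if not isinstance(other, Masked):
            other = Masked(other)
        # Assumed null semantics: the result is valid only if both operands are.
        self.valid = self.valid and other.valid
        if self.valid:
            self.value += other.value
        return self


def row_udf(a, b):
    # The in-place form that the commit above describes for UDFs.
    a += b
    return a


total = row_udf(Masked(1.0), Masked(2.0))
missing = row_udf(Masked(1.0), Masked(0.0, valid=False))
print(total.value, total.valid)
print(missing.valid)
```

The second call shows the assumed null propagation: adding a null operand leaves the result marked as not valid.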
Commit b89c0e2
-
Revert "Replace most of preprocessor usage in nvcomp adapter with `constexpr`" (#11999)
Reverts #11980. The PR was made under the assumption that `if constexpr` branches can contain invalid code if the branch is not taken. However, this only holds for templates. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: #11999
Commit c146d21
-
Fix some libcudf calls to cudf::detail::gather (#11963)
Fixes a couple of source files that were calling gather by type-dispatching directly to the internal `column_gatherer` functor instead of using the `cudf::detail::gather` function(s). This simplifies the code and improves maintenance. For example, extra code to resolve the null-mask is eliminated since the appropriate `cudf::detail::gather` call does this automatically. No function has changed, just code cleanup. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Mark Harris (https://github.com/harrism) - Nghia Truong (https://github.com/ttnghia) - Karthikeyan (https://github.com/karthikeyann) URL: #11963
Commit fac35b4
-
Determine if Arrow has S3 support at runtime in unit test. (#11560)
Resolves #11559. This PR improves the logic for testing S3 support. Previously this test relied on the value of `CUDF_ENABLE_ARROW_S3`, which only enables S3 support in Arrow if Arrow is being built from source by libcudf. If the Arrow package is found locally (rather than fetched and built), the value of `CUDF_ENABLE_ARROW_S3` was irrelevant. Therefore, the tests using the compile-time value of `CUDF_ENABLE_ARROW_S3` were unable to correctly detect Arrow's S3 support. This PR fixes the problem by checking Arrow S3 support at runtime. I tested this locally for the case where Arrow doesn't have S3 support (our CI uses prebuilt Arrow packages with S3 enabled). Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Nghia Truong (https://github.com/ttnghia) - Tobias Ribizel (https://github.com/upsj) URL: #11560
Commit 72572a8
-
Feature/remove default streams (#11967)
Default stream parameters can lead to subtle bugs that are hard to track down if public APIs start exposing streams. Removing the defaults ensures that streams are properly forwarded through everywhere that they should be. This PR partially addresses #9854. It does not change the cases where removing the default value from a stream parameter would necessitate changing the order of parameters in the function signature due to the presence of other default parameters. That work will be done in a follow-up PR. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Tobias Ribizel (https://github.com/upsj) - Nghia Truong (https://github.com/ttnghia) - Yunsong Wang (https://github.com/PointKernel) URL: #11967
Commit 07eb723
-
Fix doxygen text for cudf::dictionary::encode (#11991)
Fixes the example code in the doxygen comment for `cudf::dictionary::encode` to use the correct API name. No function has changed -- just code comments that generate public doxygen content. https://docs.rapids.ai/api/libcudf/stable/group__dictionary__encode.html#ga06997026d694784d613f4590563a8b33 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Karthikeyan (https://github.com/karthikeyann) - Bradley Dice (https://github.com/bdice) - Mike Wilson (https://github.com/hyperbolic2346) URL: #11991
Commit 646a7e3
Commits on Oct 27, 2022
-
Remove unnecessary code from dask-cudf _Frame (#12001)
Removes unnecessary code from `dask_cudf.core._Frame` that is already handled in the super-class (`dask.dataframe.core._Frame`). By removing the unnecessary `__init__` logic from `dask_cudf`, we can avoid breakages from upstream changes like dask/dask#9473. Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #12001
Commit cd21ce7
-
Ignore python docs build artifacts (#12000)
This PR gitignores some of the Python docs build artifacts that keep showing up in `git status`. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Bradley Dice (https://github.com/bdice) URL: #12000
Commit 8d49db5
-
Add `strip_delimiters` option to `read_text` (#11946)
This adds a `strip_delimiters` post-processing option to `read_text`. I needed to implement some lightweight stripping because a thread-per-row parallelization of the string gather gave pretty bad performance. For consistency, I also removed the special-case handling of delimiters at the end (previously adding an empty row), to match the read_csv behavior. Benchmark results:
```
benchmarks/MULTIBYTE_SPLIT_NVBENCH --axis size_approx[pow2]=30 --axis byte_range_percent=100 --axis T=device --axis delim_size=4
```
### [0] Tesla T4
| T | strip_delimiters | delim_percent | size_approx | CPU Time | Noise | Peak Memory Usage | Encoded file size |
|--------|------------------|---------------|-------------------|------------|-------|-------------------|-------------------|
| device | 0 | 1 | 2^30 = 1073741824 | 178.133 ms | 0.36% | 3.709 GiB | 1014.442 MiB |
| device | 1 | 1 | 2^30 = 1073741824 | 188.328 ms | 0.31% | 4.690 GiB | 1014.442 MiB |
| device | 0 | 25 | 2^30 = 1073741824 | 206.188 ms | 0.03% | 5.292 GiB | 953.075 MiB |
| device | 1 | 25 | 2^30 = 1073741824 | 242.534 ms | 0.50% | 5.975 GiB | 953.075 MiB |
Closes #11625 Authors: - Tobias Ribizel (https://github.com/upsj) Approvers: - David Wendt (https://github.com/davidwendt) - GALI PREM SAGAR (https://github.com/galipremsagar) - Bradley Dice (https://github.com/bdice) URL: #11946
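To make the row semantics concrete, here is a small CPU-only sketch of what stripping delimiters and dropping the trailing empty row could look like. `split_rows` is a hypothetical helper written for this illustration, not the libcudf `multibyte_split` implementation; it assumes that, without stripping, each row keeps its terminating delimiter.

```python
def split_rows(text, delimiter, strip_delimiters=False):
    """Split text into rows at each delimiter (illustrative sketch only)."""
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    rows = []
    start = 0
    while start < len(text):
        end = text.find(delimiter, start)
        if end == -1:
            # The last row has no terminating delimiter.
            rows.append(text[start:])
            break
        # Keep or drop the delimiter at the end of the row.
        stop = end if strip_delimiters else end + len(delimiter)
        rows.append(text[start:stop])
        start = end + len(delimiter)
    # A delimiter at the very end of the input does not add an empty row.
    return rows


kept = split_rows("a::b::", "::")
stripped = split_rows("a::b::", "::", strip_delimiters=True)
print(kept)
print(stripped)
```

Because the loop stops as soon as the input is exhausted, an input that ends with a delimiter yields two rows here rather than two rows plus an empty one.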
Commit b4ca894
-
Refactor multibyte_split `output_builder` (#11945)
This PR moves the `output_builder` and `split_device_span` classes out of `multibyte_split` and adds an iterator for the `split_device_span`, enabling it to be used directly in Thrust algorithms. I also included a fix from #11875 to make the integration easier once that is merged. Authors: - Tobias Ribizel (https://github.com/upsj) Approvers: - Bradley Dice (https://github.com/bdice) - Mike Wilson (https://github.com/hyperbolic2346) URL: #11945
Commit 43eb7a0
-
Add pivot_table and crosstab to docs. (#12014)
This PR resolves #12012 by adding `cudf.pivot_table` and `cudf.crosstab` to the documentation. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Ashwin Srinath (https://github.com/shwina) URL: #12014
Commit bac2004
-
Provide `data_chunk_source` wrapper for `datasource` (#11886)
With `datasource` being more generic in its interface than `data_chunk_source`, this PR adds a wrapper that wraps a `datasource` in a `data_chunk_source` for use in `multibyte_split`. Its host read implementation is based on the file `data_chunk_source`. Authors: - Tobias Ribizel (https://github.com/upsj) Approvers: - Mike Wilson (https://github.com/hyperbolic2346) - Karthikeyan (https://github.com/karthikeyann) URL: #11886
Commit 1b1ca7c
-
Fix bug where `df.loc` resulting in single row could give wrong index (#11998)
Fixes #11930. I can't figure out the purpose of these lines, so let's try removing them and run CI. I haven't followed git blame back far enough to know the full story of these lines, but they originate at least three years ago: https://github.com/rapidsai/cudf/pull/2208/files#diff-5f58cf9dfe537ce53c6481f690ba66ff10807da04ad82df1c79c6d112d19c08b Authors: - Erik Welch (https://github.com/eriknw) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Lawrence Mitchell (https://github.com/wence-) URL: #11998
Commit f17ea94
-
Remove unused `managed_allocator` (#12005)
The `managed_allocator` class is not used anywhere. All uses of cuco maps or the `concurrent_unordered_map` just use the `default_allocator`. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) URL: #12005
Commit 69fac8a
Commits on Oct 28, 2022
-
Add DataFrame.pivot_table. (#12015)
This PR adds the method `DataFrame.pivot_table` to enhance pandas API compatibility. It uses the exact same arguments as `cudf.pivot_table` but automatically supplies the first argument (a DataFrame). Related: #11314 Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Ashwin Srinath (https://github.com/shwina) URL: #12015
Commit 1017045
-
New GHA to add issues/prs to project board (#12016)
This PR adds a small GitHub action to automatically add new issues and PRs to the cudf GitHub project. It does not impact existing issues/PRs. Authors: - Ben Jarmak (https://github.com/jarmak-nv) Approvers: - Ashwin Srinath (https://github.com/shwina) - GALI PREM SAGAR (https://github.com/galipremsagar) - Jordan Jacobelli (https://github.com/Ethyling) URL: #12016
Commit ee53458
-
Add deprecation warning for set_allocator. (#11958)
Resolves #11097. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Ashwin Srinath (https://github.com/shwina) URL: #11958
Commit c915523
-
Performance improvement in JSON Tree traversal (#11919)
This PR improves the performance of JSON tree traversal, mainly in the creation of column ids.
- Replaced per-level processing with a two-level hash algorithm
- Reduced memory usage for the hash map (reduced oversubscription)
Other changes:
- Fail if the tokens contain an error token during tree generation
- Created a device_span version of device_parse_nested_json
Hits 2 GB/s on a GV100 from 128MB of json. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Tobias Ribizel (https://github.com/upsj) - Nghia Truong (https://github.com/ttnghia) URL: #11919
Commit aaf251d
-
Add method argument to DataFrame.quantile (#11957)
Adds a `method` argument to `DataFrame.quantile` to match pandas behavior. Also deprecates `DataFrame.quantiles` (with a `FutureWarning` informing the user of the `method` argument). Closes #11572 Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) - Bradley Dice (https://github.com/bdice) Approvers: - Ashwin Srinath (https://github.com/shwina) - Bradley Dice (https://github.com/bdice) - Matthew Roeschke (https://github.com/mroeschke) - Lawrence Mitchell (https://github.com/wence-) URL: #11957
Commit 7620fb1
-
Add cython-lint to pre-commit checks. (#12020)
Adds `cython-lint` (https://github.com/MarcoGorelli/cython-lint) to the list of pre-commit checks. It is most similar to flake8 but with support for Cython syntax -- the rule set it enforces is fairly short, it mostly helps identify unused imports in Cython files. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #12020
Commit 0603167
Commits on Oct 31, 2022
-
This PR adapts a few files using header guards with `#ifndef… #define` to use `#pragma once` instead. This establishes a more consistent code style for the library. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) URL: #12019
Commit 1c057bc
-
Pass column names to `write_csv` instead of `table_metadata` pointer (#11972)
Contributes to #6411. `write_csv` takes a pointer to `table_metadata` but only uses the column names. This PR changes the API to directly take column names. This also aligns with `read_csv`. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Matthew Roeschke (https://github.com/mroeschke) - Nghia Truong (https://github.com/ttnghia) - Yunsong Wang (https://github.com/PointKernel) URL: #11972
Commit f0b4c4f
Commits on Nov 1, 2022
-
Remove default parameters for cudf::dictionary::detail functions (#12006)
Removes default parameters from the `cudf::dictionary::detail` functions. Most of these were allowing for the default memory-resource which is unnecessary. One non-stream, non-mr parameter was defaulted but the default was never used. Reference #11967 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Tobias Ribizel (https://github.com/upsj) - Yunsong Wang (https://github.com/PointKernel) - Nghia Truong (https://github.com/ttnghia) URL: #12006
Commit a5aaa52
-
Remove default parameters for nvtext::detail functions (#12007)
Removes default parameters from the `nvtext::detail` functions. Most of these were internal default parameters which were unnecessary. The nvtext detail functions are only used within nvtext APIs. Reference #11967 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) URL: #12007
Commit 991c86b
-
Update cuda-python dependency to 11.7.1 (#12030)
This is a mirror PR of #11994 to unblock gpu-ci which is currently blocked. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) - Ray Douglass (https://github.com/raydouglass) - Ashwin Srinath (https://github.com/shwina) - Bradley Dice (https://github.com/bdice) - Jordan Jacobelli (https://github.com/Ethyling) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: #12030
Commit 7af461c
-
Reduce/Remove reliance on `**kwargs` and `*args` in `IO` readers & writers (#12025)
Resolves: #11780 This PR:
- [x] Reduces reliance on `args` & `kwargs` for readers and writers when the `cudf` engine is selected. However, these will have to stay for the other engines we support in a few readers & writers, such as the `pandas` & `pyarrow` engines.
- [x] Fixes some bugs where dead parameters were still being used.
- [x] Fixes some bugs where parameters weren't being passed down to the Cython layer in the first place.
- [x] Updates docs related to newly exposed parameters.
Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) - Bradley Dice (https://github.com/bdice) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Bradley Dice (https://github.com/bdice) URL: #12025
Commit d236779
-
Add `read_orc_metadata` to libcudf (#11815)
Issue #11675. Adds a C++ interface to get information about an ORC file. It is meant to be an efficient way to get information like column names and types, as well as file structure (e.g. number of stripes). The returned structure can be expanded to include more types of metadata; for now it only returns info that we found relevant internally. The returned column hierarchy matches the one used in ORC (i.e. root struct column included), not the hierarchy of a cuDF dataframe that the file would be read as (root column children become top level cuDF columns). This PR also includes improvements to ORC reader benchmarks, enabled by the new metadata API. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Mike Wilson (https://github.com/hyperbolic2346) - AJ Schmidt (https://github.com/ajschmidt8) - https://github.com/nvdbaranec URL: #11815
Commit 41fca6e
-
Leverage rapids_cython for more automated RPATH handling (#11996)
This PR leverages a new feature of rapids-cmake to avoid needing to manually set the RPATHs for all extension modules individually, instead just pointing to a directory once and then letting rapids-cmake automatically handle the rest. This approach is a lot less error-prone since developers do not need to keep track of the relative paths in each CMakeLists.txt file. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - Robert Maynard (https://github.com/robertmaynard) URL: #11996
Commit 2fe06bc
-
Commit 80c238c
-
Remove smart quotes from all docstrings. (#12035)
This PR removes all "smart quotes" from the library by enforcing a pre-commit hook. Smart quotes typically arise from copying rendered docstrings from Pandas, because Sphinx automatically transforms straight quotes into smart quotes when rendering the docs as HTML. However, the use of smart quotes is undesirable in code, and makes it difficult to do find-replace transformations if straight and smart quotes are mixed. I have made suggestions to fix this several times before, so I am making the suggestions more permanent and automatically enforceable via a pre-commit style check: - #12025 (comment) - #9817 (comment) - #9571 (comment) Authors: - Bradley Dice (https://github.com/bdice) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #12035
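A check of this kind can be sketched in a few lines of Python. The functions below are hypothetical helpers written for illustration, not the pre-commit hook that the PR adds, and the set of characters treated as smart quotes is an assumption limited to the four common curly quotation marks.

```python
# Map the four common curly quotation marks to their straight equivalents.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}
_TABLE = str.maketrans(SMART_QUOTES)


def straighten_quotes(text):
    """Return text with smart quotes replaced by straight quotes."""
    return text.translate(_TABLE)


def find_smart_quotes(text):
    """Return 1-based (line, column) positions of every smart quote."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in SMART_QUOTES:
                hits.append((line_no, col_no))
    return hits


sample = "ok\nit\u2019s \u201cquoted\u201d"
print(find_smart_quotes(sample))
print(straighten_quotes(sample))
```

A style check would typically fail when `find_smart_quotes` returns a non-empty list, while `straighten_quotes` gives the automatic fix.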
Commit f19bdbc
-
Commit f3bf872
-
Fix Parquet support for seconds and milliseconds duration types (#11854)
Fixes #11833 Parquet writer used int64 for `second` and `millisecond` durations. This does not match the Parquet spec, which requires int32 to be used here. Changed the physical type of time_millis to int32 to match specs. Set logical type for time(duration) types. Using the logical types allows us to write nanosecond durations as nanoseconds, so no precision loss any more. Parquet writer option `timestamp_type` does not apply to durations any more. Authors: - Vukasin Milovanovic (https://github.com/vuule) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) - Nghia Truong (https://github.com/ttnghia) URL: #11854
Commit 1c2ad6a
-
Merge pull request #12045 from vyasr/branch-22.12-merge-22.10
Forward merge 22.10 into 22.12
Commit c04dbef
-
Port thrust's pinned_allocator to cudf, since Thrust 1.17 removes the type (#12004)
Thrust 1.17 removes the experimental/pinned_allocator. Thrust offers a replacement in `thrust::system::cuda::universal_host_pinned_memory_resource`, but adopting it would also require moving the consumers to CUDA sources, which would negatively impact our compile time. Instead we move Thrust's removed pinned_allocator into cudf, as it allows usage from C++ sources and doesn't require larger changes to handle the fact that the value_type from the container becomes `thrust::pointer<T>`. Note: We haven't seen a compile failure up to this point because all CUDA 11.X toolkits provide a version of thrust that has the experimental header. So when it wasn't found in our 1.17.2 location, the compiler would fall back to the one in the CTK. We can't rely on this behavior moving forward. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Bradley Dice (https://github.com/bdice) - Tobias Ribizel (https://github.com/upsj) - David Wendt (https://github.com/davidwendt) - Jake Hemstad (https://github.com/jrhemstad) - Mark Sadang (https://github.com/msadang) URL: #12004
Commit ac3f205
-
Standardize newlines at ends of files. (#12042)
This PR makes all files end with exactly one newline and enforces that rule with a pre-commit hook. The vast majority of files already comply with this rule, which improves consistency in the library's code style. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - GALI PREM SAGAR (https://github.com/galipremsagar) - Mark Sadang (https://github.com/msadang) - Matthew Roeschke (https://github.com/mroeschke) - Robert Maynard (https://github.com/robertmaynard) - Nghia Truong (https://github.com/ttnghia) URL: #12042
Commit 03034af
Commits on Nov 2, 2022
-
Trim trailing whitespace from all files. (#12041)
This PR trims trailing whitespace from all files and adds a pre-commit hook to enforce that change for the future. The vast majority of files already comply with this rule, which improves consistency in the library's code style. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - GALI PREM SAGAR (https://github.com/galipremsagar) - Mark Sadang (https://github.com/msadang) - Nghia Truong (https://github.com/ttnghia) - Vyas Ramasubramani (https://github.com/vyasr) URL: #12041
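The two whitespace rules in this part of the history (no trailing whitespace, exactly one newline at the end of a file) can be expressed as a short normalization function. This is a minimal sketch with a hypothetical name, `normalize_whitespace`; it is not the pre-commit hook used by the repository, and it assumes `\n` line endings and treats only spaces and tabs as trailing whitespace.

```python
def normalize_whitespace(text):
    """Trim trailing spaces/tabs per line and end with exactly one newline."""
    # Strip trailing spaces and tabs from every line.
    lines = [line.rstrip(" \t") for line in text.split("\n")]
    # Drop empty lines at the end of the file.
    while lines and lines[-1] == "":
        lines.pop()
    if not lines:
        # An empty (or whitespace-only) file stays empty.
        return ""
    return "\n".join(lines) + "\n"


def is_clean(text):
    """True when the text already satisfies both rules."""
    return normalize_whitespace(text) == text


messy = "a  \nb\t\n\n\n"
print(repr(normalize_whitespace(messy)))
print(is_clean(messy), is_clean("a\nb\n"))
```

A check-only hook would report files for which `is_clean` is false; a fixing hook would write back the normalized text.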
Commit a20bbfb
-
Add strings udf C++ classes and functions for phase II (#11912)
Adds the C++ classes and functions for phase II of strings udf. This specifically includes the device-side string class, which can be used for building udfs that create or modify strings. Also included are some basic helper functions for split, strip, case, and numeric conversion. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) - https://github.com/brandon-b-miller URL: #11912
Commit 5ace809
-
Rollback of `DeviceBufferLike` (#12009)
This PR replaces `DeviceBufferLike` with `Buffer` and clears the way for a spillable sub-class of `Buffer`.
#### Context
The introduction of the [`DeviceBufferLike`](#11447) protocol was motivated by [the spilling work](#11553), which we initially thought would have to be implemented in Cython. However, it can be done in pure Python, which makes `DeviceBufferLike` an unneeded complexity.
#### Review notes
- In order to introduce a spillable-buffer in the future, we still use a factory function, `as_buffer()`, to create Buffers.
- `buffer.py` is moved into the submodule `core.buffer` to ease organization when adding the spillable-buffer and spilling manager.
#### Breaking
This PR breaks external use of `Buffer`, e.g. `Buffer.__init__` now raises an exception and the `"constructor-kwargs"` header from #4164 has been removed. Submitted a PR to fix this in cuml: rapidsai/cuml#4965
Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Ashwin Srinath (https://github.com/shwina) URL: #12009
Commit d6a9e4a
-
Fixes bug in csv_reader_options construction in cython (#12021)
Fixes a bug in csv_reader_options construction in cython: the false values for csv were not passed to the csv_reader_options during construction in cython code. This is fixed and a unit test is added. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Lawrence Mitchell (https://github.com/wence-) URL: #12021
Commit a3d2276
-
Enable CEC for `strings_udf` (#11884)
This PR removes the runtime checks for CEC in `strings_udf`. Authors: - https://github.com/brandon-b-miller Approvers: - Graham Markall (https://github.com/gmarkall) - Lawrence Mitchell (https://github.com/wence-) - Ray Douglass (https://github.com/raydouglass) URL: #11884
Commit 49fc3c7
-
Add full page indexes to Parquet writer benchmarks (#11955)
Adds `statistics_freq::STATISTICS_COLUMN` to list of parquet writer options to benchmark. This should have been included in #11302. Authors: - Ed Seidl (https://github.com/etseidl) Approvers: - Nghia Truong (https://github.com/ttnghia) - Karthikeyan (https://github.com/karthikeyann) URL: #11955
Commit 856ac3f
-
Make all `nvcc` warnings into errors (#8916)
Seeing what impact [`-Werror=all-warnings`](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#generic-tool-options-Werror) has on device-side compilation. Device warnings now treated as errors:
```
cudf/cpp/src/io/orc/stripe_enc.cu (633): error: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
cudf/cpp/src/io/orc/writer_impl.cu ptxas error : Stack size for entry function '_ZN4cudf6detail20single_thread_kernelIZNS_2io6detail3orc19make_orc_table_viewERKNS_10table_viewERKNS_17table_device_viewEPKNS2_14table_metadataEN3rmm16cuda_stream_viewEEUlvE_EEvT_' cannot be statically determined
cudf/cpp/src/binaryop/compiled/binary_ops.cu(46): error: parameter "mr" was declared but never referenced
cudf/cpp/src/binaryop/compiled/binary_ops.cu(204): error: variable "out" was declared but never referenced
```
Authors: - Paul Taylor (https://github.com/trxcllnt) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Nghia Truong (https://github.com/ttnghia) - Jake Hemstad (https://github.com/jrhemstad) - Bradley Dice (https://github.com/bdice) - Robert Maynard (https://github.com/robertmaynard) URL: #8916
Commit d949cd2
Commits on Nov 3, 2022
-
Add developer docs for writing tests (#11199)
This PR adds documentation on how Python tests should be written. Related to #4730. This PR will establish best practices. Follow-up PRs will be needed to implement them. Resolves #6481. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - Matthew Roeschke (https://github.com/mroeschke) - Lawrence Mitchell (https://github.com/wence-) - Ashwin Srinath (https://github.com/shwina) URL: #11199
Commit eaa0706
-
Trim quotes for non-string values in nested json parsing (#11898)
Trims quotes for non-string values in nested JSON parsing. Added corner cases for unquoted and quoted literals (see the unit tests). Fixes the old JSON reader to treat `"null"` as a string instead of NULL. Closes #11817 Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #11898
Commit e402448
-
Add strings `like` JNI and native method (#12032)
[#11558](#11558) added the strings `like` function to cudf, a wildcard-based string matching function modeled on SQL's LIKE operator. This PR adds the `like` JNI and native method that call the `like` function from #11558, along with corresponding Java unit tests. This is part of the solution for issue [NVIDIA/spark-rapids#6430](NVIDIA/spark-rapids#6430). Authors: - Yuan Jiang (https://github.com/cindyyuanjiang) Approvers: - Nghia Truong (https://github.com/ttnghia) - Gera Shegalov (https://github.com/gerashegalov) - Jason Lowe (https://github.com/jlowe) URL: #12032
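To illustrate the wildcard semantics of SQL's LIKE (`%` matches any run of characters, `_` matches exactly one), here is a minimal pure-Python sketch. The helper name `sql_like` and its `escape` parameter are hypothetical; this is not the cudf or JNI API, only a model of the matching rule.

```python
import re


def sql_like(value: str, pattern: str, escape: str = "\\") -> bool:
    """Return True if `value` matches the SQL LIKE `pattern` (hypothetical helper)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # The character after the escape is matched literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")  # any run of characters, including none
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    # LIKE must match the whole string, so use fullmatch.
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None
```

For example, `sql_like("abc", "a%")` and `sql_like("abc", "a_c")` both match, while `sql_like("abc", "a_")` does not because `_` consumes only one character.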
Commit baa645d
-
Add `memory_usage` & `items` implementation for `Struct` column & dtype (#12033)
Fixes: #11893 - [x] This PR implements `StructColumn.memory_usage` and `StructDtype.items` Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #12033
Commit b156c25
Commits on Nov 4, 2022
-
Force using old fmt in nvbench. (#12067)
This is a port of #12064 to 22.12 to unblock CI because forward mergers are currently disabled. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Robert Maynard (https://github.com/robertmaynard) URL: #12067
Commit 2a58ff6
-
Allow falling back to `shim_60.ptx` by default in `strings_udf` (#12056)
In a distributed context, `strings_udf` needs to import and set itself up without creating a CUDA context, as this can interfere with the way the network is being set up. In this situation it can't use its normal mechanism (which requires a context) to query the compute capability of the device, and it falls back on an environment variable, `STRINGS_UDF_CC`, that must be passed from dask instead. A user can set this and their code will work without problems, but we also need some default configuration that just works when someone builds their code. Without knowing their setup beforehand this can be problematic, so I originally added the default value of `cc=52` for when the environment variable isn't set. This was, however, not exactly correct for a few reasons: - It should be 60, I think, since Pascal is the oldest arch supported by RAPIDS - We don't always build `shim_60.ptx`, especially in local mode. This PR fixes this problem. Authors: - https://github.com/brandon-b-miller Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Vyas Ramasubramani (https://github.com/vyasr) URL: #12056
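The environment-variable fallback described above can be sketched roughly as follows. This is only an illustration, not the `strings_udf` implementation: the function name `resolve_compute_capability`, the `DEFAULT_CC` constant, and the assumption that `STRINGS_UDF_CC` holds a plain integer such as `75` are all hypothetical.

```python
import os

# Assumed default: 60 (Pascal), per the reasoning in the PR description.
DEFAULT_CC = 60


def resolve_compute_capability(environ=os.environ) -> int:
    """Pick a compute capability without touching the GPU (hypothetical helper).

    Reads STRINGS_UDF_CC if it is set; otherwise falls back to DEFAULT_CC.
    The integer format of the variable is an assumption made for this sketch.
    """
    raw = environ.get("STRINGS_UDF_CC")
    if raw is None:
        return DEFAULT_CC
    return int(raw)
```

Passing the environment as a parameter keeps the lookup free of any CUDA context and makes the fallback easy to exercise in tests.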
Commit 1d6931a
-
Remove default parameters for cudf::strings::detail functions (#12003)
Removes default parameters from the `cudf::strings::detail` functions. Most of these were unintentional; the rest were there to allow for the default memory resource, which was easily fixed. Most of the detail functions are not used outside of strings, and the default parameters were not actually necessary there. Hopefully this will help with #11967 Authors: - David Wendt (https://github.com/davidwendt) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) URL: #12003
Commit 0278485
-
Remove overflow error during decimal binops (#12063)
Fixes: #11337 - [x] This PR removes raising of an overflow error and instead lets the data overflow, similar to what we do with other numeric dtypes. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #12063
Commit b1c2520
-
Fixes List offset bug in Nested JSON reader (#12060)
Fixes a bug in the condition for writing the end offset of the last list item. If a list row is followed by an empty list in the next row, the previous row's end is not written to offsets. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #12060
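For context, a list column with N rows stores N+1 offsets into a flat child column, so every row's end must be recorded even when the next row is empty. A minimal pure-Python sketch of the expected offsets follows; the helper name `build_list_offsets` is hypothetical and this is not the reader's code.

```python
def build_list_offsets(rows):
    """Return the N+1 offsets describing N list rows (hypothetical helper)."""
    offsets = [0]
    for row in rows:
        # Each row ends where the previous one ended plus its own length;
        # an empty row repeats the previous offset.
        offsets.append(offsets[-1] + len(row))
    return offsets
```

With rows `[[1, 2], []]`, the offsets are `[0, 2, 2]`: the first row's end (`2`) must still be written even though the following row is empty, which is the case this fix addresses.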
Commit e788f36
-
Mark nvcomp zstd compression stable (#12059)
NVCOMP zstd compression was added in 22.10, but marked experimental, meaning you have to define the environment variable `LIBCUDF_NVCOMP_POLICY=ALWAYS` to enable it. After completing validation testing using the spark rapids plugin as documented here: NVIDIA/spark-rapids#3037, we believe that we can now change the zstd compression status to stable, which will enable it in cudf by default. `LIBCUDF_NVCOMP_POLICY=STABLE` is the default value. Authors: - Jim Brennan (https://github.com/jbrennan333) Approvers: - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) - Vukasin Milovanovic (https://github.com/vuule) URL: #12059
Commit a3e9c1c
-
Add debug-only onAllocated/onDeallocated to RmmEventHandler (#12054)
This adds `onAllocated` and `onDeallocated` to `RmmEventHandler` as debug callbacks. If the event handler is installed with debug enabled (in `Rmm.setEventHandler`) these callbacks will be invoked when an allocation or deallocation finishes. It also fixes a bug with #11950 where the initial allocated amount was not getting set appropriately. It was getting set to 0, but instead it should be set to the new initial value/maximum. Closes #11949. Authors: - Alessandro Bellina (https://github.com/abellina) Approvers: - Jason Lowe (https://github.com/jlowe) URL: #12054
Commit 6e13139
-
Adding feature Truncate to DataFrame and Series (#11435)
This PR closes #9629 by adding truncate feature to DataFrame and Series. Truncates a DataFrame or Series before and after some index value. If the index being truncated contains only datetime values, before and after may be specified as strings instead of Timestamps. Authors: - https://github.com/VamsiTallam95 - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #11435
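As a rough illustration of the truncation semantics (keep only entries whose index label lies between `before` and `after`, both inclusive, as in pandas), here is a pure-Python sketch over a sorted index. The function below is hypothetical and operates on plain lists; it is not the cudf implementation.

```python
from bisect import bisect_left, bisect_right


def truncate(index, values, before=None, after=None):
    """Keep entries with before <= label <= after (hypothetical helper).

    `index` must be sorted in ascending order; `values` is aligned with it.
    """
    lo = 0 if before is None else bisect_left(index, before)
    hi = len(index) if after is None else bisect_right(index, after)
    return index[lo:hi], values[lo:hi]
```

For instance, truncating the index `[1, 2, 3, 4, 5]` with `before=2, after=4` keeps the labels `[2, 3, 4]` and their values.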
Commit 9df2eba
-
Fix type casting in Series.__setitem__ (#11904)
To mimic pandas, we must upcast a column to the numpy result_type of the column itself and the input value dtype. This previously occurred in all relevant cases except when the index provided to __setitem__ was a single integer (originally introduced in #2442). Closes #11901. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Bradley Dice (https://github.com/bdice) URL: #11904
Commit 11b875b
Commits on Nov 7, 2022
-
Fix link to c++ developer guide from `CONTRIBUTING.md` (#12084)
Noticed this link was broken when poking around; I think this should fix it. Authors: - https://github.com/brandon-b-miller Approvers: - David Wendt (https://github.com/davidwendt) URL: #12084
Commit 52dbb63
-
Fix ingest_raw_data performance issue in Nested JSON reader due to RVO (#12070)
The issue is that `json::experimental::ingest_raw_data` took double the time of `json::ingest_raw_data` for the same data. After replacing the ternary operator with `if`/`else`, the runtime for a 500 MB file is the same as `json::ingest_raw_data`. I suspect RVO (copy elision) is skipped when using the ternary operator. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Nghia Truong (https://github.com/ttnghia) - Elias Stehle (https://github.com/elstehle) - MithunR (https://github.com/mythrocks) URL: #12070
Commit 262631b
-
Add checks for HLG layers in dask-cudf groupby tests (#10853)
This PR adds helper function `check_groupby_result` to dask-cudf's groupby tests, and is used in the basic tests to ensure that we are using dask-cudf's `groupby_agg` function to compute the result as expected. I also expanded `test_groupby_agg` to test all supported aggregations, and removed tests that were made superfluous by this change. Authors: - Charles Blackmon-Luca (https://github.com/charlesbluca) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) - Lawrence Mitchell (https://github.com/wence-) URL: #10853
Commit 17b6b2e
-
Fix quantile gtests coded in namespace cudf::test (#12049)
Fixes `cpp/tests/quantiles` gtests source files coded in namespace `cudf::test`. The `tdigest_utilities.cu` was moved to `cpp/tests/utilities` since it is used by quantiles, groupby, and reductions tests. Also, the header for the functions defined in this source file is in `cpp/include/cudf_tests/`. The `cpp/include/cudf_tests/tdigest_utilities.cuh` was also including a source file header from `cudf/tests/groupby`, which seemed odd and was corrected by moving the code it needed directly into the `tdigest_utilities.cuh` header. These functions were used by quantiles, groupby, reductions, etc., so it made sense for them to be moved into this utility header. Simply reworking some of the code in `percentile_approx_test.cu` allowed it to become a `.cpp` file as well. Also made some minor changes to the `tdigest_column_view` class to isolate a functor inside the class instead of at namespace scope. No function or test has changed; the source code was just reworked or moved around. Reference #11734 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) - Vyas Ramasubramani (https://github.com/vyasr) URL: #12049
Commit f9a2512
-
Throw an error when libcudf is built without cuFile and `LIBCUDF_CUFILE_POLICY` is set to `"ALWAYS"` (#12080)
Currently, creating a cufile `datasource` or `data_sink` silently fails if libcudf was built without the cuFile headers. This is expected behavior when `LIBCUDF_CUFILE_POLICY` is not set, or is set to any value other than "ALWAYS". However, with "ALWAYS", there should be no fallback from GDS. This PR adds a check to fail loudly when `LIBCUDF_CUFILE_POLICY=="ALWAYS"` cannot be enforced because of a missing dependency (cuFile). Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Yunsong Wang (https://github.com/PointKernel) - David Wendt (https://github.com/davidwendt) - Mike Wilson (https://github.com/hyperbolic2346) URL: #12080
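The "fail loudly" rule above can be modeled in a few lines. This is a sketch of the decision logic only, written in Python for brevity; the function name `validate_cufile_policy` and its parameters are hypothetical, and the real check lives in libcudf's C++ code.

```python
def validate_cufile_policy(policy, cufile_available):
    """Raise if the "ALWAYS" policy cannot be honored (hypothetical helper).

    `policy` is the value of LIBCUDF_CUFILE_POLICY (None when unset);
    `cufile_available` says whether the build includes cuFile support.
    """
    if policy == "ALWAYS" and not cufile_available:
        # No silent fallback: the user explicitly asked for GDS.
        raise RuntimeError(
            "LIBCUDF_CUFILE_POLICY is set to ALWAYS, "
            "but libcudf was built without cuFile"
        )
    # Any other policy value (or an unset variable) may fall back quietly.
```

Only the `"ALWAYS"` case raises; an unset variable or any other value keeps the existing fallback behavior.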
Commit a72627a
-
Move and update `dask` nightly install in CI (#12082)
This PR updates the `dask` nightly install to correctly install the packages. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) - Ray Douglass (https://github.com/raydouglass) URL: #12082
Commit ec46e7f
-
Use nosync policy in gather and scatter implementations. (#12038)
This PR uses `rmm::exec_policy_nosync` in libcudf's gather and scatter functions. These changes are motivated by performance improvements seen previously in #11577. # Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md). - [x] New or existing tests cover these changes. - [x] The documentation is up to date with these changes. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - David Wendt (https://github.com/davidwendt) - Vukasin Milovanovic (https://github.com/vuule) - Nghia Truong (https://github.com/ttnghia) URL: #12038
Commit 2ced214
Commits on Nov 8, 2022
-
Remove macros that inspect the contents of exceptions (#12076)
We should not be encouraging users to rely on specific error messages. Anywhere that is currently doing so is likely an indication that libcudf should be throwing a more specific type of exception instead of just a `cudf::logic_error`. This PR removes the testing utilities that were previously used for this purpose and reworks the relevant tests. Related to #10200. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: #12076
Commit b16b4ff
-
Enable returning string data from UDFs used through `apply` (#11933)
This PR introduces the ability to return a string from a UDF used through `DataFrame.apply` or `Series.apply`. It provides all of the plumbing needed to run the function `lambda st: st`, but does not provide any APIs that return strings, such as `strip` or `upper`; these will be added in a series of follow-ups. A cast from `string_view` to `udf_string` is provided that numba will call when attempting to return a `string_view` into a `udf_string` array. Authors: - https://github.com/brandon-b-miller - David Wendt (https://github.com/davidwendt) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #11933
Commit 35077f5
-
Bifurcate Dependency Lists [skip-gpuci] (#11674)
This PR uses the [`rapids-dependency-file-generator`](https://github.com/rapidsai/dependency-file-generator/) to handle sourcing dependencies. Similar to rapidsai/rmm#1073, this PR introduces a GitHub Action that enforces consistency between the new `dependencies.yaml` file and the generated conda environment for developers. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - GALI PREM SAGAR (https://github.com/galipremsagar)
Commit c900fed
-
Enable building against the libarrow contained in pyarrow (#12034)
This feature is a prerequisite for wheels. There is no real good reason to do this except to provide interop with a pyarrow wheel, so this option is marked as advanced. In the process of implementing this feature, I have also done some cleanup of `get_arrow.cmake` to try and simplify its logic. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Paul Taylor (https://github.com/trxcllnt) - Robert Maynard (https://github.com/robertmaynard) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #12034
Commit 8ee5f51
-
Remove CUDA 10 compatibility code. (#12088)
This PR updates some documentation and removes some compatibility layers referencing CUDA 10, which is no longer supported by the package. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - David Wendt (https://github.com/davidwendt) - Lawrence Mitchell (https://github.com/wence-) URL: #12088
Commit 7535f31
Commits on Nov 9, 2022
-
Change cudf::detail::tdigest to cudf::tdigest::detail (#12050)
Changes `cudf::detail::tdigest` to `cudf::tdigest::detail` in the tdigest source files. While working on #12049, found there was a mixture of `cudf::tdigest` and `cudf::detail::tdigest` that seemed confusing and inconsistent. Changing to `cudf::tdigest::detail` made this code easier to follow. Also, move the `size_begin()` member function in `tdigest_column_view` out as a standalone function in a separate `.cuh` header since it is only used in a few places and the `tdigest_column_view.cuh` is included in many places. This allowed changing the `tdigest_column_view.cuh` to a `.hpp` file. Depends on #12049 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) - Ray Douglass (https://github.com/raydouglass) URL: #12050
Commit 628cd4f
-
Add regex_program class for use with all regex APIs (#11927)
Adds a new `regex_program` class to encapsulate a regex pattern and parameters used for executing regex calls on strings columns in libcudf. This provides a single object to hold the regex settings rather than adding or updating parameters to every call. Given a pattern (and other settings), it will _compile_ and validate the pattern and build the set of instructions/commands needed to execute the regex on a strings column. Converting the pattern is done in CPU code. The object contains no state data and can be reused on the same API or other similar calls as appropriate (per the settings). The object can also be queried to help with resource allocation/expectations. The main files to review are the new `regex_program*` source files plus the corresponding changes in `regexec.cpp` (renamed from .cu). The remainder are simply side-effects and follow common patterns to use the new object. No function or behavior has changed; rather, a new interface has been added over existing functionality, and additional tests have been added to exercise it through the companion APIs. Currently, all regex APIs are duplicated -- the original API plus a new one accepting a `regex_program` object. Once accepted we may consider deprecating the non-object APIs and then removing them in a future release. This will help with changes needed for #10852 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) - Ray Douglass (https://github.com/raydouglass) URL: #11927
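The compile-once, reuse-everywhere idea is the same one behind Python's standard `re.compile`. The snippet below is only an analogy using the Python standard library, not the libcudf `regex_program` API: one compiled object holds the pattern and flags, is passed to several operations, and can be queried for properties.

```python
import re

# Compile (and validate) the pattern once, up front.
program = re.compile(r"(\d+)-(\d+)")

rows = ["10-20", "abc", "7-8"]

# Reuse the same compiled object for different operations.
contains = [program.search(s) is not None for s in rows]
matches = [program.search(s) for s in rows]
extracted = [m.groups() if m else None for m in matches]

# The compiled object can also be queried, here for its capture-group count.
num_groups = program.groups
```

An invalid pattern fails at compile time (`re.error`) rather than at each call site, which is one benefit of separating compilation from execution.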
Commit 74053f4
-
Fix an error in IO with `GzipFile` type (#12085)
Fixes: #10590 This PR fixes an issue where the file-like object doesn't have a `size` attribute; in that case, we manually compute the size of the file. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - https://github.com/brandon-b-miller - Vukasin Milovanovic (https://github.com/vuule) URL: #12085
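One common way to compute the size of a seekable file-like object that lacks a `size` attribute is to seek to the end and restore the position afterwards. The sketch below shows that approach with a hypothetical helper, `file_like_size`; it is an assumption about how such a fallback can work, not necessarily what this PR does.

```python
import io


def file_like_size(f) -> int:
    """Return the size in bytes of a seekable file-like object (hypothetical helper)."""
    size = getattr(f, "size", None)
    if size is not None:
        return size
    pos = f.tell()                 # remember the current position
    end = f.seek(0, io.SEEK_END)   # seek() returns the new absolute position
    f.seek(pos)                    # restore the original position
    return end
```

For a `gzip.GzipFile` opened for reading, this yields the uncompressed size, and seeking to the end requires decompressing the whole stream.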
Commit a2c428c
-
Update Numba docs links. (#12107)
This updates links that point to the Numba documentation to use the new domain. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #12107
Commit 26d449c
-
Add `truncate` API to python doc pages (#12109)
In #11435, the `truncate` API was added, but I had a review comment (to add it to the docs) that I forgot to publish. This PR adds `truncate` to the docs page. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Bradley Dice (https://github.com/bdice) URL: #12109
Commit fbac4b4
-
Expose engine argument in dask_cudf.read_json (#12101)
Exposes the `engine` argument in `dask_cudf.read_json`, enabling `dask_cudf.read_json(... engine="cudf_experimental")` for nested json data. TODO (~maybe this PR?~): - [ ] (**EDIT**: This should be done in a separate PR) Add simple/optimized code path to leverage the `byte_range` parameter for local storage (similar to what is done in [`dask_cudf.read_csv`](https://github.com/rapidsai/cudf/blob/7535f31cfaf7e01578c413bb3ba46b03d2014806/python/dask_cudf/dask_cudf/io/csv.py#L72)). This would depend on #12017 for nested json data. Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) Approvers: - Lawrence Mitchell (https://github.com/wence-) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #12101
Commit 6f78e74
-
Fix reading of CSV files with blank second row (#12098)
There are two options to get the names of columns in a CSV file - header or the first row. In case the first row is used, names are generated, and the only part of the row that is used is the number of detected columns. This PR fixes the corner case where a blank line after the first (non-header) row causes the reader to detect an additional column (and return an additional column of nulls). The fix is to break when there is a terminator character within the first row; this only happens with blank row(s) after the first data row. The reader already does this when reading column names from a header, this PR just removes this difference in behavior that was causing the bug. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - David Wendt (https://github.com/davidwendt) - Nghia Truong (https://github.com/ttnghia) URL: #12098
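The fix amounts to stopping the column count at the first line terminator. A simplified pure-Python model of that rule is shown below; the helper `count_columns_first_row` is hypothetical and ignores quoting and other CSV details that the real reader handles.

```python
def count_columns_first_row(data: str, delimiter: str = ",", terminator: str = "\n") -> int:
    """Count columns using only the first row (hypothetical, simplified helper)."""
    count = 1
    for ch in data:
        if ch == terminator:
            # Stop at the end of the first row so that a following
            # blank line cannot contribute an extra column.
            break
        if ch == delimiter:
            count += 1
    return count
```

With the input `"1,2,3\n\n4,5,6\n"`, the count is 3 regardless of the blank second row.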
Commit 4de279d
Commits on Nov 10, 2022
-
Support `strip`, `lstrip`, and `rstrip` in `strings_udf` (#12091)
This PR adds support for the following three functions in `strings_udf`: - `str.strip(other)` - `str.lstrip(other)` - `str.rstrip(other)` Part of #9639 Authors: - https://github.com/brandon-b-miller - David Wendt (https://github.com/davidwendt) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #12091
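For reference, this is how the three methods behave on ordinary Python strings, which is presumably the behavior the UDF versions are meant to mirror (that correspondence is an assumption here). Note that the argument is treated as a set of characters to remove, not as a prefix or suffix.

```python
s = "xyhello worldyx"

# The argument is a set of characters; any of them is stripped from the ends.
both = s.strip("xy")    # removes from both ends
left = s.lstrip("xy")   # removes from the left end only
right = s.rstrip("xy")  # removes from the right end only
```

Here `both` is `"hello world"`, `left` is `"hello worldyx"`, and `right` is `"xyhello world"`.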
Commit 59bd5c3
-
Workaround groupby aggregate thrust::copy_if overflow (#12079)
Workaround for limitation in `thrust::copy_if` which fails if the input-iterator spans more than int-max. The `thrust::copy_if` hardcodes the iterator distance type to be an int https://github.com/NVIDIA/thrust/blob/dbd144ed543b60c4ff9d456edd19869e82fe8873/thrust/system/cuda/detail/copy_if.h#L699-L708 Found existing thrust issue: https://github.com/NVIDIA/thrust/issues/1302 This calls the `copy_if` in chunks if the iterator can span greater than int-max. Closes #12058 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Alessandro Bellina (https://github.com/abellina) - Robert Maynard (https://github.com/robertmaynard) - Nghia Truong (https://github.com/ttnghia) URL: #12079
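The chunking idea can be sketched independently of thrust: split the input into pieces no larger than the maximum supported distance, filter each piece, and concatenate the results in order. The Python function below is a hypothetical model of that strategy, not the libcudf code; `max_chunk` stands in for the int-max limit.

```python
INT_MAX = 2**31 - 1


def chunked_copy_if(data, predicate, max_chunk=INT_MAX):
    """Filter `data` in chunks of at most `max_chunk` items (hypothetical helper)."""
    out = []
    for start in range(0, len(data), max_chunk):
        chunk = data[start:start + max_chunk]
        # Each chunk is small enough for an int-indexed copy_if;
        # results are appended in order, so the output order is preserved.
        out.extend(x for x in chunk if predicate(x))
    return out
```

Using a small `max_chunk` (for example 3) on a short list gives the same result as an unchunked filter, which is the property the workaround relies on.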
Commit 4497ed6
-
First pass of `pd.read_orc` changes in tests (#12103)
This PR changes calls going via `pyarrow` and then `to_pandas` to directly call `pd.read_orc`. However, since `pd.read_orc` was added in pandas 1.0, we will need to version the call to this constructor. This PR does that. Partially contributes to #11540 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Lawrence Mitchell (https://github.com/wence-) URL: #12103
Commit 8ca2bd9
-
Remove "Multi-GPU with Dask-cuDF" notebook. (#12095)
This PR removes an outdated notebook for "Multi-GPU with Dask-cuDF" from the docs. Resolves #6583 with some of the changes from #6665. See also: rapidsai/rapids.ai#256 (comment) Authors: - Bradley Dice (https://github.com/bdice) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Richard (Rick) Zamora (https://github.com/rjzamora) URL: #12095
Commit b3429fb
-
Fix conditional_full_join benchmark (#12121)
The `CONDITIONAL_FULL_JOIN_BENCHMARK_DEFINE` benchmark category was mapping to `cudf::conditional_inner_join` instead of `cudf::conditional_full_join` Authors: - Gregory Kimball (https://github.com/GregoryKimball) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Divye Gala (https://github.com/divyegala) URL: #12121
Commit b30664b
-
Fix regex working-memory-size refactor error (#12119)
Fixes an error in the `working_memory_size()` member function, which was passing the parameters incorrectly. This was introduced in #11927 and found in the nightly compute-sanitizer check. https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/branches/job/cudf-gpu-build-branch-22.12/19/CUDA=11.5/testReport/junit/cudamemcheck/STRINGS_TEST/StringsContainsTests_ContainsTest/ Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Jason Lowe (https://github.com/jlowe) - Yunsong Wang (https://github.com/PointKernel) - Nghia Truong (https://github.com/ttnghia) - Vukasin Milovanovic (https://github.com/vuule) URL: #12119
Commit 7f2a471
-
Refactor Parquet reader (#12046)
This is a non-trivial refactor of the Parquet reader; no new features or changes in algorithms were made:
* Rename some functions.
* Move a lot of declarations and definitions of functions/structs/classes around.
* Extract some functions/structs/classes and put them into new files.
* Rewrite doxygen for some functions.
* Use aliases for member variables (to shorten their names), instead of passing them as function parameters.
* Etc.

Note that this is merely moving the current implementation around, preparing for adding the chunked Parquet reader, which is a fairly large implementation. This is also a blocker for:
* #11867
* #11961

Authors: - Nghia Truong (https://github.com/ttnghia) - https://github.com/nvdbaranec Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Mike Wilson (https://github.com/hyperbolic2346) - Vukasin Milovanovic (https://github.com/vuule) URL: #12046
Commit 70c7b7a
Commits on Nov 11, 2022
-
Add symlinks to notebooks. (#12128)
Adds symlinks to notebooks from the user guide as requested by @taureandyernv. Going forward, new notebooks added to the user guide directory should also be symlinked in `/notebooks`. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #12128
Commit f87d2b4
-
Add JNI for `substring` without 'end' parameter. (#12113)
Authors: - Liangcai Li (https://github.com/firestarman) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) - Nghia Truong (https://github.com/ttnghia) URL: #12113
Commit 3894427
-
Fix alignment of compressed blocks in ORC writer (#12077)
Closes #11812 Fixed alignment of compressed blocks in ORC writer - impacted ZLIB compression. Re-enabled nvCOMP DEFLATE compression in ORC - nvCOMP 2.5+ only. Refactored the nvCOMP feature status(enabled/disabled in cuDF) checks to include reason why features are not enabled (if not enabled). Refactored call sites to return the detailed error message if an operation fails because of nvCOMP integration config. Refactored nvCOMP adapter macros to allow mocking of the parameters that determine if an nvCOMP feature is enabled (env var, GPU compute capability, nvCOMP version). Added tests to verify the logic of the newly refactored feature status checks (allowed by the mocking above). Fix a Parquet test that was calling ORC reader/writer 😬 Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Jim Brennan (https://github.com/jbrennan333) - Mike Wilson (https://github.com/hyperbolic2346) - Bradley Dice (https://github.com/bdice) URL: #12077
Commit d335aa3
-
Adds an EventHandler to Java MemoryBuffer to be invoked on close (#12125)
This PR adds an EventHandler to `MemoryBuffer` with a single method `onClosed`. This is invoked during the `close` call, but after the `refCount` has been updated. I am also making `getRefCount` public in this PR. Spill code in the RAPIDS Accelerator for Spark could likely assert/require that refCount==1 when taking in a new buffer to be spillable. This last change is a nice-to-have. Authors: - Alessandro Bellina (https://github.com/abellina) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) - Jim Brennan (https://github.com/jbrennan333) - Jason Lowe (https://github.com/jlowe) URL: #12125
Commit 8668752
Commits on Nov 14, 2022
-
Fix singleton-range `__setitem__` edge case (#12075)
When trying to set a length-one range with a length-one array, an off-by-one error in `copying.copy_range` meant that the value was discarded. Fix that, and tidy up the semantics of `copy_range` a little while we're here. Closes #12073. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #12075
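To make the singleton case concrete, here is a minimal list-based sketch of a range copy with half-open bounds. It is a toy stand-in and not cudf's `copy_range`; the function name mirrors the one mentioned above only for readability, and the argument order is an assumption.

```python
def copy_range(source, target, source_begin, source_end, target_begin):
    """Return a copy of target with source[source_begin:source_end] written
    at target_begin. Toy illustration only, not cudf's implementation.

    Bounds are half-open, so a length-one range has
    source_end == source_begin + 1 and must still copy one element.
    """
    count = source_end - source_begin
    if count < 0 or source_begin < 0 or source_end > len(source):
        raise ValueError("invalid source range")
    if target_begin < 0 or target_begin + count > len(target):
        raise IndexError("range does not fit in target")
    result = list(target)
    result[target_begin:target_begin + count] = source[source_begin:source_end]
    return result


values = [10, 20, 30]
# Length-one source written into a length-one range of the target.
updated = copy_range([99], values, 0, 1, 1)
print(updated)
```

An empty range (`source_begin == source_end`) is the only case in which nothing should be written.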
Commit 825f049
-
Enable automatic column projection in groupby().agg (#12124)
This PR corresponds to the Dask-cudf version of dask/dask#9442, which was found to improve the performance of many groupby-based workflows. After this PR,
```python
import dask_cudf

path = "/criteo-dataset/day_0.parquet"
ddf = dask_cudf.read_parquet(path, split_row_groups=10)

# The following takes <2s with this PR, and fails with
# an OOM error on main (using a 32GB GPU):
ddf.groupby("C1").agg({"C2": "mean"}).compute()
```
Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #12124
Commit 5081fb1
-
Add support for `DataFrame.from_dict`/`to_dict` and `Series.to_dict` (#12048)
Resolves: #11934 - [x] Adds `DataFrame.from_dict` and `DataFrame.to_dict` - [x] Adds `Series.to_dict` Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Lawrence Mitchell (https://github.com/wence-) URL: #12048
Commit b20a6e6
Commits on Nov 15, 2022
-
Create an `int8` column in `read_csv` when all elements are missing (#12110)
The CSV reader creates `int8` columns when all elements are null. However, when all elements in a column are missing (e.g. the `names` option includes more columns than the CSV file), the CSV reader creates an `int64` column. Such columns take up a lot more device memory. This PR changes the behavior so that all columns with no valid elements are created as `int8`. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Mike Wilson (https://github.com/hyperbolic2346) - Yunsong Wang (https://github.com/PointKernel) - Mark Harris (https://github.com/harrism) URL: #12110
Commit b2e5069
-
Cleanup common parsing code in JSON, CSV reader (#12022)
This PR cleans up the common parsing code of the nested JSON reader and the CSV reader. - Uses `std::optional` for indicating parsing failure in `parse_numeric` - Cleanup - Removed `decode_value`, as it only provides specializations for timestamp and duration types; the rest of the types are passthrough. - Unified `decode_digit` Depends on #11898 and #12021 Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - GALI PREM SAGAR (https://github.com/galipremsagar) - Vukasin Milovanovic (https://github.com/vuule) URL: #12022
Commit fd488cd
-
Fix/disable jitify lto (#12122)
Remove the possibility of JIT LTO from cudf Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Mark Harris (https://github.com/harrism) - Paul Taylor (https://github.com/trxcllnt) URL: #12122
Commit bae9e39
-
Add in negative size checks for columns (#12118)
This fixes #12116. It adds a few checks for negative sizes to avoid any issues with rounding errors, and it also helps us detect errors sooner. It will not fix small negative allocations for device buffers directly. Authors: - Robert (Bobby) Evans (https://github.com/revans2) Approvers: - Karthikeyan (https://github.com/karthikeyann) - David Wendt (https://github.com/davidwendt) - Nghia Truong (https://github.com/ttnghia) URL: #12118
Commit 186e129
-
Safely allocate `udf_string` pointers in `strings_udf` (#12138)
In `strings_udf`, functions that return strings are built around C++ methods that return a `cudf::strings::udf::udf_string` object. However, because they require external `C` linkage, our shim functions need to work by accepting a pointer to a preallocated `udf_string` object and setting the result into the memory it points to before returning. This piece of memory is allocated based on our `UDFString` extension class datamodel, and while it is set up to be the right size, simply allocating it does not actually call the underlying `udf_string` default constructor, so the memory isn't necessarily initialized in the same way a proper `udf_string` would initialize it. This can lead to some unsafe behavior when we try to assign the result. This PR changes it so that whenever we need to allocate a `udf_string` and pass its pointer to a shim function, we first zero-fill that memory. Authors: - https://github.com/brandon-b-miller Approvers: - David Wendt (https://github.com/davidwendt) - Graham Markall (https://github.com/gmarkall) - Lawrence Mitchell (https://github.com/wence-) URL: #12138
Commit 4b7f5a7
-
In the upcoming CuPy 12, the function signature of `cp.clip` changes slightly: https://github.com/cupy/cupy/blob/6d857add3d46368705e133121cf49153039952e9/cupy/_math/misc.py#L147 This PR is still valid for CuPy 11 but will also satisfy the upcoming release. Authors: - Benjamin Zaitlen (https://github.com/quasiben) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Bradley Dice (https://github.com/bdice) - Lawrence Mitchell (https://github.com/wence-) URL: #12148
Commit 98880d2
-
Accelerate libcudf segmented sort with CUB segmented sort (#11969)
Moves the CUB segmented sort acceleration logic from `cudf::lists::segmented_sorted_order` to `cudf::segmented_sorted_order` so these two functions are now aligned in behavior and performance. This change allows `cudf::lists::segmented_sorted_order` to use `cudf::detail::segmented_sorted_order` for all cases and simplifies the implementation there. This also improves the performance of `cudf::segmented_sorted_order`, since appropriate columns can now use CUB's optimized segmented sort. No function has been changed and the existing tests are sufficient -- the source has only been refactored. Added a segmented-sort benchmark using nvbench as well. Reference #11729 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: #11969
Commit 90f0a77
Commits on Nov 16, 2022
-
Commit 414140b
-
Fix decimal binary operations (#12142)
Fixes: #11636 This PR: - [x] Fixes an `UnboundLocalError` error. - [x] Fixes reflected binary operations and added tests for the same. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Bradley Dice (https://github.com/bdice) URL: #12142
Commit c574ddf
-
Fix type promotion edge cases in numerical binops (#12074)
The type normalisation applied before heading into libcudf previously had slightly unexpected consequences for large int64 values. If not providing a `cudf.Scalar`, a bare `int64` scalar would be cast to `uint64` and then normal numpy type promotion would unify to `float64`. This is lossy, since int64 to float64 is neither surjective nor injective. To avoid this, try very hard to maintain the dtype of the object coming in, and match pandas behaviour by applying numpy type promotion rules via `numpy.result_type`. - Closes #5938 - Closes #7389 - Closes #12072 - Closes #12092 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #12074
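The lossiness of an `int64` to `float64` conversion can be checked with plain Python integers and floats (Python's `float` is an IEEE 754 double, the same format as `float64`). This is only an illustration of the arithmetic, not cudf code.

```python
INT64_MAX = 2**63 - 1

# float64 has a 53-bit significand, so INT64_MAX is rounded on conversion.
as_float = float(INT64_MAX)
round_trip = int(as_float)
print(round_trip == INT64_MAX)  # the round trip does not return the input

# Not injective: two distinct integers map to the same float.
same_float = float(2**53) == float(2**53 + 1)
print(same_float)

# Not surjective: many floats (for example 0.5) are not the image of any integer.
has_integer_preimage = (0.5).is_integer()
print(has_integer_preimage)
```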
Commit a8c0f4b
-
Commit 38235de
-
Commit 7adf229
-
Support `+` in `strings_udf` (#12117)
This PR adds support for the following operator in `strings_udf`: - `st + other` Part of #9639 Authors: - https://github.com/brandon-b-miller - David Wendt (https://github.com/davidwendt) Approvers: - David Wendt (https://github.com/davidwendt) - Bradley Dice (https://github.com/bdice) URL: #12117
Commit 742093e
-
Use rapidsai CODE_OF_CONDUCT.md (#12166)
This repo's `CODE_OF_CONDUCT.md` is superseded by an organization-wide policy: rapidsai/.github#3 Authors: - Bradley Dice (https://github.com/bdice) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #12166
Commit 6ad5752
-
byte_range support for JSON Lines format (#12017)
This PR adds support for byte_range in the nested JSON parser for the JSON Lines format (newline-delimited JSON, http://ndjson.org/). The record delimiter (newline) is only expected at the end of each record. Newlines in the middle of a record or within quotes are not expected and will lead to unknown behaviour. The record delimiters are not context-aware in this PR. This PR provides libcudf APIs, Cython APIs and Python tests to enable byte range support. This will allow Dask to do distributed/segmented parsing of JSON. No Dask changes. Addresses part of #11843 Depends on #12060 Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Elias Stehle (https://github.com/elstehle) - Lawrence Mitchell (https://github.com/wence-) - Robert Maynard (https://github.com/robertmaynard) URL: #12017
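The idea of splitting a JSON Lines input into byte ranges can be sketched in pure Python. The sketch assumes that a record belongs to the range in which its first byte lies, so that adjacent ranges never parse the same record twice; that convention and the function name `read_json_lines_range` are assumptions for illustration, not the libcudf API.

```python
import json


def read_json_lines_range(data: bytes, offset: int, size: int) -> list:
    """Parse the records whose first byte lies in [offset, offset + size)."""
    end = offset + size
    if offset == 0:
        start = 0
    else:
        # A record starts right after a newline. Searching from offset - 1
        # keeps a record that starts exactly at offset.
        newline = data.find(b"\n", offset - 1)
        if newline == -1:
            return []
        start = newline + 1
    records = []
    while start < end and start < len(data):
        newline = data.find(b"\n", start)
        stop = len(data) if newline == -1 else newline
        line = data[start:stop]
        if line.strip():
            records.append(json.loads(line))
        start = stop + 1
    return records


data = b'{"a": 1}\n{"a": 2}\n{"a": 3}\n'
# Three ranges of 10 bytes each cover the 27-byte input.
parts = [read_json_lines_range(data, off, 10) for off in range(0, len(data), 10)]
combined = [record for part in parts for record in part]
print(parts)
```

Because newlines are treated as delimiters without looking at context, a newline inside a quoted string would break this sketch in the same way the commit message warns about.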
Commit defad5e
-
Commit 8d84f2d
-
Support nested types as groupby keys in libcudf (#11792)
Authors: - Yunsong Wang (https://github.com/PointKernel) - Ashwin Srinath (https://github.com/shwina) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vyas Ramasubramani (https://github.com/vyasr) URL: #11792
Commit afb3c97
-
Spilling to host memory (#12106)
This PR implements spilling of device memory to host memory, based on #11553. Spilling is disabled by default and can be enabled in two ways:
- setting the environment variable `CUDF_SPILL=on`, or
- setting the `spill` option in `cudf` with `cudf.set_option("spill", True)`.

Additional parameters are:
- `CUDF_SPILL_ON_DEMAND=ON` / `cudf.set_option("spill_on_demand", True)`, which registers an RMM out-of-memory error handler that spills buffers in order to free up memory.
- `CUDF_SPILL_DEVICE_LIMIT=...` / `cudf.set_option("spill_device_limit", ...)`, which sets a device memory limit in bytes.

I have limited the scope of this PR. In a follow-up PR, I will port the statistics, logging, and partial unspill from #11553.

### Design
Spilling consists of two components:
- A new buffer sub-class, `SpillableBuffer`, that implements moving of its data between host and device memory in-place.
- A spill manager that tracks all instances of `SpillableBuffer` and spills them on demand.

A global spill manager is used throughout cudf when spilling is enabled, which makes `as_buffer()` return `SpillableBuffer` instead of the default `Buffer` instances.

#### Challenges
Accessing `Buffer.ptr` returns the device memory pointer of the buffer. This is unproblematic in the case of `Buffer`, but what happens when accessing `SpillableBuffer.ptr`, which might have spilled its device memory? In this case, `SpillableBuffer` needs to unspill the memory before returning its device memory pointer. Furthermore, while this device memory pointer is being used (or could be used), `SpillableBuffer` cannot spill its memory back to host memory because doing so would invalidate the device pointer. To address this, we mark the `SpillableBuffer` as unspillable; we say that the buffer has been _exposed_. This can be either permanent, if the device pointer is exposed to external projects, or temporary, while `libcudf` accesses the device memory.

`SpillableBuffer.get_ptr()` returns the device pointer of the buffer memory just like `.ptr`, but if given an instance of `SpillLock`, the buffer is only unspillable as long as the instance of `SpillLock` is alive. For convenience, one can use the decorator/context `with_spill_lock` to associate a `SpillLock` with a lifetime bound to the context automatically.

### Overhead
When spilling is disabled, the overhead of this PR comes from the decorator `with_spill_lock`. However, this is small (https://gist.github.com/madsbk/da6520e7583cf5d728a1b5a1b09200f3):
```
Micro benchmark on my local workstation:
spilling off:
  raw: 0.06371338899771217 us
  with-spill-lock: 1.0796624180002254 us
spilling on:
  raw: 0.05873749500096892 us
  with-spill-lock: 1.2184517139976379 us
```
Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Vyas Ramasubramani (https://github.com/vyasr) - AJ Schmidt (https://github.com/ajschmidt8) URL: #12106
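The exposure rules described above can be modelled with a small host-only toy. Everything here is hypothetical: `ToySpillableBuffer` and `locked_ptr` are invented names, a `bytearray` stands in for device memory, and a context manager replaces the `SpillLock` lifetime mechanism of the actual PR.

```python
from contextlib import contextmanager


class ToySpillableBuffer:
    """Toy model of a buffer that can move between 'device' and 'host'."""

    def __init__(self, payload: bytes):
        self._device = bytearray(payload)  # stands in for device memory
        self._host = None                  # holds the data while spilled
        self._exposed = False              # permanently unspillable once True
        self._locks = 0                    # number of active temporary locks

    @property
    def is_spilled(self) -> bool:
        return self._device is None

    @property
    def spillable(self) -> bool:
        return not self._exposed and self._locks == 0

    def spill(self) -> bool:
        """Move the data to 'host'; return False if that is not allowed."""
        if self.is_spilled or not self.spillable:
            return False
        self._host = bytes(self._device)
        self._device = None
        return True

    def _unspill(self) -> None:
        if self.is_spilled:
            self._device = bytearray(self._host)
            self._host = None

    @property
    def ptr(self) -> bytearray:
        """Hand out the raw 'device' memory; the buffer is exposed for good."""
        self._unspill()
        self._exposed = True
        return self._device

    @contextmanager
    def locked_ptr(self):
        """Hand out the 'device' memory; unspillable only inside the block."""
        self._unspill()
        self._locks += 1
        try:
            yield self._device
        finally:
            self._locks -= 1


buf = ToySpillableBuffer(b"abc")
spilled_first = buf.spill()              # nothing holds the pointer yet
with buf.locked_ptr() as mem:            # unspills and blocks spilling
    content = bytes(mem)
    spilled_while_locked = buf.spill()
spilled_after_lock = buf.spill()         # lock released, spilling works again
_ = buf.ptr                              # permanent exposure
spilled_after_expose = buf.spill()
```

The point of the temporary lock is that most internal accesses only need the pointer briefly, so the buffer becomes spillable again once the access ends.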
Commit 95a348b
-
Refactor `purge_nonempty_nulls` (#12111)
This refactor combines the discrete interfaces of `purge_nonempty_nulls` that require `structs/strings/lists_column_view` input into just one interface accepting just `column_view`. This facilitates easier usage of this function. It is also a necessary step for subsequent work in fixing `structs::superimpose_parent_nulls`. A `cudf::detail` interface for this new API is also added. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - David Wendt (https://github.com/davidwendt) - Bradley Dice (https://github.com/bdice) - AJ Schmidt (https://github.com/ajschmidt8) - Ray Douglass (https://github.com/raydouglass) URL: #12111
Commit 73d73a7
-
Don't rely on GNU find in headers_test.sh (#12164)
`-printf` is a GNU find extension, so `headers_test.sh` fails on systems where binutils is a BSD toolchain. To get around this, use sed to obtain the effect of `-printf`. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) - Ray Douglass (https://github.com/raydouglass) URL: #12164
Commit ae101cc
Commits on Nov 17, 2022
-
Merge branch 'branch-22.12' of https://github.com/rapidsai/cudf into bug-read_orc-empty-map-column
Commit ce97a54
-
Fix issues when both `usecols` and `names` options are used in `read_csv` (#12018)
Closes #8973. The CSV reader has a few gaps in the logic for column selection and user-specified column names: 1. Users cannot specify the names of only the selected columns; 2. The reader fails in unpredictable ways when only a subset of column names is passed (without column selection). This PR fixes the issues above. Users can now specify column names (there can be fewer names than the actual number of columns) or names of columns selected via their indices (must match the number of indices). If selection via indices is used, the number of column names has to match either the actual number of columns or the number of selected columns. Also fixed a test error that went unnoticed due to the issues above. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Karthikeyan (https://github.com/karthikeyann) - Vyas Ramasubramani (https://github.com/vyasr) - Nghia Truong (https://github.com/ttnghia) - https://github.com/nvdbaranec URL: #12018
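The rule for combining index-based selection with user-provided names can be written down as a small validation function. This is a sketch of the rule as stated in the commit message, not the reader's code; the function name is hypothetical, and pairing names with indices in the order given is an assumption.

```python
def names_for_selected(num_file_columns, usecols, names):
    """Return the output names for the columns selected by index.

    `names` must describe either every column in the file or exactly
    the selected columns (assumed to be in the same order as `usecols`).
    """
    if len(names) == num_file_columns:
        # Names cover the whole file: pick the ones at the selected indices.
        return [names[i] for i in usecols]
    if len(names) == len(usecols):
        # Names cover only the selected columns.
        return list(names)
    raise ValueError(
        "number of names must match the number of columns in the file "
        "or the number of selected columns"
    )


all_named = names_for_selected(4, [0, 2], ["a", "b", "c", "d"])
selected_named = names_for_selected(4, [0, 2], ["x", "y"])
print(all_named, selected_named)
```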
Commit 6de2c4e
-
Support `upper` and `lower` in `strings_udf` (#12099)
This PR adds support for the following two functions in `strings_udf`: - `str.upper()` - `str.lower()` Part of #9639 Authors: - https://github.com/brandon-b-miller - David Wendt (https://github.com/davidwendt) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Lawrence Mitchell (https://github.com/wence-) - David Wendt (https://github.com/davidwendt) URL: #12099
Commit aa13b95
-
Allow setting malloc heap size in string udfs (#12094)
Adds a mechanism for setting the default CUDA malloc heap size for string UDFs, with a 2 GB default. Authors: - https://github.com/brandon-b-miller Approvers: - David Wendt (https://github.com/davidwendt) - Bradley Dice (https://github.com/bdice) URL: #12094
Commit 2f2685f
-
Ensure dlpack include is provided to cudf interop lib (#12139)
As brought up in #12081, it is possible to have Python build failures because no include paths to dlpack are provided. This fixes the issue by ensuring that the DLPACK_INCLUDE_DIR is propagated down to the interop target. We don't run into this issue with conda, since the dlpack headers are inside the conda include dir, which is already being provided to the compiler. Authors: - Robert Maynard (https://github.com/robertmaynard) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #12139
Commit db0d045
Commits on Nov 18, 2022
-
Commit ec8888c
-
Commit e29ea84
-
Implement chunked Parquet reader (#11867)
This adds a chunked Parquet reader, which reads files iteratively. Instead of reading the input file all at once, we can read it chunk by chunk, with each chunk limited to be small enough not to exceed the cudf internal limit (2GB/2 billion rows):
```
auto reader = cudf::io::chunked_parquet_reader(byte_limit, read_opts);
do {
  auto const chunk = reader.read_chunk();
  // Process chunk
} while (reader.has_next());
```
Authors: - Nghia Truong (https://github.com/ttnghia) - https://github.com/nvdbaranec Approvers: - Yunsong Wang (https://github.com/PointKernel) - Vukasin Milovanovic (https://github.com/vuule) URL: #11867
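The read-then-check loop in the snippet above can be mimicked with a toy reader over an in-memory list. `ToyChunkedReader` is a hypothetical stand-in that limits chunks by row count instead of bytes; only the method names `read_chunk` and `has_next` are taken from the commit message.

```python
class ToyChunkedReader:
    """Toy reader that hands out at most `row_limit` rows per chunk."""

    def __init__(self, rows, row_limit):
        if row_limit <= 0:
            raise ValueError("row_limit must be positive")
        self._rows = list(rows)
        self._row_limit = row_limit
        self._pos = 0

    def has_next(self) -> bool:
        return self._pos < len(self._rows)

    def read_chunk(self) -> list:
        chunk = self._rows[self._pos:self._pos + self._row_limit]
        self._pos += len(chunk)
        return chunk


reader = ToyChunkedReader(range(7), row_limit=3)
chunks = []
# Python has no do/while, so read first and test has_next() afterwards.
while True:
    chunks.append(reader.read_chunk())
    if not reader.has_next():
        break
print(chunks)
```

Reading before checking means an empty input still yields one (empty) chunk, which matches the shape of the do/while loop.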
Commit 3fb09d1
-
This PR enables building wheels. It mostly leverages various build options that have already been added to the repository. Authors: - Vyas Ramasubramani (https://github.com/vyasr) - Sevag H (https://github.com/sevagh) - Paul Taylor (https://github.com/trxcllnt) Approvers: - Bradley Dice (https://github.com/bdice) - GALI PREM SAGAR (https://github.com/galipremsagar) - Sevag H (https://github.com/sevagh) URL: #12096
Commit 6d2a4f0
-
Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug (#12188)
Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - AJ Schmidt (https://github.com/ajschmidt8) URL: #12188
Commit cc4b4dd
-
Merge branch 'branch-22.12' of https://github.com/rapidsai/cudf into bug-read_orc-empty-map-column
Commit 30bc05c
-
Commit cbd07a5
-
Merge pull request #12198 from davidwendt/branch-22.12-merge-22.10
Merge branch-22.10 into branch-22.12
Commit 3c94071
-
Reduce number of tests marked `spilling` (#12197)
To save CI running time, this PR drastically reduces the number of tests marked `spilling`. An alternative to #12187 Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - https://github.com/brandon-b-miller - GALI PREM SAGAR (https://github.com/galipremsagar)
Commit a2f69e4
-
Implement JNI for chunked Parquet reader (#11961)
This adds JNI for chunked Parquet reader. It depends on the chunked Parquet reader implementation PR (#11867). Authors: - https://github.com/nvdbaranec - Nghia Truong (https://github.com/ttnghia) Approvers: - MithunR (https://github.com/mythrocks) - Robert (Bobby) Evans (https://github.com/revans2)
Commit 782fba3
-
Merge branch 'branch-22.12' of https://github.com/rapidsai/cudf into bug-write_orc-compressission
Commit c79c2d1
-
Commit 08c0c5a
-
Merge branch 'branch-22.12' of https://github.com/rapidsai/cudf into bug-read_orc-empty-map-column
Commit 9292b50
-
Fix dask backend dispatch (#12203)
This PR fixes a failure being observed in `dask` upstream: dask/dask#9676 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Richard (Rick) Zamora (https://github.com/rjzamora)
Commit 21ba312
Commits on Nov 19, 2022
-
Commit a8afc75
Commits on Nov 21, 2022
-
Merge pull request #12194 from vuule/bug-write_orc-compressission
Fix compression in ORC writer
Commit 769dfbb
-
Commit e670c10
-
Commit cd6dff3
Commits on Nov 22, 2022
-
Merge branch 'branch-22.12' of https://github.com/rapidsai/cudf into bug-read_orc-empty-map-column
Commit 6756b02
-
Commit f15080f
-
Merge pull request #12217 from davidwendt/bug-cub-segmented-sort
Workaround for CUB segmented-sort bug with boolean keys
Commit 49f983d
-
Merge pull request #12160 from vuule/bug-read_orc-empty-map-column
Fix data corruption when reading ORC files with empty stripes
Commit ed35f67
Commits on Nov 23, 2022
-
Make dask pinning looser (#12231)
* Make pinning >=. * Temporarily reenable wheel builds to ensure that things work as expected. * Skip cudf tests and make sure dask-cudf builds. * Undo changes to wheels scripts.
Commit 0c60819
Commits on Nov 28, 2022
-
Commit c83ff55
Commits on Nov 29, 2022
-
Merge pull request #12250 from vyasr/fix/io_numpy_link
Fix include line for IO Cython modules
Commit eb27104
Commits on Dec 1, 2022
-
Commit fc2ec42
-
Commit 297911f
-
Commit cbdefb8
Commits on Dec 2, 2022
-
Merge pull request #12165 from galipremsagar/pin_dask
[REVIEW] Pin `dask` and `distributed` for release
Commit 9cd9841
Commits on Dec 8, 2022
-
Commit f471bcc