Merge branch-23.12 into branch-24.02 #14414

galipremsagar
Contributor

Description

This PR resolves conflicts in #14406

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

vuule and others added 15 commits November 10, 2023 00:00
Update the nvCOMP version used for cuIO compression/decompression to 3.0.4.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#13815
All of these wrappers have now been upstreamed into Cython as of Cython 3.0.3.

Contributes to rapidsai#14023

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Bradley Dice (https://github.com/bdice)
  - Jake Awe (https://github.com/AyodeAwe)

URL: rapidsai#14382
Creates a normalizing offsets iterator that returns an int64 value given either int32 or int64 column data.
Depends on rapidsai#14206

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Yunsong Wang (https://github.com/PointKernel)

URL: rapidsai#14234
…rapidsai#14364)

* Update dependency lists

* Update wheel building to stop needing manual installations

* Update wheel dependency with alpha spec

* Rename the package

* Update update-version.sh

* Update conda/recipes/dask-cudf/meta.yaml

Co-authored-by: GALI PREM SAGAR <sagarprem75@gmail.com>

* Make pip/conda dependencies consistent and fix recipe

* dfg

* Apply suggestions from code review

---------

Co-authored-by: GALI PREM SAGAR <sagarprem75@gmail.com>
…sai#14399)

Corrects failures seen in C++ CI where libnvbench.so can't be found

Authors:
  - Robert Maynard (https://github.com/robertmaynard)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#14399
…14388)

Closes rapidsai#14384. `x.startswith(y)` is not a good enough check for if `x` is a subdirectory of `y`. It causes `pandasai` to be reported as a sub-package of `pandas`.
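
To illustrate the false positive described above, here is a minimal sketch that compares a plain string-prefix check with a comparison on whole path components. The helper name `is_subdirectory` and the example paths are hypothetical and are not taken from the actual fix.

```python
from pathlib import PurePosixPath


def is_subdirectory(child: str, parent: str) -> bool:
    """Return True if `child` is `parent` itself or lies somewhere below it.

    Hypothetical helper: it compares whole path components, so a sibling
    directory that merely shares a name prefix is not a match.
    """
    child_path = PurePosixPath(child)
    parent_path = PurePosixPath(parent)
    return child_path == parent_path or parent_path in child_path.parents


# Hypothetical install locations, used only for illustration.
pandas_dir = "/site-packages/pandas"
pandasai_dir = "/site-packages/pandasai"

# The string-prefix check wrongly treats pandasai as living under pandas.
print(pandasai_dir.startswith(pandas_dir))
# The component-wise check does not.
print(is_subdirectory(pandasai_dir, pandas_dir))
# A genuine subdirectory is still detected.
print(is_subdirectory("/site-packages/pandas/core", pandas_dir))
```

The component-wise comparison is the point of the sketch; `os.path.commonpath` would be another way to get the same effect.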

Authors:
  - Ashwin Srinath (https://github.com/shwina)

Approvers:
  - https://github.com/brandon-b-miller

URL: rapidsai#14388
Refactor the currently outdated cudf_kafka build setup to use skbuild instead.

Authors:
  - Jeremy Dyer (https://github.com/jdye64)
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#14292
Adds a new BytePairEncoding class to cuDF
```
>>> import cudf
>>> from cudf.core.byte_pair_encoding import BytePairEncoder
>>> mps = cudf.read_text('merges.txt', delimiter='\n', strip_delimiters=True)
>>> bpe = BytePairEncoder(mps)
>>> str_series = cudf.Series(['This is a sentence', 'thisisit'])
>>> bpe(str_series)
0    This is a sent ence
1             this is it
dtype: object
```
This class wraps the existing `nvtext::byte_pair_encoding` APIs to load the merge-pairs data and encode a column of strings.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#13891
…4393)

Fixes a bug introduced in rapidsai#14336 when trying to simplify the token-counting logic as per this discussion rapidsai#14336 (comment)
The simplification caused an error which was found when running the nvtext benchmarks.
The appropriate gtest has been updated to cover this case now.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Karthikeyan (https://github.com/karthikeyann)

URL: rapidsai#14393
This PR switches remaining usages of `dask` dependencies to use `rapids-dask-dependency`

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Jake Awe (https://github.com/AyodeAwe)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#14407
This PR contributes to rapidsai#13744.
- Added stream parameters to public APIs:
  - `cudf::io::read_csv`
  - `cudf::io::write_csv`
- Added stream gtests

Authors:
  - https://github.com/shrshi
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Karthikeyan (https://github.com/karthikeyann)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Yunsong Wang (https://github.com/PointKernel)

URL: rapidsai#14340
…ai#14411)

Port NVIDIA/nvbench#148 to cudf so that nvbench benchmarks work now that we always use a static version of nvbench.

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#14411
…apidsai#14390)

Noticed this while trying to clean up `as_column`

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#14390
…idsai#14367)

Issue rapidsai#14325

Use uint when reading/writing nano stats because nanoseconds have int32 encoding (different from both uint32 and sint32, _obviously_), which does not use zigzag.
sint32 uses zigzag, and uint32 does not allow negative numbers, so we can use uint since we'll never have negative nanoseconds.
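
The reasoning above can be sketched in a few lines, assuming the protobuf wire-format conventions that the type names suggest: int32 and uint32 are written as plain varints, while sint32 is zigzag-mapped first. The function names are hypothetical and this is not the cuIO implementation; it only shows why a non-negative value has the same bytes as int32 and uint32 but different bytes as sint32.

```python
def zigzag_encode32(n: int) -> int:
    """Map a signed 32-bit value to unsigned: 0, -1, 1, -2 become 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def zigzag_decode32(z: int) -> int:
    """Inverse of zigzag_encode32."""
    return (z >> 1) ^ -(z & 1)


def varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low 7 bits first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


nanos = 300  # any non-negative nanosecond count

plain = varint(nanos)                     # int32 or uint32 style for nanos >= 0
zigzag = varint(zigzag_encode32(nanos))   # sint32 style

print(plain.hex(), zigzag.hex())
# A reader that zigzag-decodes a plain varint gets the wrong value back.
print(zigzag_decode32(nanos))
```

Because the value is never negative, treating the field as unsigned yields the same bytes as int32, which is the property the change relies on.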

Also disabled writing the nanoseconds, because they should only be written after ORC-135; we don't write the version, so readers get confused if nanoseconds are present. Planning to re-enable once we start writing the version.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#14367
Fixes: rapidsai#14398 
This PR raises an error in `reindex` API when reindexing is performed on a non-unique index column.
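
As a rough illustration of why a non-unique index is rejected, here is a plain-Python sketch of a reindex over parallel lists of labels and values. The function `reindex`, its signature, and the error message are hypothetical stand-ins, not the cuDF API; the point is only that a duplicated label has no single value to look up, so raising is the safe behaviour.

```python
from collections import Counter


def reindex(labels, values, new_labels, fill_value=None):
    """Hypothetical stand-in for reindexing (labels, values) onto new_labels."""
    duplicates = [label for label, count in Counter(labels).items() if count > 1]
    if duplicates:
        # With a repeated label the lookup below would silently keep only
        # one of the matching values, so refuse instead.
        raise ValueError(f"cannot reindex on a non-unique index: {duplicates}")
    lookup = dict(zip(labels, values))
    return [lookup.get(label, fill_value) for label in new_labels]


# Unique index: labels are reordered and missing ones are filled.
print(reindex(["a", "b", "c"], [1, 2, 3], ["c", "a", "z"]))

# Non-unique index: the call is rejected.
try:
    reindex(["a", "a", "b"], [1, 2, 3], ["a", "b"])
except ValueError as err:
    print(err)
```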

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)
  - Lawrence Mitchell (https://github.com/wence-)

URL: rapidsai#14400
@galipremsagar galipremsagar requested review from a team as code owners November 15, 2023 14:04
@galipremsagar galipremsagar added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 15, 2023
@galipremsagar galipremsagar self-assigned this Nov 15, 2023
@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. CMake CMake build issue labels Nov 15, 2023
@raydouglass raydouglass added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Nov 15, 2023
@raydouglass
Member

Marking with DO NOT MERGE since this must be manually merged with a merge commit.

@vyasr
Contributor

vyasr commented Nov 15, 2023

We need kvikio's PR to merge first so that we can get kvikio builds up for CI to run here.

@vyasr
Contributor

vyasr commented Nov 16, 2023

There's an error in this forward merger in the cudf-kafka CMakeLists.txt. I'm not sure if that's the only issue, so I'm going to try and do another version of the forward merge myself then diff against this PR to verify.

@vyasr
Contributor

vyasr commented Nov 16, 2023

@bdice wants to make the PR, so I'm just going to verify that I see the differences I expect when he's done.

@galipremsagar
Contributor Author

closing for #14422

@vyasr
Contributor

vyasr commented Nov 16, 2023

For the record, I diffed the two PR branches and @bdice's has the change I was expecting to see:

(rapids) rapids@compose:~/cudf/second_workspace$ git diff bdice/branch-24.02-merge-23.12 galipremsagar/branch-24.02-merge-23.12
diff --git a/conda/recipes/custreamz/meta.yaml b/conda/recipes/custreamz/meta.yaml
index 755394e393..b8c5918ea6 100644
--- a/conda/recipes/custreamz/meta.yaml
+++ b/conda/recipes/custreamz/meta.yaml
@@ -45,7 +45,7 @@ requirements:
     - streamz
     - cudf ={{ version }}
     - cudf_kafka ={{ version }}
-    - rapids-dask-dependency ={{ minor_version }}
+    - rapids-dask-dependency ={{ version }}
     - python-confluent-kafka >=1.9.0,<1.10.0a0
     - {{ pin_compatible('cuda-version', max_pin='x', min_pin='x') }}

diff --git a/cpp/libcudf_kafka/CMakeLists.txt b/cpp/libcudf_kafka/CMakeLists.txt
index e31f6bd409..939239b8b2 100644
--- a/cpp/libcudf_kafka/CMakeLists.txt
+++ b/cpp/libcudf_kafka/CMakeLists.txt
@@ -21,7 +21,7 @@ include(rapids-export)
 include(rapids-find)

 project(
-  CUDF_KAFKA
+  CUDA_KAFKA
   VERSION 24.02.00
   LANGUAGES CXX
 )
diff --git a/python/cudf_kafka/CMakeLists.txt b/python/cudf_kafka/CMakeLists.txt
index d55c3fdc07..1e21c87358 100644
--- a/python/cudf_kafka/CMakeLists.txt
+++ b/python/cudf_kafka/CMakeLists.txt
@@ -14,7 +14,7 @@

 cmake_minimum_required(VERSION 3.26.4 FATAL_ERROR)

-set(cudf_kafka_version 23.12.00)
+set(cudf_kafka_version 24.02.00)

 include(../../fetch_rapids.cmake)

namely the CUDA_KAFKA vs CUDF_KAFKA and the cudf_kafka_version (the first one is from a separate new PR that isn't in this forward merge but is otherwise unrelated).

@vyasr
Contributor

vyasr commented Nov 16, 2023

Whoops, just realized that the cudf_kafka_version fix is actually backwards. Each PR gets one of the two right. Bradley's fixing that version on his PR now.

@bdice
Contributor

bdice commented Nov 16, 2023

Done: e4e6975

Labels
5 - DO NOT MERGE Hold off on merging; see PR for details CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.