Forward-merge branch-23.12 to branch-24.02 #14422

Merged 18 commits on Nov 16, 2023

Commits on Nov 10, 2023

  1. Upgrade to nvCOMP 3.0.4 (rapidsai#13815)

    Update the nvCOMP version used for cuIO compression/decompression to 3.0.4.
    
    Authors:
      - Vukasin Milovanovic (https://github.com/vuule)
      - Bradley Dice (https://github.com/bdice)
    
    Approvers:
      - Bradley Dice (https://github.com/bdice)
      - Ray Douglass (https://github.com/raydouglass)
    
    URL: rapidsai#13815
    vuule authored Nov 10, 2023
    Commit: 9be4de5
  2. Remove Cython libcpp wrappers (rapidsai#14382)

    All of these wrappers have now been upstreamed into Cython as of Cython 3.0.3.
    
    Contributes to rapidsai#14023
    
    Authors:
      - Vyas Ramasubramani (https://github.com/vyasr)
    
    Approvers:
      - GALI PREM SAGAR (https://github.com/galipremsagar)
      - Bradley Dice (https://github.com/bdice)
      - Jake Awe (https://github.com/AyodeAwe)
    
    URL: rapidsai#14382
    vyasr authored Nov 10, 2023
    Commit: 87d2a36

Commits on Nov 13, 2023

  1. Normalizing offsets iterator (rapidsai#14234)

    Creates a normalizing offsets iterator that returns an int64 value given either int32 or int64 column data.
    Depends on rapidsai#14206
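
    The idea can be sketched in plain Python. This is only an illustration of the concept, not the libcudf implementation: the class name `NormalizingOffsets` and its constructor arguments are hypothetical, and the offsets are assumed to be stored as little-endian integers in a byte buffer.

    ```python
    import struct


    class NormalizingOffsets:
        """Read offsets stored as int32 or int64 and always return a Python int
        that fits an int64. Hypothetical sketch, not the libcudf class."""

        def __init__(self, data: bytes, width: int):
            if width not in (4, 8):
                raise ValueError("width must be 4 (int32) or 8 (int64)")
            self._data = data
            self._width = width
            # "<i" is a 4-byte signed integer, "<q" an 8-byte signed integer.
            self._fmt = "<i" if width == 4 else "<q"

        def __len__(self) -> int:
            return len(self._data) // self._width

        def __getitem__(self, i: int) -> int:
            if not 0 <= i < len(self):
                raise IndexError(i)
            return struct.unpack_from(self._fmt, self._data, i * self._width)[0]


    offsets32 = struct.pack("<4i", 0, 3, 7, 12)
    offsets64 = struct.pack("<3q", 0, 2**31, 2**40)
    ```

    Callers can then index either buffer the same way, for example `NormalizingOffsets(offsets32, 4)[2]` and `NormalizingOffsets(offsets64, 8)[2]`, without branching on the stored width.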
    
    Authors:
      - David Wendt (https://github.com/davidwendt)
    
    Approvers:
      - Divye Gala (https://github.com/divyegala)
      - Yunsong Wang (https://github.com/PointKernel)
    
    URL: rapidsai#14234
    davidwendt authored Nov 13, 2023
    Commit: 04d13d8
  2. Use new rapids-dask-dependency metapackage for managing dask versions (rapidsai#14364)
    
    * Update dependency lists
    
    * Update wheel building to stop needing manual installations
    
    * Update wheel dependency with alpha spec
    
    * Rename the package
    
    * Update update-version.sh
    
    * Update conda/recipes/dask-cudf/meta.yaml
    
    Co-authored-by: GALI PREM SAGAR <sagarprem75@gmail.com>
    
    * Make pip/conda dependencies consistent and fix recipe
    
    * dfg
    
    * Apply suggestions from code review
    
    ---------
    
    Co-authored-by: GALI PREM SAGAR <sagarprem75@gmail.com>
    vyasr and galipremsagar authored Nov 13, 2023
    Commit: 4313cfa

Commits on Nov 14, 2023

  1. Always build nvbench statically so we don't need to package it (rapidsai#14399)
    
    Corrects failures seen in C++ CI where libnvbench.so can't be found
    
    Authors:
      - Robert Maynard (https://github.com/robertmaynard)
      - Vyas Ramasubramani (https://github.com/vyasr)
    
    Approvers:
      - Vyas Ramasubramani (https://github.com/vyasr)
      - Bradley Dice (https://github.com/bdice)
    
    URL: rapidsai#14399
    robertmaynard authored Nov 14, 2023
    Commit: 5d09d38
  2. cudf.pandas: cuDF subpath checking in module __getattr__ (rapidsai#14388)
    
    Closes rapidsai#14384. `x.startswith(y)` is not a sufficient check for whether `x` is a subdirectory of `y`: it causes `pandasai` to be reported as a sub-package of `pandas`.
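
    A minimal sketch of the failure mode and one way to avoid it follows. It is not the code from this PR: the function names and the example paths are hypothetical, and comparing path components is only one possible fix.

    ```python
    from pathlib import PurePosixPath


    def is_subpath_naive(path: str, parent: str) -> bool:
        # String prefix test: also matches siblings that merely share a prefix.
        return path.startswith(parent)


    def is_subpath(path: str, parent: str) -> bool:
        # Compare whole path components instead of raw characters.
        parent_parts = PurePosixPath(parent).parts
        return PurePosixPath(path).parts[: len(parent_parts)] == parent_parts


    # Hypothetical install locations, for illustration only.
    PANDAS_DIR = "/site-packages/pandas"
    PANDASAI_FILE = "/site-packages/pandasai/__init__.py"
    PANDAS_FILE = "/site-packages/pandas/core/frame.py"
    ```

    With these definitions, `is_subpath_naive(PANDASAI_FILE, PANDAS_DIR)` is `True` (the false positive described above), while `is_subpath(PANDASAI_FILE, PANDAS_DIR)` is `False` and `is_subpath(PANDAS_FILE, PANDAS_DIR)` is `True`.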
    
    Authors:
      - Ashwin Srinath (https://github.com/shwina)
    
    Approvers:
      - https://github.com/brandon-b-miller
    
    URL: rapidsai#14388
    shwina authored Nov 14, 2023
    Commit: e982d37
  3. Refactor cudf_kafka to use skbuild (rapidsai#14292)

    Refactor the currently outdated cudf_kafka build setup to use skbuild instead.
    
    Authors:
      - Jeremy Dyer (https://github.com/jdye64)
      - Bradley Dice (https://github.com/bdice)
      - Vyas Ramasubramani (https://github.com/vyasr)
    
    Approvers:
      - Vyas Ramasubramani (https://github.com/vyasr)
      - Bradley Dice (https://github.com/bdice)
      - AJ Schmidt (https://github.com/ajschmidt8)
    
    URL: rapidsai#14292
    jdye64 authored Nov 14, 2023
    Commit: 7f3fba1
  4. Add BytePairEncoder class to cuDF (rapidsai#13891)

    Adds a new BytePairEncoder class to cuDF:
    ```
    >>> import cudf
    >>> from cudf.core.byte_pair_encoding import BytePairEncoder
    >>> mps = cudf.read_text('merges.txt', delimiter='\n', strip_delimiters=True)
    >>> bpe = BytePairEncoder(mps)
    >>> str_series = cudf.Series(['This is a sentence', 'thisisit'])
    >>> bpe(str_series)
    0    This is a sent ence
    1             this is it
    dtype: object
    ```
    This class wraps the existing `nvtext::byte_pair_encoding` APIs to load the merge-pairs data and encode a column of strings.
    
    Authors:
      - David Wendt (https://github.com/davidwendt)
    
    Approvers:
      - Bradley Dice (https://github.com/bdice)
    
    URL: rapidsai#13891
    davidwendt authored Nov 14, 2023
    Commit: b0c1b7b
  5. Fix token-count logic in nvtext::tokenize_with_vocabulary (rapidsai#14393)
    
    Fixes a bug introduced in rapidsai#14336 when simplifying the token-counting logic, as suggested in a review comment on that PR.
    The simplification caused an error that was found when running the nvtext benchmarks.
    The corresponding gtest has been updated to cover this case.
    
    Authors:
      - David Wendt (https://github.com/davidwendt)
    
    Approvers:
      - Bradley Dice (https://github.com/bdice)
      - Karthikeyan (https://github.com/karthikeyann)
    
    URL: rapidsai#14393
    davidwendt authored Nov 14, 2023
    Commit: b446a6f
  6. Cleanup remaining usages of dask dependencies (rapidsai#14407)

    This PR switches remaining usages of `dask` dependencies to use `rapids-dask-dependency`
    
    Authors:
      - GALI PREM SAGAR (https://github.com/galipremsagar)
    
    Approvers:
      - Bradley Dice (https://github.com/bdice)
      - Jake Awe (https://github.com/AyodeAwe)
      - Vyas Ramasubramani (https://github.com/vyasr)
    
    URL: rapidsai#14407
    galipremsagar authored Nov 14, 2023
    Commit: 8106a0c
  7. Added streams to CSV reader and writer api (rapidsai#14340)

    This PR contributes to rapidsai#13744.
    - Added stream parameters to the public APIs `cudf::io::read_csv` and `cudf::io::write_csv`
    - Added stream gtests
    
    Authors:
      - https://github.com/shrshi
      - Karthikeyan (https://github.com/karthikeyann)
    
    Approvers:
      - Karthikeyan (https://github.com/karthikeyann)
      - Vukasin Milovanovic (https://github.com/vuule)
      - Yunsong Wang (https://github.com/PointKernel)
    
    URL: rapidsai#14340
    shrshi authored Nov 14, 2023
    Commit: 27b052d
  8. Ensure nvbench initializes nvml context when built statically (rapidsai#14411)
    
    Port NVIDIA/nvbench#148 to cudf so that nvbench benchmarks work now that we always use a static version of nvbench.
    
    Authors:
      - Robert Maynard (https://github.com/robertmaynard)
    
    Approvers:
      - Bradley Dice (https://github.com/bdice)
    
    URL: rapidsai#14411
    robertmaynard authored Nov 14, 2023
    Commit: 330d389

Commits on Nov 15, 2023

  1. Fix as_column(pd.Timestamp/Timedelta, length=) not respecting length (rapidsai#14390)
    
    Noticed this while trying to clean up `as_column`
    
    Authors:
      - Matthew Roeschke (https://github.com/mroeschke)
    
    Approvers:
      - Bradley Dice (https://github.com/bdice)
    
    URL: rapidsai#14390
    mroeschke authored Nov 15, 2023
    Commit: 8a0a08f
  2. Fix and disable encoding for nanosecond statistics in ORC writer (rapidsai#14367)
    
    Issue rapidsai#14325
    
    Use uint when reading/writing nano stats because nanoseconds have int32 encoding (different from both uint32 and sint32, _obviously_), which does not use zigzag.
    sint32 uses zigzag, and uint32 does not allow negative numbers, so we can use uint since we'll never have negative nanoseconds.
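
    The difference between the encodings can be sketched in plain Python. This illustrates the protobuf zigzag mapping only; it is not the ORC writer's code, and the function names are hypothetical.

    ```python
    def zigzag_encode32(n: int) -> int:
        """Map a signed 32-bit value to unsigned, as protobuf sint32 does:
        0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ..."""
        return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


    def zigzag_decode32(z: int) -> int:
        """Inverse of zigzag_encode32."""
        return (z >> 1) ^ -(z & 1)


    # For non-negative values, int32 and uint32 both store the value unchanged,
    # while sint32 stores the zigzag-mapped value, which is twice as large.
    nanos = 999_999
    stored_as_uint32 = nanos
    stored_as_sint32 = zigzag_encode32(nanos)
    ```

    Here `stored_as_sint32` is `1_999_998`, so a reader and writer that disagree about zigzag would see nanosecond values off by a factor of two, whereas treating the field as uint32 matches the int32 encoding for every non-negative value.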
    
    Also disabled writing of the nanosecond statistics, because they should only be written after ORC-135; we don't write the version, so readers get confused if nanoseconds are present. Planning to re-enable this once we start writing the version.
    
    Authors:
      - Vukasin Milovanovic (https://github.com/vuule)
    
    Approvers:
      - Vyas Ramasubramani (https://github.com/vyasr)
      - Nghia Truong (https://github.com/ttnghia)
    
    URL: rapidsai#14367
    vuule authored Nov 15, 2023
    Commit: ab2248e
  3. Raise error in reindex when index is not unique (rapidsai#14400)

    Fixes: rapidsai#14398 
    This PR raises an error in `reindex` API when reindexing is performed on a non-unique index column.
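
    The reason for rejecting a non-unique index can be sketched in plain Python. This is not cuDF code: `reindex_values` is a hypothetical helper, and the `ValueError` with this message mirrors what pandas raises; the exact exception used by cuDF is an assumption here.

    ```python
    def reindex_values(index, values, new_index, fill=None):
        """Look up each label of new_index in index, refusing ambiguous lookups."""
        if len(set(index)) != len(index):
            # With duplicate labels there is no single row to pick for a label.
            raise ValueError("cannot reindex on an axis with duplicate labels")
        lookup = dict(zip(index, values))
        return [lookup.get(label, fill) for label in new_index]


    unique_result = reindex_values(["a", "b"], [1, 2], ["b", "c"])
    ```

    With a unique index, `unique_result` is `[2, None]`; calling `reindex_values(["a", "a"], [1, 2], ["a"])` raises instead of silently choosing one of the two rows.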
    
    Authors:
      - GALI PREM SAGAR (https://github.com/galipremsagar)
    
    Approvers:
      - Matthew Roeschke (https://github.com/mroeschke)
      - Lawrence Mitchell (https://github.com/wence-)
    
    URL: rapidsai#14400
    galipremsagar authored Nov 15, 2023
    Commit: 8deb3dd
  4. Fix dask dependency in custreamz (rapidsai#14420)

    rapidsai#14407 added a dask dependency to custreamz, but pinned it too tightly by requiring the exact same version. This is not valid because rapids-dask-dependency won't release a new version for each new cudf release, so pinning to the exact same version up to the alpha creates an unsatisfiable constraint.
    
    Authors:
      - Vyas Ramasubramani (https://github.com/vyasr)
    
    Approvers:
      - Ray Douglass (https://github.com/raydouglass)
      - Bradley Dice (https://github.com/bdice)
      - GALI PREM SAGAR (https://github.com/galipremsagar)
    vyasr authored Nov 15, 2023
    Commit: 9e7f8a5

Commits on Nov 16, 2023

  1. Commit: d56a70f
  2. Update cudf_kafka_version.

    bdice committed Nov 16, 2023
    Commit: e4e6975