Add JNI for strings::split_re and strings::split_record_re #10139

Merged: 54 commits, Feb 14, 2022

@ttnghia ttnghia commented Jan 26, 2022

This PR adds Java bindings for the new strings APIs strings::split_re and strings::split_record_re, which allow splitting strings by regular expression delimiters.
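
As a rough CPU-side illustration of what splitting by a regular expression delimiter means, the sketch below uses only the JDK. It is not the cudf Java API: the class name `SplitReSketch` and the helper `splitRecordRe` are hypothetical, and the `limit` argument follows `java.util.regex.Pattern.split` semantics, which may not match the limit semantics of the cudf binding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, JDK-only analog of a "split record by regex" operation:
// each input row yields one array of tokens. Null rows stay null.
class SplitReSketch {
    static List<String[]> splitRecordRe(List<String> rows, String regex, int limit) {
        Pattern pattern = Pattern.compile(regex);
        List<String[]> out = new ArrayList<>();
        for (String row : rows) {
            // Pattern.split(CharSequence, int) is standard JDK API.
            out.add(row == null ? null : pattern.split(row, limit));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a1b22c", "x_y", null);
        // Delimiter: one or more digits or underscores.
        for (String[] tokens : splitRecordRe(rows, "[0-9_]+", -1)) {
            System.out.println(Arrays.toString(tokens));
        }
    }
}
```

Here the first row splits into `a`, `b`, `c` and the second into `x`, `y`; the null row is passed through unchanged.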

In addition, this PR removes the Java string split overloads that use the default split pattern (an empty string), because with an empty pattern Java's split API produces different results than cudf.
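
The mismatch can be seen from the Java side with a small JDK-only sketch. `String.split("")` on Java 8 and later splits between every character. The `whitespaceSplit` helper is only an assumed stand-in for the cudf side, based on the libcudf convention that an empty delimiter means "split on whitespace"; it is not cudf code, and the class and method names are hypothetical.

```java
import java.util.Arrays;

class EmptyPatternSplit {
    // Java semantics: an empty regex matches between every character.
    static String[] javaEmptySplit(String s) {
        return s.split("");
    }

    // Assumed analog of an empty-delimiter split in cudf (whitespace split).
    // This is an illustration only, not the library's implementation.
    static String[] whitespaceSplit(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Java 8+: ["a", " ", "b"]
        System.out.println(Arrays.toString(javaEmptySplit("a b")));
        // Whitespace split: ["a", "b"]
        System.out.println(Arrays.toString(whitespaceSplit("a b")));
    }
}
```

Because the two results differ for the same input, an overload that silently defaults to an empty pattern would be easy to misuse, which fits the stated reason for removing it.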

Finally, some cleanup has been performed automatically thanks to the IntelliJ IDE.

Depends on #10128.

This is a breaking change, which is fixed by NVIDIA/spark-rapids#4714. Thus, it should be merged at the same time as NVIDIA/spark-rapids#4714.

@ttnghia ttnghia added the feature request, 2 - In Progress, Java, Spark, strings, and non-breaking labels Jan 26, 2022
@ttnghia ttnghia self-assigned this Jan 26, 2022

kuhushukla commented Jan 26, 2022

Don't you need java side changes and java tests as well here? Typically Java bindings and JNI bindings go hand in hand. This is in draft mode so I guess you might have it brewing on your branch still but wanted to put this out just in case.

ttnghia commented Jan 26, 2022

> Don't you need java side changes and java tests as well here? Typically Java bindings and JNI bindings go hand in hand.

Sure, I'll add. This is still draft WIP :)
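
For readers unfamiliar with why the Java and JNI sides go together: a Java `native` method only links if the native library exports a function whose name is derived from the Java class and method names. The sketch below shows that naming rule in a simplified form (it handles only `.` and `_`, not the other JNI escapes or overloaded signatures). The class `JniNameSketch` and the `splitRe` declaration are hypothetical and are not the declarations added in this PR.

```java
class JniNameSketch {
    // Hypothetical native declaration. It compiles without a native library,
    // but calling it would throw UnsatisfiedLinkError unless a matching
    // JNI function is exported by a loaded library.
    static native long splitRe(long columnHandle, String pattern, int limit);

    // Simplified JNI symbol name: "Java_" + mangled class + "_" + mangled method.
    static String jniSymbol(String qualifiedClassName, String methodName) {
        return "Java_" + mangle(qualifiedClassName) + "_" + mangle(methodName);
    }

    // Only '.' -> "_" and '_' -> "_1" are handled in this sketch.
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.') {
                sb.append('_');
            } else if (c == '_') {
                sb.append("_1");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "Java_JniNameSketch_splitRe"
        System.out.println(jniSymbol("JniNameSketch", "splitRe"));
        // prints "Java_com_example_Strings_1Sketch_splitRe"
        System.out.println(jniSymbol("com.example.Strings_Sketch", "splitRe"));
    }
}
```

A change to either side alone (a new `native` declaration without the C++ function, or the reverse) leaves the binding unusable, which is why the two are normally reviewed and tested together.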

# Conflicts:
#	cpp/include/cudf/strings/split/split_re.hpp
@github-actions github-actions bot removed the conda, libcudf, and CMake labels Feb 11, 2022
@ttnghia ttnghia added the 3 - Ready for Review label and removed the 2 - In Progress label Feb 11, 2022
@ttnghia ttnghia marked this pull request as ready for review February 11, 2022 19:46
@ttnghia ttnghia requested a review from a team as a code owner February 11, 2022 19:46
@ttnghia ttnghia requested a review from revans2 February 11, 2022 19:47
@revans2 revans2 left a comment

Looks good, just a few nits that are not required

java/src/main/native/src/ColumnViewJni.cpp (resolved)
java/src/main/native/src/ColumnViewJni.cpp (outdated, resolved)
@ttnghia ttnghia removed the 5 - DO NOT MERGE label Feb 14, 2022

ttnghia commented Feb 14, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 7025c40 into rapidsai:branch-22.04 Feb 14, 2022
@ttnghia ttnghia deleted the jni_for_strings_split_re branch February 14, 2022 17:53
Labels
3 - Ready for Review, breaking, feature request, Java, Spark, strings
5 participants