Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move the udf-examples module to the external repository spark-rapids-examples [databricks] #4619

Merged
merged 12 commits into from
Feb 23, 2022

Conversation

res-life
Copy link
Collaborator

@res-life res-life commented Jan 24, 2022

[BUG] udf-examples dependencies are incorrect #3971

Move the the whole udf-examples module to the external spark-rapids-examples repository.
The moved udf-examples depends on Spark Rapids repository and Spark Rapids repository should not depends on the external udf-examples. This will avoid circular dependency between the two repositories.
Done: Update the jenkens pipeline to depend on the external udf-examples for test purpose.

Background:

  • mvn compile or mvn test for a clean build will fail, because of dist.jar is generated in the package phase and compile phase can't access the classes in the dist.jar. See issue #3966
  • Dependency dist.jar is not the same with the published one. The published one always removes the rapids-4-spark-aggregator dependency, see issue #4253

Contributes to #3971

@res-life
Copy link
Collaborator Author

build

@res-life res-life linked an issue Jan 24, 2022 that may be closed by this pull request
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of removing some, but not all, of the udf examples. They should be bundled together, along with their tests, and built separately from the plugin build to ensure it is compiling and testing against the plugin dist artifact.

udf-examples/pom.xml Outdated Show resolved Hide resolved
udf-examples/pom.xml Outdated Show resolved Hide resolved
@sameerz sameerz added the task Work required that improves the product but is not user facing label Jan 25, 2022
…examples repository

Signed-off-by: Chong Gao <res_life@163.com>
@res-life res-life changed the title Move the main part of udf-examples to the external repository spark-rapids-examples Move the udf-examples module to the external repository spark-rapids-examples Jan 26, 2022
@res-life res-life changed the title Move the udf-examples module to the external repository spark-rapids-examples Move the udf-examples module to the external repository spark-rapids-examples [databrickes] Jan 26, 2022
@res-life
Copy link
Collaborator Author

build

@res-life res-life changed the title Move the udf-examples module to the external repository spark-rapids-examples [databrickes] Move the udf-examples module to the external repository spark-rapids-examples [databricks] Jan 26, 2022
@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

Move the whole udf-examples module to external.
The moved udf-examples depends on Spark Rapids repository and Spark Rapids repository should not depends on the external udf-examples. This will avoid circular dependency.

@res-life
Copy link
Collaborator Author

For the spark-rapids-examples code, refer to NVIDIA/spark-rapids-examples#94

@res-life res-life marked this pull request as draft January 26, 2022 12:51
@res-life
Copy link
Collaborator Author

Convert to draft because of the build is not passed.

docs/additional-functionality/rapids-udfs.md Outdated Show resolved Hide resolved
implements a Hive simple UDF using
[native code](../../udf-examples/src/main/cpp/src) to count words in strings

in the [udf-examples](https://github.com/NVIDIA/spark-rapids-examples/tree/branch-22.04/examples/Spark-Rapids/udf-examples) project.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate there isn't a main branch in the spark-rapids-examples repository we can point to so we're always pointing to the latest stable release code. Otherwise we're going to forget about this and then point to possibly very stale examples, and the user would miss any new examples that were added in later versions of the spark-rapids-examples code. @GaryShen2008 should we be doing a merge to main on the spark-rapids-examples repository as we do for this one?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can create a main branch for the latest release version.

@res-life
Copy link
Collaborator Author

res-life commented Jan 27, 2022

Updated the code for the rapids_build-it-UDF-native weekly Jenkins job.
The Jenkins pipeline uses the following command to test native UDF test cases.
So although the Java and Scala code no longer depend on the external udf-examples, but the rapids_udf_test.py and run_pyspark_from_build.sh depends on the external udf-examples when running the Jenkins job.

LOCAL_JAR_PATH=/path-to/local/jars \
bash ./integration_tests/run_pyspark_from_build.sh \
-m rapids_udf_example_native --rapids_udf_example_native

After this PR merged, then should update the Jenkins job acorrdingly. @pxLi

@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

build

@res-life res-life marked this pull request as ready for review January 28, 2022 07:15
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want the premerge and nightly tests to also build and run the UDF example tests, are the scripts going to be updated by this PR or done separately? If separately, we should file an issue to track.

- [StringWordCount](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java)
implements a Hive simple UDF using
[native code](../../udf-examples/src/main/cpp/src) to count words in strings
Source code for examples of RAPIDS accelerated UDFs is provided

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra whitespace ends up disconnecting this sentence when it's rendered.

Suggested change

@res-life res-life self-assigned this Feb 7, 2022
@res-life
Copy link
Collaborator Author

res-life commented Feb 7, 2022

Filed an issue to track the premerge and nightly tests: #4704

@res-life
Copy link
Collaborator Author

res-life commented Feb 7, 2022

build

@res-life
Copy link
Collaborator Author

res-life commented Feb 7, 2022

build

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks OK, but it would be good to verify the nightly builds are ready for this before approving. cc: @pxLi

@res-life
Copy link
Collaborator Author

res-life commented Feb 8, 2022

build

@res-life res-life marked this pull request as draft February 8, 2022 03:43
@res-life
Copy link
Collaborator Author

res-life commented Feb 8, 2022

The nightly builds are not ready, convert to draft to prevent it from being accidentally merged.

@res-life res-life marked this pull request as ready for review February 21, 2022 01:32
@res-life
Copy link
Collaborator Author

Pipelines created by Tim are ready: Rapids-examples/examples-udf-examples-nightly, Rapids-examples/examples-udf-examples-native.
Pipeline issue is: #4704
@jlowe Please review

@jlowe
Copy link
Member

jlowe commented Feb 22, 2022

I mentioned in this comment that we shouldn't lose premerge testing against at least some external UDFs for this repository. Having separate nightly tests is fine, I don't care much either way as long as we get nightly coverage of the UDF examples working against updated snapshot versions of the RAPIDS Accelerator, but as it is now I think we're losing premerge tests against truly external UDFs.

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving this for now since it looks like a bunch of nightly tests are failing because the udf example jars were removed. I still want to see premerge fixed to test truly external UDFs, but we can address that in a followup.

@jlowe
Copy link
Member

jlowe commented Feb 22, 2022

This hasn't been built in a while, so getting a fresh build to verify it's still ready to go.

@jlowe
Copy link
Member

jlowe commented Feb 22, 2022

build

@jlowe
Copy link
Member

jlowe commented Feb 22, 2022

Filed #4834 to track covering the external UDF tests in the premerge build

@res-life
Copy link
Collaborator Author

build

@res-life res-life merged commit 7d1662e into NVIDIA:branch-22.04 Feb 23, 2022
@res-life res-life deleted the move-udf-examples branch March 13, 2022 05:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
task Work required that improves the product but is not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] udf-examples dependencies are incorrect
4 participants