
Mvn install udf-compiler instead of pulling from remote when building all #4453

Closed
wants to merge 1 commit

Conversation

res-life
Collaborator

@res-life res-life commented Jan 4, 2022

This fixes #4442
Mvn install udf-compiler instead of pulling from remote when building all

Signed-off-by: Chong Gao <res_life@163.com>

@res-life
Collaborator Author

res-life commented Jan 4, 2022

The command below first looks for the aggregator jars in the target directory; if they are not found, it pulls them from the remote repository.
So the very first build will fail.
See "build-parallel-worlds.xml" for details.

mvn -P noSnapshotsWithDatabricks clean install
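A rough bash sketch of the lookup described above (editorial illustration only; the glob, paths, and fallback command are assumptions, not the actual contents of build-parallel-worlds.xml):

# Prefer aggregator jars that were already built locally; otherwise fall back to the remote repository.
if compgen -G "aggregator/target/*/rapids-4-spark-aggregator_*.jar" > /dev/null; then
    echo "Using locally built aggregator jars"
else
    echo "No local aggregator jars found, resolving them from the remote repository"
    mvn -B dependency:resolve   # this is the step that fails on a first-ever build
fi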

Note that for Databricks, the build always pulls from the remote repository.

Use this command to build after this fix:

./build/buildall -p=noSnapshots -P=8

// After the Databricks jars are uploaded to the remote repository, use this instead:

./build/buildall -p=noSnapshotsWithDatabricks -P=8

For the details of this fix, please see the inline comments.

@res-life
Collaborator Author

res-life commented Jan 4, 2022

build

Collaborator

@revans2 revans2 left a comment


I don't know enough about how the parallel build is intended to work to be able to say if this is really a fix or not.

[[ "$LOG_FILE" != "/dev/tty" ]] && tail -20 "$LOG_FILE" || true
exit 255
}

echo "#### REDIRECTING mvn output to $LOG_FILE ####"
mvn -U "$MVN_PHASE" \
Collaborator


Should this command append to the log file instead of overwriting it?
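For illustration, the difference is only the redirection operator (a sketch; the actual redirection line is not shown in this hunk):

mvn -U "$MVN_PHASE" > "$LOG_FILE" 2>&1    # overwrite: each invocation truncates the log
mvn -U "$MVN_PHASE" >> "$LOG_FILE" 2>&1   # append: output from earlier module builds is preserved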

# If not, the first build-all will fail because the remote repository does not have the udf-compiler jar.
# It also should not pull from remote, to avoid mixing inconsistent code across sub-modules.
# This will also compile and install the corresponding shim dependency.
mvn -U "install" \
Collaborator


I think there is a larger underlying issue here. It is probably going to be fixed by a separate build, but isn't the UDF compiler being built against different versions of Spark too? Yet there is no distinction between those versions in the JAR name. So when we build it in parallel like this, isn't it just a race, with whichever build finishes last determining which version ends up in the jar?

Collaborator Author


Yes, the UDF compiler is built against different versions of Spark.
The jars are generated in different paths, such as target/spark3xx/, so there is no race.

This PR is invalid; see the comments below. I will close it.
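A rough sketch of why the per-version builds do not clobber each other, assuming (as described above) that the buildver property selects the Spark shim and that each value writes its artifacts under its own target/spark3xx directory; the property name and version list are assumptions:

# Hypothetical parallel per-version builds, each writing to a separate target/spark<ver> output
for bv in 311 312 320; do
    mvn install -Dbuildver="$bv" -DskipTests &
done
wait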

@tgravescs
Collaborator

tgravescs commented Jan 4, 2022

Yeah, the question here is: why didn't the udf-compiler module get built as a dependency of the aggregator?

This PR is invalid.
It seems I missed some places when replacing snapshot versions with release versions, so udf-compiler was not compiled when building the aggregator.
Tim has built successfully with the command:

./build/buildall -p=noSnapshots -P=8
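One quick way to spot places where the snapshot-to-release replacement was missed is to grep the pom files (illustrative only):

# List any pom.xml that still references a -SNAPSHOT version
grep -rn --include=pom.xml -e '-SNAPSHOT' .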

@NvTimLiu
Collaborator

NvTimLiu commented Jan 5, 2022

Local build PASS:

./build/buildall -p=noSnapshots -P=8

[INFO] --- maven-jar-plugin:3.2.0:test-jar (default-test-jar) @ rapids-4-spark-tools_2.12 ---
[INFO] Building jar: /spark-rapids/tools/target/spark311/rapids-4-spark-tools_2.12-22.02.0-spark311tests.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for RAPIDS Accelerator for Apache Spark Distribution 22.02.0:
[INFO]
[INFO] RAPIDS Accelerator for Apache Spark Distribution ... SUCCESS [02:56 min]
[INFO] RAPIDS Accelerator for Apache Spark UDF Examples ... SUCCESS [  5.548 s]
[INFO] rapids-4-spark-integration-tests_2.12 .............. SUCCESS [01:00 min]
[INFO] RAPIDS Accelerator for Apache Spark Tests .......... SUCCESS [ 25.149 s]
[INFO] rapids-4-spark-api-validation ...................... SUCCESS [  3.567 s]
[INFO] RAPIDS Accelerator for Apache Spark tools .......... SUCCESS [ 57.777 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  05:29 min
[INFO] Finished at: 2022-01-05T04:43:18Z
[INFO] ------------------------------------------------------------------------

real    23m4.112s
user    82m19.052s
sys     1m26.713s

@res-life
Collaborator Author

res-life commented Jan 5, 2022

The corresponding issue is closed.
This PR is invalid.

@res-life res-life closed this Jan 5, 2022
@res-life res-life deleted the fix-build-all-failed branch January 5, 2022 11:16
Successfully merging this pull request may close these issues.

[BUG] mvn build FAILED with option -P noSnapshotsWithDatabricks