Add option to link Java bindings with Arrow dynamically [skip ci] #8307

Merged: 2 commits, May 24, 2021
45 changes: 29 additions & 16 deletions java/README.md
@@ -21,7 +21,7 @@ unexpected behavior if you try to mix these libraries using the same thread.
## Dependency

This is a fat jar with the binary dependencies packaged in the jar. This means the jar will only
run on platforms the jar was compiled for. When this is in an official maven repository we will
run on platforms the jar was compiled for. When this is in an official Maven repository we will
list the platforms that it is compiled and tested for. In the mean time you will need to build it
yourself. In official releases there should be no classifier on the jar and it should run against
most modern cuda drivers.
@@ -50,36 +50,49 @@ CUDA 11.0:

## Build From Source

Build the native code first, and make sure the a JDK is installed and available.
Build [libcudf](../cpp) first, and make sure the JDK is installed and available. Specify
the cmake option `-DCUDF_USE_ARROW_STATIC=ON` when building so that Apache Arrow is linked
statically to libcudf, as this will help create a jar that does not require Arrow and its
dependencies to be available in the runtime environment.

Pass in the cmake option `-DCUDF_USE_ARROW_STATIC=ON` so that Apache Arrow is linked statically.
After building libcudf, the Java bindings can be built via Maven, e.g.:
```
mvn clean install
```

If you have a compatible GPU on your build system the tests will use it. If not you will see a
lot of skipped tests.

## Dynamically Linking Arrow

Since libcudf builds by default with a dynamically linked Arrow dependency, it may be
desirable to build the Java bindings without requiring a statically-linked Arrow to avoid
rebuilding an already built libcudf.so. To do so, specify the additional command-line flag
`-DCUDF_JNI_ARROW_STATIC=OFF` when building the Java bindings with Maven. However this will
result in a jar that requires the correct Arrow version to be available in the runtime
environment, and therefore is not recommended unless you are only performing local testing
within the libcudf build environment.
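
As a rough sketch of the dynamic-Arrow build described above, the snippet below assembles the Maven command line using the `CUDF_JNI_ARROW_STATIC` property. The helper name `cudf_jni_mvn_cmd` is hypothetical and not part of cudf; it only prints the command (a dry run) rather than invoking Maven.

```shell
# Hypothetical helper (not part of cudf): prints the Maven command for the
# Java bindings with Arrow linked statically (ON, the default) or dynamically (OFF).
cudf_jni_mvn_cmd() {
  arrow_static="${1:-ON}"
  echo "mvn clean install -DCUDF_JNI_ARROW_STATIC=${arrow_static}"
}

# Dry run: show the command for a dynamically linked Arrow build.
cudf_jni_mvn_cmd OFF
```

Running the printed command in the `java` directory should then build the bindings against the Arrow shared library that libcudf was built with, assuming that library can be found in one of the locations the native build searches.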

## Statically Linking the CUDA Runtime

If you use the default cmake options libcudart will be dynamically linked to libcudf
which is included. If you do this the resulting jar will have a classifier associated with it
because that jar can only be used with a single version of the CUDA runtime.

There is experimental work to try and remove that requirement but it is not fully functional
you can build cuDF with `-DCUDA_STATIC_RUNTIME=ON` when running cmake, and similarly
`-DCUDA_STATIC_RUNTIME=ON` when running maven. This will statically link in the CUDA runtime
`-DCUDA_STATIC_RUNTIME=ON` when running Maven. This will statically link in the CUDA runtime
and result in a jar with no classifier that should run on any host that has a version of the
driver new enough to support the runtime that this was built with.

To build with maven for dynamic linking you would run.

```
mvn clean install
```

for static linking you would run

To build the Java bindings with a statically-linked CUDA runtime, use a build command like:
```
mvn clean install -DCUDA_STATIC_RUNTIME=ON
```

You will get errors if you don't do it consistently. We tried to detect these up front and stop the build early if there is a mismatch, but there may be some cases we missed and this can result in some very hard to debug errors.

If you have a compatible GPU on your build system the tests will use it. If not you will see a
lot of skipped tests.
You will get errors if the CUDA runtime linking is not consistent. We tried to detect these
up front and stop the build early if there is a mismatch, but there may be some cases we missed
and this can result in some very hard to debug errors.

## Per-thread Default Stream

2 changes: 2 additions & 0 deletions java/pom.xml
@@ -155,6 +155,7 @@
<RMM_LOGGING_LEVEL>INFO</RMM_LOGGING_LEVEL>
<USE_GDS>OFF</USE_GDS>
<GPU_ARCHS>ALL</GPU_ARCHS>
<CUDF_JNI_ARROW_STATIC>ON</CUDF_JNI_ARROW_STATIC>
<native.build.path>${project.build.directory}/cmake-build</native.build.path>
<slf4j.version>1.7.30</slf4j.version>
<arrow.version>0.15.1</arrow.version>
@@ -378,6 +379,7 @@
<arg value="-DCMAKE_EXPORT_COMPILE_COMMANDS=${CMAKE_EXPORT_COMPILE_COMMANDS}"/>
<arg value="-DCUDF_CPP_BUILD_DIR=${CUDF_CPP_BUILD_DIR}"/>
<arg value="-DGPU_ARCHS=${GPU_ARCHS}"/>
<arg value="-DCUDF_JNI_ARROW_STATIC=${CUDF_JNI_ARROW_STATIC}"/>
</exec>
<exec dir="${native.build.path}"
failonerror="true"
18 changes: 15 additions & 3 deletions java/src/main/native/CMakeLists.txt
@@ -42,12 +42,14 @@ option(BUILD_TESTS "Configure CMake to build tests" ON)
option(PER_THREAD_DEFAULT_STREAM "Build with per-thread default stream" OFF)
option(CUDA_STATIC_RUNTIME "Statically link the CUDA runtime" OFF)
option(USE_GDS "Build with GPUDirect Storage (GDS)/cuFile support" OFF)
option(CUDF_JNI_ARROW_STATIC "Statically link Arrow" ON)

message(VERBOSE "CUDF_JNI: Build with NVTX support: ${USE_NVTX}")
message(VERBOSE "CUDF_JNI: Configure CMake to build tests: ${BUILD_TESTS}")
message(VERBOSE "CUDF_JNI: Build with per-thread default stream: ${PER_THREAD_DEFAULT_STREAM}")
message(VERBOSE "CUDF_JNI: Statically link the CUDA runtime: ${CUDA_STATIC_RUNTIME}")
message(VERBOSE "CUDF_JNI: Build with GPUDirect Storage support: ${USE_GDS}")
message(VERBOSE "CUDF_JNI: Build with static Arrow library: ${CUDF_JNI_ARROW_STATIC}")

set(CUDF_SOURCE_DIR "${PROJECT_SOURCE_DIR}/../../../../cpp")
set(CUDF_CPP_BUILD_DIR "${CUDF_SOURCE_DIR}/build")
@@ -166,14 +168,24 @@ find_path(ARROW_INCLUDE "arrow"

message(STATUS "ARROW: ARROW_INCLUDE set to ${ARROW_INCLUDE}")

# Find static version of Arrow lib
find_library(ARROW_LIBRARY libarrow.a
if(CUDF_JNI_ARROW_STATIC)
# Find static version of Arrow lib
set(CUDF_JNI_ARROW_LIBNAME "libarrow.a")
else()
set(CUDF_JNI_ARROW_LIBNAME "arrow")
endif(CUDF_JNI_ARROW_STATIC)

find_library(ARROW_LIBRARY ${CUDF_JNI_ARROW_LIBNAME} REQUIRED
HINTS "$ENV{ARROW_ROOT}/lib"
"$ENV{CONDA_PREFIX}/lib"
"${CUDF_CPP_BUILD_DIR}/_deps/arrow-build/release")

if(NOT ARROW_LIBRARY)
message(FATAL_ERROR "Arrow static libs not found. Was libcudf built with CUDF_USE_ARROW_STATIC=ON?")
if(CUDF_JNI_ARROW_STATIC)
message(FATAL_ERROR "Arrow static library not found. Was libcudf built with CUDF_USE_ARROW_STATIC=ON?")
else()
message(FATAL_ERROR "Arrow dynamic library not found.")
endif(CUDF_JNI_ARROW_STATIC)
else()
message(STATUS "ARROW: ARROW_LIBRARY set to ${ARROW_LIBRARY}")
endif(NOT ARROW_LIBRARY)
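
One way to confirm which Arrow linkage a build actually produced is to inspect the dynamic dependencies of the resulting JNI library. The sketch below assumes a Linux host with `ldd`; the function name `links_arrow_dynamically` and the library path in the usage comment are assumptions for illustration, not taken from this change.

```shell
# Hypothetical check: reads ldd output on stdin and succeeds if it lists a
# shared Arrow library (e.g. libarrow.so.<version>).
links_arrow_dynamically() {
  grep -q 'libarrow\.so'
}

# Usage (the library path is an assumption about the build output location):
#   ldd target/cmake-build/libcudfjni.so | links_arrow_dynamically \
#     && echo "Arrow is linked dynamically"
```

With `CUDF_JNI_ARROW_STATIC=ON` and a libcudf built with `CUDF_USE_ARROW_STATIC=ON`, this check would be expected to fail, since no shared Arrow library should be required at runtime.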