Add option to link Java bindings with Arrow dynamically (#8307)
Closes #8300

Adds a Maven property, `CUDF_JNI_ARROW_STATIC`, that controls whether the JNI bindings are linked against the Arrow library statically or dynamically. This defaults to `ON` (static linking) since Arrow and its dependencies are not guaranteed to be available in the jar's environment. Setting it to `OFF` is useful when a developer has already built a libcudf that dynamically links to Arrow and just wants to run Java tests within the same build environment without rebuilding libcudf with static Arrow.

For example, building and testing the Java bindings with dynamically-linked Arrow can be done via:
```
mvn clean test -DCUDF_JNI_ARROW_STATIC=OFF
```
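
One way to check which linkage a build actually produced is to inspect the shared-library dependencies of the built JNI library. The following is a minimal sketch and not part of this change: it assumes a Linux host with `ldd` and `grep`, the helper name `arrow_linkage` is hypothetical, and the library path in the usage note is an assumption about where the build places its output.

```shell
# Hypothetical helper: reads `ldd` output on stdin and reports whether
# a shared Arrow library appears among the listed dependencies.
arrow_linkage() {
  if grep -q 'libarrow\.so'; then
    echo "dynamic"
  else
    # No libarrow.so dependency: Arrow is either linked statically or not used.
    echo "not-dynamic"
  fi
}
```

Usage, with an assumed library path: `ldd target/cmake-build/libcudfjni.so | arrow_linkage`.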

Authors:
  - Jason Lowe (https://github.com/jlowe)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)

URL: #8307
jlowe authored May 24, 2021
1 parent 90a244c commit 66d6328
Showing 3 changed files with 46 additions and 19 deletions.
45 changes: 29 additions & 16 deletions java/README.md
@@ -21,7 +21,7 @@ unexpected behavior if you try to mix these libraries using the same thread.
## Dependency

This is a fat jar with the binary dependencies packaged in the jar. This means the jar will only
run on platforms the jar was compiled for. When this is in an official maven repository we will
run on platforms the jar was compiled for. When this is in an official Maven repository we will
list the platforms that it is compiled and tested for. In the mean time you will need to build it
yourself. In official releases there should be no classifier on the jar and it should run against
most modern cuda drivers.
@@ -50,36 +50,49 @@ CUDA 11.0:

## Build From Source

Build the native code first, and make sure the a JDK is installed and available.
Build [libcudf](../cpp) first, and make sure the JDK is installed and available. Specify
the cmake option `-DCUDF_USE_ARROW_STATIC=ON` when building so that Apache Arrow is linked
statically to libcudf, as this will help create a jar that does not require Arrow and its
dependencies to be available in the runtime environment.

Pass in the cmake option `-DCUDF_USE_ARROW_STATIC=ON` so that Apache Arrow is linked statically.
After building libcudf, the Java bindings can be built via Maven, e.g.:
```
mvn clean install
```

If you have a compatible GPU on your build system the tests will use it. If not you will see a
lot of skipped tests.

## Dynamically Linking Arrow

Since libcudf builds by default with a dynamically linked Arrow dependency, it may be
desirable to build the Java bindings without requiring a statically-linked Arrow to avoid
rebuilding an already built libcudf.so. To do so, specify the additional command-line flag
`-DCUDF_JNI_ARROW_STATIC=OFF` when building the Java bindings with Maven. However this will
result in a jar that requires the correct Arrow version to be available in the runtime
environment, and therefore is not recommended unless you are only performing local testing
within the libcudf build environment.

## Statically Linking the CUDA Runtime

If you use the default cmake options libcudart will be dynamically linked to libcudf
which is included. If you do this the resulting jar will have a classifier associated with it
because that jar can only be used with a single version of the CUDA runtime.

There is experimental work to try and remove that requirement but it is not fully functional
you can build cuDF with `-DCUDA_STATIC_RUNTIME=ON` when running cmake, and similarly
`-DCUDA_STATIC_RUNTIME=ON` when running maven. This will statically link in the CUDA runtime
`-DCUDA_STATIC_RUNTIME=ON` when running Maven. This will statically link in the CUDA runtime
and result in a jar with no classifier that should run on any host that has a version of the
driver new enough to support the runtime that this was built with.

To build with maven for dynamic linking you would run.

```
mvn clean install
```

for static linking you would run

To build the Java bindings with a statically-linked CUDA runtime, use a build command like:
```
mvn clean install -DCUDA_STATIC_RUNTIME=ON
```

You will get errors if you don't do it consistently. We tried to detect these up front and stop the build early if there is a mismatch, but there may be some cases we missed and this can result in some very hard to debug errors.

If you have a compatible GPU on your build system the tests will use it. If not you will see a
lot of skipped tests.
You will get errors if the CUDA runtime linking is not consistent. We tried to detect these
up front and stop the build early if there is a mismatch, but there may be some cases we missed
and this can result in some very hard to debug errors.

## Per-thread Default Stream

2 changes: 2 additions & 0 deletions java/pom.xml
@@ -155,6 +155,7 @@
<RMM_LOGGING_LEVEL>INFO</RMM_LOGGING_LEVEL>
<USE_GDS>OFF</USE_GDS>
<GPU_ARCHS>ALL</GPU_ARCHS>
<CUDF_JNI_ARROW_STATIC>ON</CUDF_JNI_ARROW_STATIC>
<native.build.path>${project.build.directory}/cmake-build</native.build.path>
<slf4j.version>1.7.30</slf4j.version>
<arrow.version>0.15.1</arrow.version>
@@ -378,6 +379,7 @@
<arg value="-DCMAKE_EXPORT_COMPILE_COMMANDS=${CMAKE_EXPORT_COMPILE_COMMANDS}"/>
<arg value="-DCUDF_CPP_BUILD_DIR=${CUDF_CPP_BUILD_DIR}"/>
<arg value="-DGPU_ARCHS=${GPU_ARCHS}"/>
<arg value="-DCUDF_JNI_ARROW_STATIC=${CUDF_JNI_ARROW_STATIC}"/>
</exec>
<exec dir="${native.build.path}"
failonerror="true"
18 changes: 15 additions & 3 deletions java/src/main/native/CMakeLists.txt
@@ -42,12 +42,14 @@ option(BUILD_TESTS "Configure CMake to build tests" ON)
option(PER_THREAD_DEFAULT_STREAM "Build with per-thread default stream" OFF)
option(CUDA_STATIC_RUNTIME "Statically link the CUDA runtime" OFF)
option(USE_GDS "Build with GPUDirect Storage (GDS)/cuFile support" OFF)
option(CUDF_JNI_ARROW_STATIC "Statically link Arrow" ON)

message(VERBOSE "CUDF_JNI: Build with NVTX support: ${USE_NVTX}")
message(VERBOSE "CUDF_JNI: Configure CMake to build tests: ${BUILD_TESTS}")
message(VERBOSE "CUDF_JNI: Build with per-thread default stream: ${PER_THREAD_DEFAULT_STREAM}")
message(VERBOSE "CUDF_JNI: Statically link the CUDA runtime: ${CUDA_STATIC_RUNTIME}")
message(VERBOSE "CUDF_JNI: Build with GPUDirect Storage support: ${USE_GDS}")
message(VERBOSE "CUDF_JNI: Build with static Arrow library: ${CUDF_JNI_ARROW_STATIC}")

set(CUDF_SOURCE_DIR "${PROJECT_SOURCE_DIR}/../../../../cpp")
set(CUDF_CPP_BUILD_DIR "${CUDF_SOURCE_DIR}/build")
@@ -166,14 +168,24 @@ find_path(ARROW_INCLUDE "arrow"

message(STATUS "ARROW: ARROW_INCLUDE set to ${ARROW_INCLUDE}")

# Find static version of Arrow lib
find_library(ARROW_LIBRARY libarrow.a
if(CUDF_JNI_ARROW_STATIC)
# Find static version of Arrow lib
set(CUDF_JNI_ARROW_LIBNAME "libarrow.a")
else()
set(CUDF_JNI_ARROW_LIBNAME "arrow")
endif(CUDF_JNI_ARROW_STATIC)

find_library(ARROW_LIBRARY ${CUDF_JNI_ARROW_LIBNAME} REQUIRED
HINTS "$ENV{ARROW_ROOT}/lib"
"$ENV{CONDA_PREFIX}/lib"
"${CUDF_CPP_BUILD_DIR}/_deps/arrow-build/release")

if(NOT ARROW_LIBRARY)
message(FATAL_ERROR "Arrow static libs not found. Was libcudf built with CUDF_USE_ARROW_STATIC=ON?")
if(CUDF_JNI_ARROW_STATIC)
message(FATAL_ERROR "Arrow static library not found. Was libcudf built with CUDF_USE_ARROW_STATIC=ON?")
else()
message(FATAL_ERROR "Arrow dynamic library not found.")
endif(CUDF_JNI_ARROW_STATIC)
else()
message(STATUS "ARROW: ARROW_LIBRARY set to ${ARROW_LIBRARY}")
endif(NOT ARROW_LIBRARY)