Upgrade to UCX 1.12.1 for 22.06 (#5141)
* Upgrade to UCX 1.12.1 for 22.06

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>

* Update UCX to 1.12.1 in blossom dockerfile

* Update copyrights. Remove extra ARG declarations that are not needed

* Bring the extra ARGs back, just to make it clear what the arguments are
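Most of this change follows from a new UCX release artifact naming scheme. The 1.11.2 artifacts were tagged `mofed5.x` plus a CUDA minor version (for example `ucx-v1.11.2-ubuntu18.04-mofed5.x-cuda11.2.deb`). The 1.12.1 artifacts use `mofed5` plus only the CUDA major version (for example `ucx-v1.12.1-ubuntu18.04-mofed5-cuda11.deb`). The sketch below builds 1.12.1-style URLs. It assumes the pattern visible in this diff holds for other versions, and `ucx_deb_url` and `ucx_centos7_url` are hypothetical helpers, not part of UCX or this repository:

```shell
#!/bin/sh
# Hypothetical helpers. The naming pattern is copied from the UCX 1.12.1
# URLs in this commit; it is an assumption, not official UCX tooling.

# Usage: ucx_deb_url UCX_VER UBUNTU_VER UCX_CUDA_VER
ucx_deb_url() {
  printf 'https://github.com/openucx/ucx/releases/download/v%s/ucx-v%s-ubuntu%s-mofed5-cuda%s.deb\n' \
    "$1" "$1" "$2" "$3"
}

# Usage: ucx_centos7_url UCX_VER UCX_CUDA_VER
ucx_centos7_url() {
  printf 'https://github.com/openucx/ucx/releases/download/v%s/ucx-v%s-centos7-mofed5-cuda%s.tar.bz2\n' \
    "$1" "$1" "$2"
}

ucx_deb_url 1.12.1 18.04 11
ucx_centos7_url 1.12.1 11
```

Passing only the CUDA major version (`11`) mirrors the new `UCX_CUDA_VER=11` default in the Dockerfiles below.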
abellina authored Apr 6, 2022
1 parent 81dbb75 commit 4c8ec71
Showing 7 changed files with 46 additions and 49 deletions.
51 changes: 22 additions & 29 deletions docs/additional-functionality/rapids-shuffle.md
@@ -43,7 +43,7 @@ be installed on the host and inside Docker containers (if not baremetal). A host
requirements, like the MLNX_OFED driver and `nv_peer_mem` kernel module.

The minimum UCX requirement for the RAPIDS Shuffle Manager is
[UCX 1.11.2](https://github.com/openucx/ucx/releases/tag/v1.11.2).
[UCX 1.12.1](https://github.com/openucx/ucx/releases/tag/v1.12.1).
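With UCX 1.12.1 now the minimum, a deployment script could check an installed version against it before enabling the RAPIDS Shuffle Manager. The sketch below makes three assumptions: GNU coreutils `sort -V` is available, the installed version string is obtained separately, and `ucx_meets_minimum` and `UCX_MIN_VER` are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is >= the minimum UCX version
# required by the RAPIDS Shuffle Manager (1.12.1 as of this change).
# Assumes GNU coreutils `sort -V` for version-aware ordering.
UCX_MIN_VER=1.12.1

ucx_meets_minimum() {
  # The lower of the two versions sorts first; if that is the minimum,
  # the candidate is at least the minimum.
  lowest=$(printf '%s\n%s\n' "$UCX_MIN_VER" "$1" | sort -V | head -n 1)
  [ "$lowest" = "$UCX_MIN_VER" ]
}

for v in 1.11.2 1.12.1 1.13.0; do
  if ucx_meets_minimum "$v"; then
    echo "$v ok"
  else
    echo "$v too old"
  fi
done
```

`sort -V` is used rather than a plain string comparison so that, for example, `1.9.0` correctly sorts before `1.12.1`.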

#### Baremetal

@@ -73,47 +73,40 @@ The minimum UCX requirement for the RAPIDS Shuffle Manager is
further.

2. Fetch and install the UCX package for your OS from:
[UCX 1.11.2](https://github.com/openucx/ucx/releases/tag/v1.11.2).

NOTE: Please install the artifact with the newest CUDA 11.x version (for UCX 1.11.2 please
pick CUDA 11.2) as CUDA 11 introduced [CUDA Enhanced Compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#enhanced-compat-minor-releases).
Starting with UCX 1.12, UCX will stop publishing individual artifacts for each minor version of CUDA.

Please refer to our [FAQ](../FAQ.md#what-hardware-is-supported) for caveats with
CUDA Enhanced Compatibility.
[UCX 1.12.1](https://github.com/openucx/ucx/releases/tag/v1.12.1).

RDMA packages have extra requirements that should be satisfied by MLNX_OFED.

##### CentOS UCX RPM
The UCX packages for CentOS 7 and 8 are divided into different RPMs. For example, UCX 1.11.2
The UCX packages for CentOS 7 and 8 are divided into different RPMs. For example, UCX 1.12.1
available at
https://github.com/openucx/ucx/releases/download/v1.11.2/ucx-v1.11.2-centos7-mofed5.x-cuda11.2.tar.bz2
https://github.com/openucx/ucx/releases/download/v1.12.1/ucx-v1.12.1-centos7-mofed5-cuda11.tar.bz2
contains:

```
ucx-devel-1.11.2-1.el7.x86_64.rpm
ucx-debuginfo-1.11.2-1.el7.x86_64.rpm
ucx-1.11.2-1.el7.x86_64.rpm
ucx-cuda-1.11.2-1.el7.x86_64.rpm
ucx-rdmacm-1.11.2-1.el7.x86_64.rpm
ucx-cma-1.11.2-1.el7.x86_64.rpm
ucx-ib-1.11.2-1.el7.x86_64.rpm
ucx-devel-1.12.1-1.el7.x86_64.rpm
ucx-debuginfo-1.12.1-1.el7.x86_64.rpm
ucx-1.12.1-1.el7.x86_64.rpm
ucx-cuda-1.12.1-1.el7.x86_64.rpm
ucx-rdmacm-1.12.1-1.el7.x86_64.rpm
ucx-cma-1.12.1-1.el7.x86_64.rpm
ucx-ib-1.12.1-1.el7.x86_64.rpm
```

For a setup without RoCE or Infiniband networking, the only packages required are:

```
ucx-1.11.2-1.el7.x86_64.rpm
ucx-cuda-1.11.2-1.el7.x86_64.rpm
ucx-1.12.1-1.el7.x86_64.rpm
ucx-cuda-1.12.1-1.el7.x86_64.rpm
```

If accelerated networking is available, the package list is:

```
ucx-1.11.2-1.el7.x86_64.rpm
ucx-cuda-1.11.2-1.el7.x86_64.rpm
ucx-rdmacm-1.11.2-1.el7.x86_64.rpm
ucx-ib-1.11.2-1.el7.x86_64.rpm
ucx-1.12.1-1.el7.x86_64.rpm
ucx-cuda-1.12.1-1.el7.x86_64.rpm
ucx-rdmacm-1.12.1-1.el7.x86_64.rpm
ucx-ib-1.12.1-1.el7.x86_64.rpm
```
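The two package lists above differ only by the RDMA-related RPMs. The sketch below prints the right list for CentOS 7. It assumes the `ucx-<name>-<version>-1.el7.x86_64.rpm` naming shown above, and `ucx_centos7_rpms` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: prints the CentOS 7 UCX RPMs to install, following
# the package lists above. Pass "rdma" as the second argument when RoCE or
# InfiniBand networking is available.
ucx_centos7_rpms() {
  ver=$1
  echo "ucx-${ver}-1.el7.x86_64.rpm"
  echo "ucx-cuda-${ver}-1.el7.x86_64.rpm"
  if [ "$2" = "rdma" ]; then
    echo "ucx-rdmacm-${ver}-1.el7.x86_64.rpm"
    echo "ucx-ib-${ver}-1.el7.x86_64.rpm"
  fi
}

# Example, run from the directory where the release tarball was extracted:
#   yum install -y $(ucx_centos7_rpms 1.12.1 rdma)
ucx_centos7_rpms 1.12.1
```

This mirrors how the CentOS Dockerfiles in this commit call `yum install -y` on individual RPMs after extracting the tarball.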

---
@@ -152,7 +145,7 @@ system if you have RDMA capable hardware.
Within the Docker container we need to install UCX and its requirements. These are Dockerfile
examples for Ubuntu 18.04:

The following are examples of Docker containers with UCX 1.11.2 and cuda-11.2 support.
The following are examples of Docker containers with UCX 1.12.1 and cuda-11.2 support.

| OS Type | RDMA | Dockerfile |
| ------- | ---- | ---------- |
@@ -296,7 +289,7 @@ In this section, we are using a docker container built using the sample dockerfi
| Databricks 9.1 | com.nvidia.spark.rapids.spark312db.RapidsShuffleManager |
| Databricks 10.4 | com.nvidia.spark.rapids.spark321db.RapidsShuffleManager |

2. Settings for UCX 1.11.2+:
2. Settings for UCX 1.12.1+:

Minimum configuration:

@@ -345,9 +338,9 @@ guide for Databricks. The following are extra steps required to enable UCX.
```
#!/bin/bash
sudo apt install -y wget libnuma1 &&
wget https://github.com/openucx/ucx/releases/download/v1.11.2/ucx-v1.11.2-ubuntu18.04-mofed5.x-cuda11.2.deb &&
sudo dpkg -i ucx-v1.11.2-ubuntu18.04-mofed5.x-cuda11.2.deb &&
rm ucx-v1.11.2-ubuntu18.04-mofed5.x-cuda11.2.deb
wget https://github.com/openucx/ucx/releases/download/v1.12.1/ucx-v1.12.1-ubuntu18.04-mofed5-cuda11.deb &&
sudo dpkg -i ucx-v1.12.1-ubuntu18.04-mofed5-cuda11.deb &&
rm ucx-v1.12.1-ubuntu18.04-mofed5-cuda11.deb
```

Save the script in DBFS and add it to the "Init Scripts" list:
@@ -1,5 +1,5 @@
#
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -22,15 +22,15 @@
# See: https://github.com/openucx/ucx/releases/

ARG CUDA_VER=11.2.2
ARG UCX_VER=1.11.2
ARG UCX_CUDA_VER=11.2
ARG UCX_VER=1.12.1
ARG UCX_CUDA_VER=11

FROM nvidia/cuda:${CUDA_VER}-runtime-centos7
ARG UCX_VER
ARG UCX_CUDA_VER

RUN yum update -y && yum install -y wget bzip2
RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-centos7-mofed5.x-cuda$UCX_CUDA_VER.tar.bz2
RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-centos7-mofed5-cuda$UCX_CUDA_VER.tar.bz2
RUN cd /tmp && tar -xvf *.bz2 && \
yum install -y ucx-$UCX_VER-1.el7.x86_64.rpm && \
yum install -y ucx-cuda-$UCX_VER-1.el7.x86_64.rpm && \
@@ -1,5 +1,5 @@
#
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -29,8 +29,8 @@

ARG RDMA_CORE_VERSION=32.1
ARG CUDA_VER=11.2.2
ARG UCX_VER=1.11.2
ARG UCX_CUDA_VER=11.2
ARG UCX_VER=1.12.1
ARG UCX_CUDA_VER=11

# Throw away image to build rdma_core
FROM centos:7 as rdma_core
@@ -59,7 +59,7 @@ COPY --from=rdma_core /tmp/*.rpm /tmp/

RUN yum update -y
RUN yum install -y wget bzip2
RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-centos7-mofed5.x-cuda$UCX_CUDA_VER.tar.bz2
RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-centos7-mofed5-cuda$UCX_CUDA_VER.tar.bz2
RUN cd /tmp && \
yum install -y *.rpm && \
tar -xvf *.bz2 && \
@@ -1,5 +1,5 @@
#
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -22,14 +22,14 @@
# See: https://github.com/openucx/ucx/releases/

ARG CUDA_VER=11.2.2
ARG UCX_VER=1.11.2
ARG UCX_CUDA_VER=11.2
ARG UCX_VER=1.12.1
ARG UCX_CUDA_VER=11

FROM nvidia/cuda:${CUDA_VER}-runtime-ubuntu18.04
ARG UCX_VER
ARG UCX_CUDA_VER

RUN apt update
RUN apt-get install -y wget
RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-ubuntu18.04-mofed5.x-cuda$UCX_CUDA_VER.deb
RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-ubuntu18.04-mofed5-cuda$UCX_CUDA_VER.deb
RUN apt install -y /tmp/*.deb && rm -rf /tmp/*.deb
@@ -1,5 +1,5 @@
#
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -29,8 +29,8 @@

ARG RDMA_CORE_VERSION=32.1
ARG CUDA_VER=11.2.2
ARG UCX_VER=1.11.2
ARG UCX_CUDA_VER=11.2
ARG UCX_VER=1.12.1
ARG UCX_CUDA_VER=11

# Throw away image to build rdma_core
FROM ubuntu:18.04 as rdma_core
@@ -50,5 +50,5 @@ COPY --from=rdma_core /*.deb /tmp/

RUN apt update
RUN apt-get install -y wget
RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-ubuntu18.04-mofed5.x-cuda$UCX_CUDA_VER.deb
RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-ubuntu18.04-mofed5-cuda$UCX_CUDA_VER.deb
RUN apt install -y /tmp/*.deb && rm -rf /tmp/*.deb
10 changes: 7 additions & 3 deletions jenkins/Dockerfile-blossom.ubuntu
@@ -1,5 +1,5 @@
#
# Copyright (c) 2020-2021, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2020-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -21,15 +21,19 @@
# Arguments:
# CUDA_VER=11.0+
# UBUNTU_VER=18.04 or 20.04
# UCX_CUDA_VER=11 (major CUDA version)
# UCX_VER=1.12.1
###

ARG CUDA_VER=11.0
ARG UBUNTU_VER=18.04
ARG UCX_VER=1.11.2
ARG UCX_VER=1.12.1
ARG UCX_CUDA_VER=11
FROM nvidia/cuda:${CUDA_VER}-runtime-ubuntu${UBUNTU_VER}
ARG CUDA_VER
ARG UBUNTU_VER
ARG UCX_VER
ARG UCX_CUDA_VER

# Install jdk-8, jdk-11, maven, docker image
RUN apt-get update -y && \
@@ -53,7 +57,7 @@ RUN apt install -y inetutils-ping expect wget libnuma1 libgomp1

RUN mkdir -p /tmp/ucx && \
cd /tmp/ucx && \
wget https://github.com/openucx/ucx/releases/download/v${UCX_VER}/ucx-v${UCX_VER}-ubuntu${UBUNTU_VER}-mofed5.x-cuda${CUDA_VER}.deb && \
wget https://github.com/openucx/ucx/releases/download/v${UCX_VER}/ucx-v${UCX_VER}-ubuntu${UBUNTU_VER}-mofed5-cuda${UCX_CUDA_VER}.deb && \
dpkg -i *.deb && \
rm -rf /tmp/ucx

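The blossom Dockerfile previously put the full `CUDA_VER` into the UCX file name. The new line uses a separate `UCX_CUDA_VER` argument holding only the CUDA major version (`11`). If one preferred to derive that major version from `CUDA_VER` instead, POSIX shell parameter expansion can do it. The sketch below is an alternative, not what this commit does, and `cuda_major` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: reduce a full CUDA version (e.g. 11.0 or 11.2.2) to
# its major component, which is what UCX 1.12.1 artifact names use (cuda11).
cuda_major() {
  # ${var%%.*} removes the longest suffix that starts at a '.', i.e.
  # everything from the first dot onward.
  printf '%s\n' "${1%%.*}"
}

cuda_major 11.0
cuda_major 11.2.2
```

Docker leaves variable expansion in `RUN` commands to the shell, and `ARG` values are visible there as environment variables. The same `${CUDA_VER%%.*}` form should therefore also work inside a `RUN` step. Passing `UCX_CUDA_VER` explicitly, as the commit does, keeps the value visible in the documented arguments.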
2 changes: 1 addition & 1 deletion shuffle-plugin/pom.xml
@@ -44,7 +44,7 @@
<dependency>
<groupId>org.openucx</groupId>
<artifactId>jucx</artifactId>
<version>1.11</version>
<version>1.12.1</version>
<scope>compile</scope>
</dependency>
<dependency>