[gpu][rapids] Fix GPU init action on Ubuntu and update Spark RAPIDS to 22.04 #991

Merged: 12 commits merged into GoogleCloudDataproc:master on May 5, 2022


viadea
Contributor

@viadea viadea commented May 2, 2022

Update the Spark RAPIDS script to 22.04 and change how the GPU driver is installed on Ubuntu.

viadea added 2 commits May 2, 2022 14:25
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
@medb
Contributor

medb commented May 2, 2022

/gcbrun

@viadea
Contributor Author

viadea commented May 2, 2022

@medb do you know the reason for the failure? It seems we cannot see the failure logs.

@medb medb changed the title from "Update Spark RAPIDS 2204 script and the way to install gpu driver for ubuntu" to "[gpu][rapids] Fix GPU init action on Ubuntu and update Spark RAPIDS to 22.04" on May 2, 2022
viadea added 2 commits May 2, 2022 21:41
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
@medb
Contributor

medb commented May 3, 2022

It seems another repo also has a GPG key issue on Ubuntu:

+ eval 'apt-get update'
++ apt-get update
Hit:1 http://us-central1.gce.archive.ubuntu.com/ubuntu bionic InRelease
Hit:2 http://us-central1.gce.archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:3 http://us-central1.gce.archive.ubuntu.com/ubuntu bionic-backports InRelease
Hit:4 http://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease
Hit:5 https://packages.cloud.google.com/apt google-cloud-logging-bionic-all InRelease
Hit:6 https://packages.cloud.google.com/apt google-cloud-monitoring-bionic-all InRelease
Hit:7 https://storage.googleapis.com/goog-dataproc-bigtop-repo-us-central1/1_5_ubu18_20220428_054457-RC01 dataproc InRelease
Hit:8 http://security.ubuntu.com/ubuntu bionic-security InRelease
Hit:9 https://download.docker.com/linux/ubuntu bionic InRelease
Hit:10 https://repo.mysql.com/apt/ubuntu bionic InRelease
Ign:11 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Get:12 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release [564 B]
Hit:13 https://storage.googleapis.com/dataproc-bigtop-repo/1_5_ubu18_20220428_054457-RC01 dataproc InRelease
Get:14 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release.gpg [833 B]
Hit:15 https://packages.adoptium.net/artifactory/deb bionic InRelease
Ign:14 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release.gpg
Reading package lists...
W: GPG error: http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY F60F4B3D7FA2AF80
E: The repository 'http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release' is not signed.
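
The missing key ID in the warning above ends in 7FA2AF80, which matches the `7fa2af80.pub` file name that shows up later in this thread. As a rough sketch (the helper name `extract_missing_keys` is hypothetical and not part of the init action), the missing key IDs could be pulled out of the `apt-get update` output like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the init action.
# Reads `apt-get update` output on stdin and prints each missing
# key ID (the value after NO_PUBKEY) once, one per line.
extract_missing_keys() {
  grep -oE 'NO_PUBKEY [0-9A-Fa-f]+' | awk '{print $2}' | sort -u
}
```

It would be used as `apt-get update 2>&1 | extract_missing_keys`, which makes it easier to see which repo key still has to be registered.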

viadea added 2 commits May 3, 2022 09:41
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
@viadea
Contributor Author

viadea commented May 3, 2022

@medb @mengdong
The NCCL repo still uses the old key, while some other repos use the new key.
So I:

  1. Switched the CUDA repo from the old key to the new key.
  2. Added the old key as well for the NCCL repo.

Hopefully this makes it work this time.
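
The two steps above could look roughly like the sketch below. It is not the code from this PR: the function names are hypothetical, the new key file name `3bf863cc.pub` and the CUDA repo path are assumptions, and only `7fa2af80.pub` and the machine-learning repo path appear in the logs of this thread.

```shell
#!/usr/bin/env bash
# Sketch only: function names, the CUDA repo path and the new key
# file name (3bf863cc.pub) are assumptions, not taken from this PR.
NVIDIA_BASE_URL='https://developer.download.nvidia.com/compute'

# Print the URL of a public key file stored at the root of a repo.
# $1: repo path under NVIDIA_BASE_URL, $2: key file name.
nvidia_key_url() {
  local -r repo_path="$1" key_file="$2"
  echo "${NVIDIA_BASE_URL}/${repo_path}/${key_file}"
}

# Download a key and register it with apt (needs root and network).
add_apt_key() {
  local -r url="$1"
  curl -fsSL --retry-connrefused --retry 10 --retry-max-time 30 "${url}" | apt-key add -
}

# New key for the CUDA repo, old key for the NCCL (machine-learning) repo.
add_nvidia_keys() {
  add_apt_key "$(nvidia_key_url 'cuda/repos/ubuntu1804/x86_64' '3bf863cc.pub')"
  add_apt_key "$(nvidia_key_url 'machine-learning/repos/ubuntu1804/x86_64' '7fa2af80.pub')"
}
```

Building the key URL from the repo directory, rather than from a package file path, keeps both keys on the same code path.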

@medb
Contributor

medb commented May 3, 2022

Seems like there are still some issues with repos on Ubuntu:

+ bash ./driver.run --silent --install-libglvnd
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 460.73.01...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

WARNING: nvidia-installer was forced to guess the X library path '/usr/lib' and X module path '/usr/lib/xorg/modules'; these paths were not queryable from the system.  If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver.


WARNING: Unable to find a suitable destination to install 32-bit compatibility libraries. Your system may not be set up for 32-bit compatibility. 32-bit compatibility files will not be installed; if you wish to install them, re-run the installation and set a valid directory with the --compat32-libdir option.

+ curl -fsSL --retry-connrefused --retry 10 --retry-max-time 30 https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run -o cuda.run
+ bash ./cuda.run --silent --toolkit --no-opengl-libs
+ ldconfig
+ echo 'NVIDIA GPU driver provided by NVIDIA was installed successfully'
NVIDIA GPU driver provided by NVIDIA was installed successfully
+ [[ -n 8.1.1.33 ]]
+ install_nvidia_nccl
+ local -r nccl_version=2.8.3-1+cuda11.2
+ [[ ubuntu == rocky ]]
+ [[ ubuntu == ubuntu ]]
+ curl -fsSL --retry-connrefused --retry 10 --retry-max-time 30 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb/7fa2af80.pub
+ apt-key add -
Warning: apt-key output should not be parsed (stdout is not a terminal)
gpg: no valid OpenPGP data found.
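
In the log above, the key URL is appended to the `.deb` file path, and `apt-key` then reports that it received no OpenPGP data. One way to surface this kind of mistake earlier is to check the download before handing it to `apt-key`. The sketch below assumes the key is ASCII-armored; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they assume the key file is ASCII-armored.

# Return 0 if stdin contains an ASCII-armored OpenPGP public key header.
looks_like_pgp_key() {
  grep -q -- '-----BEGIN PGP PUBLIC KEY BLOCK-----'
}

# Download a key to a temp file, check it, then register it with apt.
add_verified_apt_key() {
  local -r url="$1"
  local tmp rc=0
  tmp="$(mktemp)"
  if ! curl -fsSL --retry-connrefused --retry 10 --retry-max-time 30 "${url}" -o "${tmp}"; then
    echo "Failed to download key from ${url}" >&2
    rm -f "${tmp}"
    return 1
  fi
  if ! looks_like_pgp_key <"${tmp}"; then
    echo "No OpenPGP key data found at ${url}" >&2
    rm -f "${tmp}"
    return 1
  fi
  apt-key add "${tmp}" || rc=$?
  rm -f "${tmp}"
  return "${rc}"
}
```

With this, a wrong URL fails with a message that names the URL instead of a bare gpg error.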

Signed-off-by: Hao Zhu <hazhu@nvidia.com>
@viadea
Contributor Author

viadea commented May 3, 2022

@medb Just fixed. Try again?

@medb
Contributor

medb commented May 3, 2022

The GPG key issue is fixed, but it seems there are some problems with package versions on Ubuntu:

++ apt-get install -y --no-install-recommends libcudnn8=8.1.1.33-1+cuda11.2 libcudnn8-dev=8.1.1.33-1+cuda11.2
Reading package lists...
Building dependency tree...
Reading state information...
E: Version '8.1.1.33-1+cuda11.2' for 'libcudnn8' was not found
E: Version '8.1.1.33-1+cuda11.2' for 'libcudnn8-dev' was not found
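
The errors above mean the pinned version is not offered by any configured repo. A minimal sketch of checking for a version before pinning it follows; the function names are hypothetical, and it relies on `apt-cache madison` printing lines in the form `package | version | source`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the init action.

# Return 0 if the exact version ($1) is listed in `apt-cache madison`
# output read from stdin (lines look like " pkg | version | source").
version_in_madison() {
  local -r wanted="$1"
  awk -F'|' -v wanted="${wanted}" '
    { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); if (v == wanted) found = 1 }
    END { exit(found ? 0 : 1) }'
}

# Install the pinned version if a repo offers it, otherwise fail
# with a message that names the package and version.
install_pinned() {
  local -r pkg="$1" version="$2"
  if apt-cache madison "${pkg}" | version_in_madison "${version}"; then
    apt-get install -y --no-install-recommends "${pkg}=${version}"
  else
    echo "Version '${version}' of '${pkg}' is not in the configured repos" >&2
    return 1
  fi
}
```

Running `apt-cache madison libcudnn8` by hand on the failing image would also show which versions the repos actually provide.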

Signed-off-by: Hao Zhu <hazhu@nvidia.com>
@viadea
Contributor Author

viadea commented May 4, 2022

@medb I just changed back to the old way. I hope this minimal change at least makes everything work.

@viadea
Contributor Author

viadea commented May 4, 2022

@medb how is the latest test result?

@medb
Contributor

medb commented May 5, 2022

There are still some Dask test failures, but they are not related to this change, so I will merge the PR.

@medb medb merged commit 732809e into GoogleCloudDataproc:master May 5, 2022