
GPU build broken with CUDA SDK 12.0 #13932

Closed
tufei opened this issue Dec 10, 2022 · 35 comments
Labels
ep:CUDA issues related to the CUDA execution provider

Comments

tufei commented Dec 10, 2022

Describe the issue

It seems that ONNX Runtime has a hard dependency on CUDA SDK 11.x?

[dnn_onnxruntime @ 0x3f7ccc0] SessionOptionsAppendExecutionProvider_CUDA(): /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1069 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory

To reproduce

On Fedora 36, update to the latest CUDA SDK 12.0, then try some examples.

Urgency

No response

Platform

Linux

OS Version

Fedora 36

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.13.1

ONNX Runtime API

C

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.0

@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Dec 10, 2022
snnn (Member) commented Dec 12, 2022

If your system doesn't have libcublasLt.so.11, then the answer is: "yes".

tufei (Author) commented Dec 13, 2022

Thanks. What I meant to ask: is it possible that future releases link only against libcublasLt.so rather than libcublasLt.so.11, and then check versions at runtime using the proper API calls?

Regards,

snnn (Member) commented Dec 13, 2022

@tufei, would you mind showing me an example? We didn't explicitly put the name "libcublasLt.so.11" in the link command. We put "-lcublasLt" there and the linker resolved it to "libcublasLt.so.11". Most Linux shared libraries work this way. I don't know how to change it.

smartnet-club commented:

terminate called after throwing an instance of 'Ort::Exception'
what(): /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1069 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory

ldd libonnxruntime_providers_cuda.so
libcublasLt.so.11 => not found
libcublas.so.11 => not found

borisfom (Contributor) commented:

@tufei: You should be able to install the CUDA 11 libraries alongside your CUDA 12 as a workaround.
@snnn: it would be nice to have a separate onnxruntime-gpu wheel built with CUDA 12 available. Is that in your near-term plans?

dzhao commented Feb 11, 2023

May I ask when ONNX Runtime will support CUDA 12, or even support building with CUDA 12?

zeruniverse commented:

I think #14659 needs to be merged into the latest release to fix the CUDA 12 build.

chriskyndrid commented:

+1

For the time being, I resolved this as follows:

1. Add the fc35 CUDA repo:

sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora35/x86_64/cuda-fedora35.repo
sudo dnf clean all

If you're using the RPM Fusion repositories for your display drivers:

sudo dnf module disable nvidia-driver

2. Install CUDA 11.8:

sudo dnf install cuda-11-8

3. Install cuDNN. You may/will need to make sure you have the appropriate libcudnn8 installed for the version of CUDA you are using:

sudo dnf install https://developer.download.nvidia.com/compute/machine-learning/repos/rhel8/x86_64/nvidia-machine-learning-repo-rhel8-1.0.0-1.x86_64.rpm

and, e.g.:

sudo dnf install libcudnn8 libcudnn8-devel libnccl libnccl-devel

You can browse the packages from the rhel8 repo here.

After the above I can successfully run inference. I'm using the ort crate for Rust with models I converted (mostly) from PyTorch to ONNX.

snnn (Member) commented Apr 5, 2023

@snnn : it would be nice to have a separate onnxruntime-gpu wheel built with CUDA 12 available. Is that in your nearest plans ?

Right now, each of our packages only works with a specific CUDA minor version. For example, the last one only works with CUDA 11.6 and the next one will only work with CUDA 11.8. At some point that will become CUDA 12-point-something. If you have more questions about the project's future plans, you can ask @pranavsharma.

AkshayUpadhye commented:

I also had a similar issue; building onnxruntime from source helped!

snnn (Member) commented Aug 20, 2023

The latest code should work fine on Windows with CUDA 12.2. I am adding a build pipeline for it: #17231

wongwenxin commented:

I had a similar problem when using CLion to compile the onnxruntime C++ example in YOLOv8!

[DCSP_ONNX]:/onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1131 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory

But I can find libcublasLt.so.11 under /usr/local/cuda11.8/lib64

Please help!!!

snnn (Member) commented Sep 7, 2023

@wongwenxin See https://man7.org/linux/man-pages/man8/ld.so.8.html for how Linux finds dynamic libraries. You might need to run ldconfig to add the directory to the operating system's loader cache, or set the LD_LIBRARY_PATH environment variable.
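As a sketch of those two options (the CUDA install path below is illustrative; adjust it to wherever libcublasLt.so.11 actually lives on your system):

```shell
# Option 1: per-session, extend the dynamic loader's search path.
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

# Option 2: system-wide, register the directory and refresh the loader cache
# (needs root, hence shown commented out):
#   echo /usr/local/cuda-11.8/lib64 | sudo tee /etc/ld.so.conf.d/cuda-11-8.conf
#   sudo ldconfig

# Check what the loader cache can currently resolve (empty if not registered).
ldconfig -p | grep libcublasLt || true
```

Note that LD_LIBRARY_PATH only affects processes started from that shell, while the ld.so.conf.d route affects the whole system.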

I will close this issue now because it is as designed. All our prebuilt packages were built with CUDA 11.x. They are not compatible with CUDA 12.x. However, you can build ONNX Runtime from source with CUDA 12.x if you need to use that version of CUDA.

Feel free to open a new issue if you hit any build error with that.

@snnn snnn closed this as completed Sep 7, 2023
jrabek commented Dec 12, 2023

Has anyone built the onnx runtime with cuda 12.x successfully? What would be the best instructions to use to do so?

snnn (Member) commented Dec 12, 2023

The latest code should work fine with CUDA 12.2. And we have a nightly package for it. https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-12-nightly/PyPI/ort-nightly-gpu

jrabek commented Dec 12, 2023

Thank you for the comment and link @snnn! Much appreciated 🙏

FrancescoSaverioZuppichini commented:

The latest code should work fine with CUDA 12.2. And we have a nightly package for it. https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-12-nightly/PyPI/ort-nightly-gpu

having this error

[screenshot of the error]

thank you so much

FrancescoSaverioZuppichini commented:

Tried with


pip install ort-nightly-gpu

working!

vladoossss commented Dec 25, 2023

The latest code should work fine with CUDA 12.2. And we have a nightly package for it. https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-12-nightly/PyPI/ort-nightly-gpu

Tried this:

pip install ort-nightly-gpu==1.17.0.dev20231205004 --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/

But saw this error:

ERROR: Could not find a version that satisfies the requirement ort-nightly-gpu==1.17.0.dev20231205004 (from versions: none)
ERROR: No matching distribution found for ort-nightly-gpu==1.17.0.dev20231205004

FrancescoSaverioZuppichini commented:

The latest code should work fine with CUDA 12.2. And we have a nightly package for it. https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-12-nightly/PyPI/ort-nightly-gpu

Tried this:

pip install ort-nightly-gpu==1.17.0.dev20231205004 --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/

But saw this error:

ERROR: Could not find a version that satisfies the requirement ort-nightly-gpu==1.17.0.dev20231205004 (from versions: none) ERROR: No matching distribution found for ort-nightly-gpu==1.17.0.dev20231205004

look at my message above, try with

pip install ort-nightly-gpu

vladoossss commented:

The latest code should work fine with CUDA 12.2. And we have a nightly package for it. https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-12-nightly/PyPI/ort-nightly-gpu

Tried this:

pip install ort-nightly-gpu==1.17.0.dev20231205004 --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/

But saw this error:
ERROR: Could not find a version that satisfies the requirement ort-nightly-gpu==1.17.0.dev20231205004 (from versions: none) ERROR: No matching distribution found for ort-nightly-gpu==1.17.0.dev20231205004

look at my message above, try with

pip install ort-nightly-gpu

With this command you installed ort-nightly-gpu==1.15.dev
But this version will not work with CUDA 12.

FrancescoSaverioZuppichini commented:

The latest code should work fine with CUDA 12.2. And we have a nightly package for it. https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-12-nightly/PyPI/ort-nightly-gpu

Tried this:

pip install ort-nightly-gpu==1.17.0.dev20231205004 --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/

But saw this error:
ERROR: Could not find a version that satisfies the requirement ort-nightly-gpu==1.17.0.dev20231205004 (from versions: none) ERROR: No matching distribution found for ort-nightly-gpu==1.17.0.dev20231205004

look at my message above, try with

pip install ort-nightly-gpu

With this command you installed ort-nightly-gpu==1.15.dev But this version will not work with CUDA 12.

I have CUDA 12 and it works 🤔

RoM4iK commented Dec 27, 2023

For people who come after me: you can download the wheel here:
https://dev.azure.com/onnxruntime/onnxruntime/_artifacts/feed/onnxruntime-cuda-12/PyPI/onnxruntime-gpu/overview/1.17.0

arcayi commented Jan 17, 2024

The latest code should work fine with CUDA 12.2. And we have a nightly package for it. https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-12-nightly/PyPI/ort-nightly-gpu

Tried this:

pip install ort-nightly-gpu==1.17.0.dev20231205004 --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/

But saw this error:
ERROR: Could not find a version that satisfies the requirement ort-nightly-gpu==1.17.0.dev20231205004 (from versions: none) ERROR: No matching distribution found for ort-nightly-gpu==1.17.0.dev20231205004

look at my message above, try with

pip install ort-nightly-gpu

With this command you installed ort-nightly-gpu==1.15.dev But this version will not work with CUDA 12.

I have CUDA 12 and it works 🤔

this works. thanks

ZhangHangjianMA commented Apr 23, 2024

The latest code should work fine with CUDA 12.2. And we have a nightly package for it. https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-12-nightly/PyPI/ort-nightly-gpu

Tried this:

pip install ort-nightly-gpu==1.17.0.dev20231205004 --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/

But saw this error:
ERROR: Could not find a version that satisfies the requirement ort-nightly-gpu==1.17.0.dev20231205004 (from versions: none) ERROR: No matching distribution found for ort-nightly-gpu==1.17.0.dev20231205004

look at my message above, try with

pip install ort-nightly-gpu

With this command you installed ort-nightly-gpu==1.15.dev But this version will not work with CUDA 12.

I have CUDA 12 and it works 🤔

this works. thanks

I followed the solution in Fannovel16/comfyui_controlnet_aux#75 (comment):

pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/
pip install onnxruntime-gpu==1.17.0 --index-url=https://pkgs.dev.azure.com/onnxruntime/onnxruntime/_packaging/onnxruntime-cuda-12/pypi/simple/

It works for me, but with a warning:
onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.

xenova commented May 24, 2024

Has anyone got this working with onnxruntime-gpu==1.18.0? (https://pypi.org/project/onnxruntime-gpu/1.18.0/)

meikuam commented Jun 18, 2024

Has anyone got this working with onnxruntime-gpu==1.18.0? (https://pypi.org/project/onnxruntime-gpu/1.18.0/)

I'm facing the same issue too. It seems it's not fixed in the newer version of the package.

geraldstanje commented Jun 20, 2024

I see the same error:

root@fdd2e200ddd1:/workspace# find / -name "libcublasLt.so.11"
root@fdd2e200ddd1:/workspace# find / -name "libcublasLt.so"
/usr/local/cuda-12.3/targets/x86_64-linux/lib/stubs/libcublasLt.so
/usr/local/cuda-12.3/targets/x86_64-linux/lib/libcublasLt.so

find / -name "libonnxruntime_providers_cuda.so"
/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/libonnxruntime_providers_cuda.so
/opt/tritonserver/backends/onnxruntime/libonnxruntime_providers_cuda.so

2024-06-20 16:01:03.906843992 [E:onnxruntime:Default, provider_bridge_ort.cc:1744 TryGetProviderInfo_CUDA] 
/onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& 
onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library 
libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory

More info:
onnx/sklearn-onnx#1111

cc @pranavsharma - can we open a new issue, or is there a solution to this?

snnn (Member) commented Jun 20, 2024

Did you get the package from https://pkgs.dev.azure.com/onnxruntime/onnxruntime/_packaging/onnxruntime-cuda-12/pypi/simple/ ?

geraldstanje commented Jun 20, 2024

@snnn thanks for the quick reply! No, from: https://pypi.org/project/onnxruntime-gpu/
I could try the following; then it should be fixed?

pip install onnxruntime-gpu==1.18.0 --index-url=https://pkgs.dev.azure.com/onnxruntime/onnxruntime/_packaging/onnxruntime-cuda-12/pypi/simple/

Also, what is ort-nightly-gpu?

@snnn there is no 1.18.0 version?

snnn (Member) commented Jun 20, 2024

Sorry I gave you the wrong URL. The URL should be

https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
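Combining that feed URL with the package name from earlier in the thread, the install command would look like the following (whether a particular version is published on that feed is an assumption to verify, so no version pin is given here):

```shell
# Install the CUDA 12 build of onnxruntime-gpu from the feed above
# (network access to the feed is required).
pip install onnxruntime-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
```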

geraldstanje commented:

Sorry I gave you the wrong URL. The URL should be

https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/

What's the difference between the two URLs? Does the new URL have onnxruntime-gpu==1.18.0?

Also, what is ort-nightly-gpu?

meikuam commented Jun 26, 2024

Sorry I gave you the wrong URL. The URL should be

https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/

Thanks, that helped fix this problem for now.

snnn (Member) commented Jun 29, 2024

https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-cuda-12 is where we host our CUDA 12 Python, NuGet, and Maven packages. You can click the "Connect to feed" button to see instructions.

https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-12-nightly is similar, but it only hosts nightly packages, and it currently doesn't have nightly packages for Python. (We are working on it.)
