
Unable to Find libnvidia-ml.so.1 When Using "docker compose linux-gpu up" #95

Open
medined opened this issue Nov 20, 2023 · 4 comments

medined commented Nov 20, 2023

Here is the result of my command. Is this error happening inside the container or outside? The weird part to me is:

genai-stack-pull-model-1 | pulling ollama model llama2 using http://llm-gpu:11434

The docs told me to add that URL to the .env file. However, I certainly don't have a server running there.

$ docker compose --profile linux-gpu up
WARN[0000] The "LANGCHAIN_PROJECT" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_DEFAULT_REGION" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_PROJECT" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_DEFAULT_REGION" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_PROJECT" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_DEFAULT_REGION" variable is not set. Defaulting to a blank string. 
WARN[0000] The "OPENAI_API_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_PROJECT" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_DEFAULT_REGION" variable is not set. Defaulting to a blank string. 
[+] Running 4/4
 ✔ llm-gpu 3 layers [⣿⣿⣿]      0B/0B      Pulled                                                                                                        1.3s 
   ✔ aece8493d397 Already exists                                                                                                                        0.0s 
   ✔ 3b9196308e0f Already exists                                                                                                                        0.0s 
   ✔ e75cbce7870b Already exists                                                                                                                        0.0s 
[+] Building 0.0s (0/0)                                                                                                                 docker:desktop-linux
[+] Running 8/8
 ✔ Container genai-stack-llm-gpu-1     Created                                                                                                          0.0s 
 ✔ Container genai-stack-database-1    Running                                                                                                          0.0s 
 ✔ Container genai-stack-pull-model-1  Recreated                                                                                                        0.1s 
 ✔ Container genai-stack-api-1         Recreated                                                                                                        0.1s 
 ✔ Container genai-stack-bot-1         Recreated                                                                                                        0.1s 
 ✔ Container genai-stack-pdf_bot-1     Recreated                                                                                                        0.1s 
 ✔ Container genai-stack-loader-1      Recreated                                                                                                        0.1s 
 ✔ Container genai-stack-front-end-1   Recreated                                                                                                        0.1s 
Attaching to genai-stack-api-1, genai-stack-bot-1, genai-stack-database-1, genai-stack-front-end-1, genai-stack-llm-gpu-1, genai-stack-loader-1, genai-stack-pdf_bot-1, genai-stack-pull-model-1
genai-stack-pull-model-1  | pulling ollama model llama2 using http://llm-gpu:11434
genai-stack-pull-model-1  | Error: Head "http://llm-gpu:11434/": dial tcp 172.18.0.4:11434: connect: no route to host
genai-stack-pull-model-1 exited with code 1
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
@matthieuml
Contributor

The docs told me to add that URL to the .env file. However, I certainly don't have a server running there.

If the container genai-stack-llm-gpu-1 is running, then you do have a server running at http://llm-gpu:11434/, internal to the Docker network.
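As a quick check, you can probe the Ollama endpoint from another container on the same compose network. This is only a sketch: the `api` service name is taken from the compose log above and may differ in your setup, and `curl` must be available in that service's image.

```shell
# Probe the Ollama server over Docker's internal DNS from a sibling service.
# "api" is an assumed service name; adjust to whatever your compose file defines.
docker compose exec api curl -sf http://llm-gpu:11434/
```

If this succeeds, the compose networking and the .env URL are fine, and the problem is elsewhere.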

What seems to be the issue here is your NVIDIA runtime integration with Docker.

Are you able to run this command successfully?

docker run -it --rm --gpus all ubuntu nvidia-smi

If not, try reinstalling Docker.
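If nvidia-smi fails inside the container, reinstalling and reconfiguring the NVIDIA Container Toolkit is usually the first thing to try. A minimal sketch, assuming Debian/Ubuntu with the NVIDIA apt repository already configured:

```shell
# Install the toolkit that exposes host driver libraries (incl. libnvidia-ml.so.1) to containers
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
# Register the "nvidia" runtime in /etc/docker/daemon.json
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify that a container can now see the GPU
docker run --rm --gpus all ubuntu nvidia-smi
```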


Toparvion commented Jan 27, 2024

@matthieuml , I've faced the same issue and tried the command you proposed. The resulting error is the same as the one I see when running the GenAI stack with --profile linux-gpu, namely:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown. 

I have also followed your advice from #62 (installed nvidia-container-toolkit) but nothing has changed.

The main hint here seems to be that I run the stack in Docker Desktop 4.26.1 (on Ubuntu 23.10). nvidia-smi displays the following about the GPU:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 ...    Off | 00000000:01:00.0  On |                  N/A |
| N/A   47C    P8              11W /  55W |    628MiB /  6144MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Some reported issues that I've found on the net so far suggest using Docker CE instead of Docker Desktop. But that looks opposite to what the GenAI stack promotes: an easy and developer-friendly way to build LLM-powered applications.

Is there another way to resolve the issue?

@matthieuml
Contributor

After looking around a bit, it seems that nvidia-container-toolkit needs docker-ce installed as root to work (which isn't the case with Docker Desktop?).

The obvious way to resolve this issue would be to use docker-ce installed as root, or even podman as an alternative. The Docker CLI is well documented, and in combination with docker-compose you can deploy the stack quite easily.

However, if you want to keep a developer-friendly UI, maybe you could use portainer-ce in combination with docker-ce as root?
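One caveat when installing Docker CE alongside Docker Desktop: the CLI may still be pointed at Desktop's VM through a Docker context. A sketch of how to check and switch back to the root-installed engine:

```shell
# List available Docker endpoints; Docker Desktop registers its own context
docker context ls
# Point the CLI back at the root-installed engine (/var/run/docker.sock)
docker context use default
# Confirm which engine the CLI is now talking to
docker info --format '{{.Name}}'
```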

@Toparvion

@matthieuml ,

which isn't the case with Docker Desktop?

Yes, this seems to be the root cause.

Ok, I'll switch to Docker CE.

Perhaps it's worth adding a note about Docker Desktop's incompatibility with the linux-gpu profile to the README.md, as well as mentioning the need to install nvidia-container-toolkit.

Thank you!
