
Unable to Find libnvidia-ml.so.1 When Using "docker compose linux-gpu up" #95

Open
medined opened this issue Nov 20, 2023 · 4 comments

medined commented Nov 20, 2023

Here is the result of my command. Is this error happening inside the container or outside? The weird part to me is:

genai-stack-pull-model-1 | pulling ollama model llama2 using http://llm-gpu:11434

The docs told me to add that URL to the .env file. However, I certainly don't have a server running there.

$ docker compose --profile linux-gpu up
WARN[0000] The "LANGCHAIN_PROJECT" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_DEFAULT_REGION" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_PROJECT" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_DEFAULT_REGION" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_PROJECT" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_DEFAULT_REGION" variable is not set. Defaulting to a blank string. 
WARN[0000] The "OPENAI_API_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_PROJECT" variable is not set. Defaulting to a blank string. 
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "AWS_DEFAULT_REGION" variable is not set. Defaulting to a blank string. 
[+] Running 4/4
 ✔ llm-gpu 3 layers [⣿⣿⣿]      0B/0B      Pulled                                                                                                        1.3s 
   ✔ aece8493d397 Already exists                                                                                                                        0.0s 
   ✔ 3b9196308e0f Already exists                                                                                                                        0.0s 
   ✔ e75cbce7870b Already exists                                                                                                                        0.0s 
[+] Building 0.0s (0/0)                                                                                                                 docker:desktop-linux
[+] Running 8/8
 ✔ Container genai-stack-llm-gpu-1     Created                                                                                                          0.0s 
 ✔ Container genai-stack-database-1    Running                                                                                                          0.0s 
 ✔ Container genai-stack-pull-model-1  Recreated                                                                                                        0.1s 
 ✔ Container genai-stack-api-1         Recreated                                                                                                        0.1s 
 ✔ Container genai-stack-bot-1         Recreated                                                                                                        0.1s 
 ✔ Container genai-stack-pdf_bot-1     Recreated                                                                                                        0.1s 
 ✔ Container genai-stack-loader-1      Recreated                                                                                                        0.1s 
 ✔ Container genai-stack-front-end-1   Recreated                                                                                                        0.1s 
Attaching to genai-stack-api-1, genai-stack-bot-1, genai-stack-database-1, genai-stack-front-end-1, genai-stack-llm-gpu-1, genai-stack-loader-1, genai-stack-pdf_bot-1, genai-stack-pull-model-1
genai-stack-pull-model-1  | pulling ollama model llama2 using http://llm-gpu:11434
genai-stack-pull-model-1  | Error: Head "http://llm-gpu:11434/": dial tcp 172.18.0.4:11434: connect: no route to host
genai-stack-pull-model-1 exited with code 1
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
@matthieuml
Contributor

The docs told me to add that URL to the .env file. However, I certainly don't have a server running there.

If the container genai-stack-llm-gpu-1 is running, then you do have a server running at http://llm-gpu:11434/, internal to the Docker network.
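As a quick check, you can probe the Ollama endpoint from another container on the same compose network. This is only a sketch: the `api` service name is taken from the compose log above and may differ in your setup, and `curl` must be available in that service's image.

```shell
# Probe the Ollama server over Docker's internal DNS from a sibling service.
# "api" is an assumed service name; adjust to whatever your compose file defines.
docker compose exec api curl -sf http://llm-gpu:11434/
```

If this succeeds, the compose networking and the .env URL are fine, and the problem is elsewhere.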

What seems to be the issue here is your NVIDIA runtime integration with Docker.

Are you able to run this command successfully?

docker run -it --rm --gpus all ubuntu nvidia-smi

If not, try reinstalling Docker.
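If nvidia-smi fails inside the container, reinstalling and reconfiguring the NVIDIA Container Toolkit is usually the first thing to try. A minimal sketch, assuming Debian/Ubuntu with the NVIDIA apt repository already configured:

```shell
# Install the toolkit that exposes host driver libraries (incl. libnvidia-ml.so.1) to containers
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
# Register the "nvidia" runtime in /etc/docker/daemon.json
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify that a container can now see the GPU
docker run --rm --gpus all ubuntu nvidia-smi
```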


Toparvion commented Jan 27, 2024

@matthieuml , I've faced the same issue and tried the command you proposed. The resulting error is the same as the one I see when running the GenAI stack with --profile linux-gpu, namely:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown. 

I have also followed your advice from #62 (installed nvidia-container-toolkit) but nothing has changed.

The main hint here seems to be that I run the stack in Docker Desktop 4.26.1 (on Ubuntu 23.10). nvidia-smi displays the following about the GPU:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 ...    Off | 00000000:01:00.0  On |                  N/A |
| N/A   47C    P8              11W /  55W |    628MiB /  6144MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Some reported issues that I've found on the net so far suggest using Docker CE instead of Docker Desktop. But that looks opposite to what the GenAI stack promotes: an easy and developer-friendly way to build LLM-powered applications.

Is there another way to resolve the issue?

@matthieuml
Contributor

After looking around a bit, it seems that nvidia-container-toolkit needs docker-ce installed as root to work (which isn't the case with Docker Desktop?).

The obvious way to resolve this issue would be to use docker-ce installed as root, or even podman as an alternative. The Docker CLI is well documented, and in combination with docker-compose you can deploy the stack quite easily.

However, if you want to keep a developer-friendly UI, maybe you could use portainer-ce in combination with docker-ce as root?
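One caveat when installing Docker CE alongside Docker Desktop: the CLI may still be pointed at Desktop's VM through a Docker context. A sketch of how to check and switch back to the root-installed engine:

```shell
# List available Docker endpoints; Docker Desktop registers its own context
docker context ls
# Point the CLI back at the root-installed engine (/var/run/docker.sock)
docker context use default
# Confirm which engine the CLI is now talking to
docker info --format '{{.Name}}'
```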

@Toparvion

@matthieuml ,

which isn't the case with Docker Desktop?

Yes, this seems to be the root cause.

Ok, I'll switch to Docker CE.

Perhaps it's worth adding a note about Docker Desktop's incompatibility with the linux-gpu profile to the README.md, as well as mentioning the need to install nvidia-container-toolkit.

Thank you!
