First off, I'd just like to say this project is absolutely fantastic.

I'm having trouble getting the GPU to be used. I have an RTX 2080 Super, and I can see it with `nvidia-smi` inside the container once it's up and running. However, I never see any processes utilizing the GPU; I only see CPU usage climb to 100% after I ask the AI a question.
Here is my `docker-compose-cuda-gguf.yml`:

```yaml
version: '3.6'

services:
  llama-gpt-api-cuda-gguf:
    image: ghcr.io/abetlen/llama-cpp-python:latest
    # build:
    #   context: ./cuda
    #   dockerfile: gguf.Dockerfile
    restart: on-failure
    volumes:
      - './models:/models'
      - './cuda:/cuda'
    ports:
      - 3001:8000
    environment:
      MODEL: '/models/${MODEL_NAME:-code-llama-2-13b-chat.gguf}'
      MODEL_DOWNLOAD_URL: '${MODEL_DOWNLOAD_URL:-https://huggingface.co/TheBloke/CodeLlama-13B-Instruct-GGUF/resolve/main/codellama-13b-instruct.Q4_K_M.gguf}'
      N_GQA: '${N_GQA:-1}'
      USE_MLOCK: 1
    cap_add:
      - IPC_LOCK
      - SYS_RESOURCE
    command: '/bin/sh /cuda/run.sh'
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  llama-gpt-ui:
    # TODO: Use this image instead of building from source after the next release
    image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
    # build:
    #   context: ./ui
    #   dockerfile: Dockerfile
    ports:
      - 3000:3000
    restart: on-failure
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://llama-gpt-api-cuda-gguf:8000'
      - 'DEFAULT_MODEL=/models/${MODEL_NAME:-code-llama-2-13b-chat.gguf}'
      - 'NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=${DEFAULT_SYSTEM_PROMPT:-"You are a helpful and friendly AI assistant. Respond very concisely."}'
      - 'WAIT_HOSTS=llama-gpt-api-cuda-gguf:8000'
      - 'WAIT_TIMEOUT=${WAIT_TIMEOUT:-3600}'
```
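Possibly relevant: llama.cpp only does work on the GPU when model layers are offloaded via `n_gpu_layers`; with the default of 0 it runs entirely on the CPU, which would match the 100% CPU behavior above. Below is a minimal sketch of setting this through the API service's environment, assuming the llama-cpp-python server picks up an `N_GPU_LAYERS` variable and that `/cuda/run.sh` doesn't already pass its own `--n_gpu_layers` flag (both assumptions, not something confirmed by my compose file):

```yaml
# Sketch: environment block for the llama-gpt-api-cuda-gguf service.
# N_GPU_LAYERS is assumed to be read by the llama-cpp-python server;
# 43 is a guess for a 13B model and may need tuning to fit in VRAM.
environment:
  MODEL: '/models/${MODEL_NAME:-code-llama-2-13b-chat.gguf}'
  N_GPU_LAYERS: '${N_GPU_LAYERS:-43}'
```

Note this can only help if the image itself was compiled with CUDA (cuBLAS) support; a CPU-only build will ignore the offload setting and stay on the CPU regardless.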