
Fixed Cuda Dockerfile #598

Merged
merged 1 commit on Aug 18, 2023
Conversation

pradhyumna85
Contributor

PR for the issue: #597

Previously, models produced garbage output when running on GPU with layers offloaded.

Similar to a related fix in another repo: bartowski1182/koboldcpp-docker@331326a
@asteinba

Hey @pradhyumna85,

I think I just ran into this. What do you mean by garbage? Does it still produce valid sentences, but the answers don't make sense?

And could you also explain how the fix works / what the cause was? :)

Thanks! Understanding this would help me a lot 😁

@pradhyumna85
Contributor Author

Hi @asteinba,
By garbage I mean absolute garbage: symbols, special characters, Unicode characters, etc., with no English or any other language.

The most relevant change in the Dockerfile is the environment variable CUDA_DOCKER_ARCH=all.
It is mentioned in the "Docker with CUDA" section of the README of the official llama.cpp repo (https://github.com/ggerganov/llama.cpp).

It basically makes the llama.cpp Makefile (https://github.com/ggerganov/llama.cpp/blob/master/Makefile) pass -arch=all (along with -Wno-deprecated-gpu-targets) to the NVCC flags instead of -arch=native, so the CUDA kernels are built for all supported GPU architectures rather than only the one present on the build machine:

...
ifdef LLAMA_CUBLAS
	CFLAGS    += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/x86_64-linux/include
	CXXFLAGS  += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/x86_64-linux/include
	LDFLAGS   += -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L$(CUDA_PATH)/targets/x86_64-linux/lib
	OBJS      += ggml-cuda.o
	NVCCFLAGS = --forward-unknown-to-host-compiler -use_fast_math
ifdef LLAMA_CUDA_NVCC
	NVCC = $(LLAMA_CUDA_NVCC)
else
	NVCC = nvcc
endif #LLAMA_CUDA_NVCC
ifdef CUDA_DOCKER_ARCH
	NVCCFLAGS += -Wno-deprecated-gpu-targets -arch=$(CUDA_DOCKER_ARCH)
else
	NVCCFLAGS += -arch=native
endif # CUDA_DOCKER_ARCH
...

The CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python line I just copied from the README (https://github.com/abetlen/llama-cpp-python) instructions for the cuBLAS installation.
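
For anyone who wants to see the two pieces together, here is a minimal sketch of what the relevant part of a CUDA Dockerfile could look like. The base image tag and the apt/pip lines are assumptions for illustration, not the exact contents of this PR:

# Minimal sketch, not the exact Dockerfile from this PR (base image tag is assumed)
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

# Per the llama.cpp Makefile, this makes nvcc build for all supported GPU
# architectures (-arch=all) instead of only the build machine's (-arch=native).
ENV CUDA_DOCKER_ARCH=all

RUN apt-get update && apt-get install -y python3 python3-pip

# Install llama-cpp-python with cuBLAS support, as described in the
# llama-cpp-python README.
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install llama-cpp-python

The key point is that a Docker image is usually built on a machine whose GPU (if any) differs from the GPU it will later run on, so building only for the native architecture can produce kernels that misbehave at runtime.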

@asteinba

Thank you very much for the explanation! I really appreciate that :)

@abetlen
Owner

abetlen commented Aug 18, 2023

@pradhyumna85 thank you for the fix, lgtm
