
Fixed Cuda Dockerfile #598

Merged
merged 1 commit on Aug 18, 2023
Conversation

pradhyumna85
Contributor

PR for the issue: #597

Previously, models produced garbage output when running on GPU with layers offloaded.

Similar to a related fix in another repo: bartowski1182/koboldcpp-docker@331326a
@asteinba

Hey @pradhyumna85,

I think I just ran into this. What do you mean by garbage? Does it still produce valid sentences, but the answers don't make sense?

And could you also explain how the fix works / what the cause was? :)

Thanks! Understanding this would help me a lot 😁

@pradhyumna85
Contributor Author

Hi @asteinba,
By garbage I mean absolute garbage: symbols, special characters, Unicode characters, etc., with no English or any other language.

The most relevant change in the Dockerfile is the environment variable CUDA_DOCKER_ARCH=all.
It is mentioned in the "Docker with CUDA" section of the README of the official llama.cpp repo (https://github.com/ggerganov/llama.cpp).

It basically makes the llama.cpp Makefile (https://github.com/ggerganov/llama.cpp/blob/master/Makefile) pass -arch=all (along with -Wno-deprecated-gpu-targets) to the NVCC flags instead of -arch=native, so the CUDA kernels are built for all supported GPU architectures rather than only the one present on the build machine:

...
ifdef LLAMA_CUBLAS
	CFLAGS    += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/x86_64-linux/include
	CXXFLAGS  += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/x86_64-linux/include
	LDFLAGS   += -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L$(CUDA_PATH)/targets/x86_64-linux/lib
	OBJS      += ggml-cuda.o
	NVCCFLAGS = --forward-unknown-to-host-compiler -use_fast_math
ifdef LLAMA_CUDA_NVCC
	NVCC = $(LLAMA_CUDA_NVCC)
else
	NVCC = nvcc
endif #LLAMA_CUDA_NVCC
ifdef CUDA_DOCKER_ARCH
	NVCCFLAGS += -Wno-deprecated-gpu-targets -arch=$(CUDA_DOCKER_ARCH)
else
	NVCCFLAGS += -arch=native
endif # CUDA_DOCKER_ARCH
...

The CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python line I just copied from the README (https://github.com/abetlen/llama-cpp-python) instructions for the cuBLAS installation.
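
For anyone who wants to see the two pieces together, here is a minimal sketch of what the relevant part of a CUDA Dockerfile could look like. The base image tag and the apt/pip lines are assumptions for illustration, not the exact contents of this PR:

# Minimal sketch, not the exact Dockerfile from this PR (base image tag is assumed)
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

# Per the llama.cpp Makefile, this makes nvcc build for all supported GPU
# architectures (-arch=all) instead of only the build machine's (-arch=native).
ENV CUDA_DOCKER_ARCH=all

RUN apt-get update && apt-get install -y python3 python3-pip

# Install llama-cpp-python with cuBLAS support, as described in the
# llama-cpp-python README.
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install llama-cpp-python

The key point is that a Docker image is usually built on a machine whose GPU (if any) differs from the GPU it will later run on, so building only for the native architecture can produce kernels that misbehave at runtime.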

@asteinba

Thank you very much for the explanation! I really appreciate that :)

@abetlen
Owner

abetlen commented Aug 18, 2023

@pradhyumna85 thank you for the fix, lgtm
