
Merge pull request #6 from DaniAsh551/master
Revised nvidia docker support
Noeda authored Apr 9, 2023
2 parents 059c948 + faf98bd commit 1e1131f
Showing 3 changed files with 87 additions and 10 deletions.
32 changes: 32 additions & 0 deletions .docker/nvidia.dockerfile
@@ -0,0 +1,32 @@
FROM debian:bookworm

ARG DEBIAN_FRONTEND=noninteractive
RUN apt update -y
RUN apt install -y curl \
    apt-utils \
    unzip \
    tar \
    xz-utils \
    ocl-icd-libopencl1 \
    opencl-headers \
    clinfo \
    build-essential \
    gcc

# Register the nvidia OpenCL ICD so the loader inside the container can
# find the driver supplied by the nvidia container runtime.
RUN mkdir -p /etc/OpenCL/vendors && \
    echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs > /rustup.sh
RUN chmod +x /rustup.sh
RUN /rustup.sh -y

RUN apt install -y opencl-dev

RUN bash -c 'export PATH="$PATH:$HOME/.cargo/bin";rustup default nightly'

COPY . /opt/rllama
# Build with x86 SIMD features (sse2/avx/fma/avx2) and the server+opencl cargo features.
RUN bash -c 'export PATH="$PATH:$HOME/.cargo/bin";cd /opt/rllama;RUSTFLAGS="-C target-feature=+sse2,+avx,+fma,+avx2" cargo build --release --features server,opencl'
RUN ln -s /opt/rllama/target/release/rllama /usr/bin
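
A quick way to check that the image's OpenCL stack actually sees the GPU is to run `clinfo` (installed above) in a throwaway container. A minimal sketch, assuming the image was built as `rllama:nvidia` (see `.docker/nvidia.md` below) and the host-side nvidia container support is already set up:

```bash
# After building the image (see .docker/nvidia.md), probe OpenCL inside it.
# If GPU passthrough works, clinfo should list an "NVIDIA CUDA" platform
# with your GPU as a device.
docker run --rm --gpus all rllama:nvidia clinfo
```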
40 changes: 40 additions & 0 deletions .docker/nvidia.md
@@ -0,0 +1,40 @@
# rllama docker on nvidia

## Getting OpenCL to work inside docker
Note that this also requires some packages and configuration on your host system to allow containers to use nvidia GPU features such as **compute**.


For each distro / distro family described below, you can follow the instructions at the given links.

**Note**: You also need an up-to-date version of docker/docker-ce, so be sure to follow the instructions to install docker for your distro from the [docker website](https://docs.docker.com/engine/install).

**Note 2**: I have only personally tested these instructions on Fedora/Nobara, so I cannot guarantee their accuracy for other distros.

### Fedora / Fedora-based
**[https://gist.github.com/JuanM04/fcbed16d0f4405a286adebee5fd31cb2](https://gist.github.com/JuanM04/fcbed16d0f4405a286adebee5fd31cb2)**


### Debian / Debian-based / Ubuntu / Ubuntu-based
**[https://www.howtogeek.com/devops/how-to-use-an-nvidia-gpu-with-docker-containers/](https://www.howtogeek.com/devops/how-to-use-an-nvidia-gpu-with-docker-containers/)**


### Arch / Arch-based
**[https://wiki.archlinux.org/title/Docker#Run_GPU_accelerated_Docker_containers_with_NVIDIA_GPUs](https://wiki.archlinux.org/title/Docker#Run_GPU_accelerated_Docker_containers_with_NVIDIA_GPUs)**

Feel free to contribute/improve the instructions for existing and other distros.
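
Once the host side is set up, a quick smoke test (independent of rllama) is to run `nvidia-smi` from a stock CUDA container. A sketch; the image tag below is only an example and should be adjusted to one compatible with your installed driver:

```bash
# If GPU passthrough is working, this prints the same nvidia-smi table
# you would see on the host.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```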

## Usage
1. Build the image:
```bash
docker build -f ./.docker/nvidia.dockerfile -t rllama:nvidia .
```
2. Run the container:
```bash
docker run --rm --gpus all --privileged -v /models/LLaMA:/models:z -it rllama:nvidia \
rllama --model-path /models/7B \
--param-path /models/7B/params.json \
--tokenizer-path /models/tokenizer.model \
--prompt "hi I like cheese"
```

Replace `/models/LLaMA` with the directory you've downloaded your models to. The `:z` in the `-v` flag may or may not be needed depending on your distribution (I needed it on Fedora Linux).
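
For repeated runs it can be convenient to wrap step 2 in a small helper script. A sketch; the script name and its two parameters are hypothetical, while the `docker run` and rllama flags are exactly those from step 2:

```bash
#!/usr/bin/env bash
# run-rllama-nvidia.sh -- hypothetical wrapper around the docker run in step 2.
# Usage: ./run-rllama-nvidia.sh /models/LLaMA "hi I like cheese"
set -euo pipefail

MODELS_DIR="${1:?usage: $0 <models-dir> <prompt>}"
PROMPT="${2:?usage: $0 <models-dir> <prompt>}"

docker run --rm --gpus all --privileged -v "${MODELS_DIR}:/models:z" -it rllama:nvidia \
  rllama --model-path /models/7B \
  --param-path /models/7B/params.json \
  --tokenizer-path /models/tokenizer.model \
  --prompt "${PROMPT}"
```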
25 changes: 15 additions & 10 deletions README.md
@@ -29,9 +29,9 @@ LLaMA-65B: AMD Ryzen 5950X: 4186ms / token f16 (pure
OpenCL (all use f16):
-LLaMA-7B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 216ms / token (OpenCL on GPU)
+LLaMA-7B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 216ms / token (OpenCL on GPU)
LLaMA-7B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 680ms / token (OpenCL on CPU)
-LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 420ms / token (OpenCL on GPU)
+LLaMA-13B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 420ms / token (OpenCL on GPU)
LLaMA-13B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 1232ms / token (OpenCL on CPU)
LLaMA-30B: AMD Ryzen 5950X + OpenCL Ryzen 5950X: 4098ms / token (OpenCL on CPU)
```
@@ -59,6 +59,8 @@ features if you install manually from this Git repository instead.
There is a Dockerfile you can use if you'd rather just get started quickly and
you are familiar with `docker`. You still need to download the models yourself.


### For CPU-only docker support:
```
docker build -f ./.docker/cpu.dockerfile -t rllama .
```
@@ -75,6 +77,9 @@ Replace `/models/LLaMA` with the directory you've downloaded your models to.
The `:z` in the `-v` flag may or may not be needed depending on your distribution
(I needed it on Fedora Linux).

### For GPU-enabled docker support with nvidia:
Follow the instructions [here](.docker/nvidia.md).
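
In short, the build and run steps from that document look like this (the host-side setup described there is what actually makes `--gpus all` work):

```
docker build -f ./.docker/nvidia.dockerfile -t rllama:nvidia .
docker run --rm --gpus all --privileged -v /models/LLaMA:/models:z -it rllama:nvidia \
    rllama --model-path /models/7B \
    --param-path /models/7B/params.json \
    --tokenizer-path /models/tokenizer.model \
    --prompt "hi I like cheese"
```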

## LLaMA weights

Refer to https://github.com/facebookresearch/llama/ As of now, you need to be
@@ -316,26 +321,26 @@ LLaMA-13B: AMD Ryzen 3950X: 2005ms / token
# commit 63d27dba9091823f8ba11a270ab5790d6f597311 (13 March 2023)
# This one has one part of the transformer moved to GPU as a type of smoke test
-LLaMA-7B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 567ms / token
+LLaMA-7B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 567ms / token
LLaMA-7B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 956ms / token
-LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 987ms / token
+LLaMA-13B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 987ms / token
LLaMA-13B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 1706ms / token
# commit 35b0c372a87192761e17beb421699ea5ad4ac1ce (13 March 2023)
# I moved some attention stuff to OpenCL too.
-LLaMA-7B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 283ms / token
+LLaMA-7B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 283ms / token
LLaMA-7B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 679ms / token
-LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: <ran out of GPU memory>
+LLaMA-13B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: <ran out of GPU memory>
LLaMA-13B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 1226ms / token
# commit de5dd592777b3a4f5a9e8c93c8aeef25b9294364 (15 March 2023)
# The matrix multiplication on GPU is now much faster. It didn't have that much
# effect overall though, but I got modest improvement on LLaMA-7B GPU.
-LLaMA-7B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 247ms / token
+LLaMA-7B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 247ms / token
LLaMA-7B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 680ms / token
-LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: <ran out of GPU memory>
+LLaMA-13B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: <ran out of GPU memory>
LLaMA-13B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 1232ms / token
LLaMA-30B: AMD Ryzen 5950X + OpenCL Ryzen 5950X: 4098ms / token
@@ -357,6 +362,6 @@ LLaMA-65B: AMD Ryzen 5950X: 4186ms / token f16
# I've worked on making Vicuna-13B runnable and added an option to only
# partially use GPU. Improved one of the OpenCL kernels:
-LLaMA-7B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 420ms (at 90%/10% GPU/CPU split)
-LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 216ms (at 100% GPU)
+LLaMA-7B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 420ms (at 90%/10% GPU/CPU split)
+LLaMA-13B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 216ms (at 100% GPU)
```
