
Merge pull request #6 from DaniAsh551/master
Revised nvidia docker support
Noeda authored Apr 9, 2023
2 parents 059c948 + faf98bd commit 1e1131f
Showing 3 changed files with 87 additions and 10 deletions.
32 changes: 32 additions & 0 deletions .docker/nvidia.dockerfile
@@ -0,0 +1,32 @@
FROM debian:bookworm

ARG DEBIAN_FRONTEND=noninteractive
RUN apt update -y
RUN apt install -y curl \
    apt-utils \
    unzip \
    tar \
    xz-utils \
    ocl-icd-libopencl1 \
    opencl-headers \
    clinfo \
    build-essential \
    gcc

# Register the nvidia OpenCL ICD so the loader inside the container can
# find the driver supplied by the nvidia container runtime.
RUN mkdir -p /etc/OpenCL/vendors && \
    echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs > /rustup.sh
RUN chmod +x /rustup.sh
RUN /rustup.sh -y

RUN apt install -y opencl-dev

RUN bash -c 'export PATH="$PATH:$HOME/.cargo/bin";rustup default nightly'

COPY . /opt/rllama
# Build with x86 SIMD features (sse2/avx/fma/avx2) and the server+opencl cargo features.
RUN bash -c 'export PATH="$PATH:$HOME/.cargo/bin";cd /opt/rllama;RUSTFLAGS="-C target-feature=+sse2,+avx,+fma,+avx2" cargo build --release --features server,opencl'
RUN ln -s /opt/rllama/target/release/rllama /usr/bin
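
A quick way to check that the image's OpenCL stack actually sees the GPU is to run `clinfo` (installed above) in a throwaway container. A minimal sketch, assuming the image was built as `rllama:nvidia` (see `.docker/nvidia.md` below) and the host-side nvidia container support is already set up:

```bash
# After building the image (see .docker/nvidia.md), probe OpenCL inside it.
# If GPU passthrough works, clinfo should list an "NVIDIA CUDA" platform
# with your GPU as a device.
docker run --rm --gpus all rllama:nvidia clinfo
```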
40 changes: 40 additions & 0 deletions .docker/nvidia.md
@@ -0,0 +1,40 @@
# rllama docker on nvidia

## Getting OpenCL to work inside docker
Note that this also requires some packages and configuration on your host system to allow containers to use nvidia GPU features such as **compute**.


For each distro / distro family described below, you can follow the instructions at the given links.

**Note**: You also need an up-to-date version of docker/docker-ce, so be sure to follow the instructions to install docker for your distro from the [docker website](https://docs.docker.com/engine/install).

**Note 2**: I have only personally tested these instructions on Fedora/Nobara, so I cannot guarantee their accuracy for other distros.

### Fedora / Fedora-based
**[https://gist.github.com/JuanM04/fcbed16d0f4405a286adebee5fd31cb2](https://gist.github.com/JuanM04/fcbed16d0f4405a286adebee5fd31cb2)**


### Debian / Debian-based / Ubuntu / Ubuntu-based
**[https://www.howtogeek.com/devops/how-to-use-an-nvidia-gpu-with-docker-containers/](https://www.howtogeek.com/devops/how-to-use-an-nvidia-gpu-with-docker-containers/)**


### Arch / Arch-based
**[https://wiki.archlinux.org/title/Docker#Run_GPU_accelerated_Docker_containers_with_NVIDIA_GPUs](https://wiki.archlinux.org/title/Docker#Run_GPU_accelerated_Docker_containers_with_NVIDIA_GPUs)**

Feel free to contribute/improve the instructions for existing and other distros.
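
Once the host side is set up, a quick smoke test (independent of rllama) is to run `nvidia-smi` from a stock CUDA container. A sketch; the image tag below is only an example and should be adjusted to one compatible with your installed driver:

```bash
# If GPU passthrough is working, this prints the same nvidia-smi table
# you would see on the host.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```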

## Usage
1. Build the image:
```bash
docker build -f ./.docker/nvidia.dockerfile -t rllama:nvidia .
```
2. Run the container:
```bash
docker run --rm --gpus all --privileged -v /models/LLaMA:/models:z -it rllama:nvidia \
rllama --model-path /models/7B \
--param-path /models/7B/params.json \
--tokenizer-path /models/tokenizer.model \
--prompt "hi I like cheese"
```

Replace `/models/LLaMA` with the directory you've downloaded your models to. The `:z` in the `-v` flag may or may not be needed depending on your distribution (I needed it on Fedora Linux).
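
For repeated runs it can be convenient to wrap step 2 in a small helper script. A sketch; the script name and its two parameters are hypothetical, while the `docker run` and rllama flags are exactly those from step 2:

```bash
#!/usr/bin/env bash
# run-rllama-nvidia.sh -- hypothetical wrapper around the docker run in step 2.
# Usage: ./run-rllama-nvidia.sh /models/LLaMA "hi I like cheese"
set -euo pipefail

MODELS_DIR="${1:?usage: $0 <models-dir> <prompt>}"
PROMPT="${2:?usage: $0 <models-dir> <prompt>}"

docker run --rm --gpus all --privileged -v "${MODELS_DIR}:/models:z" -it rllama:nvidia \
  rllama --model-path /models/7B \
  --param-path /models/7B/params.json \
  --tokenizer-path /models/tokenizer.model \
  --prompt "${PROMPT}"
```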
25 changes: 15 additions & 10 deletions README.md
@@ -29,9 +29,9 @@ LLaMA-65B: AMD Ryzen 5950X: 4186ms / token f16 (pure
OpenCL (all use f16):
-LLaMA-7B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 216ms / token (OpenCL on GPU)
+LLaMA-7B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 216ms / token (OpenCL on GPU)
LLaMA-7B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 680ms / token (OpenCL on CPU)
-LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 420ms / token (OpenCL on GPU)
+LLaMA-13B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 420ms / token (OpenCL on GPU)
LLaMA-13B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 1232ms / token (OpenCL on CPU)
LLaMA-30B: AMD Ryzen 5950X + OpenCL Ryzen 5950X: 4098ms / token (OpenCL on CPU)
```
@@ -59,6 +59,8 @@ features if you install manually from this Git repository instead.
There is a Dockerfile you can use if you'd rather just get started quickly and
you are familiar with `docker`. You still need to download the models yourself.


### For CPU-only docker support:
```
docker build -f ./.docker/cpu.dockerfile -t rllama .
```
@@ -75,6 +77,9 @@ Replace `/models/LLaMA` with the directory you've downloaded your models to.
The `:z` in the `-v` flag may or may not be needed depending on your distribution
(I needed it on Fedora Linux).

### For GPU-enabled docker support with nvidia:
Follow the instructions [here](.docker/nvidia.md).
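
In short, the build and run steps from that document look like this (the host-side setup described there is what actually makes `--gpus all` work):

```
docker build -f ./.docker/nvidia.dockerfile -t rllama:nvidia .
docker run --rm --gpus all --privileged -v /models/LLaMA:/models:z -it rllama:nvidia \
    rllama --model-path /models/7B \
    --param-path /models/7B/params.json \
    --tokenizer-path /models/tokenizer.model \
    --prompt "hi I like cheese"
```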

## LLaMA weights

Refer to https://github.com/facebookresearch/llama/ As of now, you need to be
@@ -316,26 +321,26 @@ LLaMA-13B: AMD Ryzen 3950X: 2005ms / token
# commit 63d27dba9091823f8ba11a270ab5790d6f597311 (13 March 2023)
# This one has one part of the transformer moved to GPU as a type of smoke test
-LLaMA-7B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 567ms / token
+LLaMA-7B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 567ms / token
LLaMA-7B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 956ms / token
-LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 987ms / token
+LLaMA-13B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 987ms / token
LLaMA-13B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 1706ms / token
# commit 35b0c372a87192761e17beb421699ea5ad4ac1ce (13 March 2023)
# I moved some attention stuff to OpenCL too.
-LLaMA-7B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 283ms / token
+LLaMA-7B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 283ms / token
LLaMA-7B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 679ms / token
-LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: <ran out of GPU memory>
+LLaMA-13B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: <ran out of GPU memory>
LLaMA-13B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 1226ms / token
# commit de5dd592777b3a4f5a9e8c93c8aeef25b9294364 (15 March 2023)
# The matrix multiplication on GPU is now much faster. It didn't have that much
# effect overall though, but I got modest improvement on LLaMA-7B GPU.
-LLaMA-7B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 247ms / token
+LLaMA-7B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 247ms / token
LLaMA-7B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 680ms / token
-LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: <ran out of GPU memory>
+LLaMA-13B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: <ran out of GPU memory>
LLaMA-13B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 1232ms / token
LLaMA-30B: AMD Ryzen 5950X + OpenCL Ryzen 5950X: 4098ms / token
@@ -357,6 +362,6 @@ LLaMA-65B: AMD Ryzen 5950X: 4186ms / token f16
# I've worked on making Vicuna-13B runnable and added an option to only
# partially use GPU. Improved one of the OpenCL kernels:
-LLaMA-7B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 420ms (at 90%/10% GPU/CPU split)
-LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 216ms (at 100% GPU)
+LLaMA-7B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 420ms (at 90%/10% GPU/CPU split)
+LLaMA-13B: AMD Ryzen 3950X + OpenCL RTX 3090 Ti: 216ms (at 100% GPU)
```
