Merge pull request mistralai#19 from mistralai/add-dockerfile
Add Dockerfile and build instructions
timlacroix committed Sep 29, 2023
2 parents 69cecfe + c2f3904 commit 5680d3f
Showing 4 changed files with 59 additions and 1 deletion.
13 changes: 12 additions & 1 deletion README.md
@@ -4,6 +4,17 @@ This repository contains minimal code to run our 7B model.\
Blog: [https://mistral.ai/news/announcing-mistral-7b/](https://mistral.ai/news/announcing-mistral-7b/)\
Discord: [https://discord.com/invite/mistralai](https://discord.com/invite/mistralai)

## Deployment

The `deploy` folder contains code to build a [vLLM](https://github.com/vllm-project/vllm) image with the required dependencies to serve the Mistral AI model. In the image, the [transformers](https://github.com/huggingface/transformers/) library is used instead of the reference implementation. To build it:

```bash
docker build deploy --build-arg MAX_JOBS=8
```

Instructions to run the image can be found in the [official documentation](https://docs.mistral.ai/quickstart).

## Installation

```
@@ -107,4 +118,4 @@ For this we can choose as chunk size the window size. For each chunk, we thus ne

[1] [Generating Long Sequences with Sparse Transformers, Child et al. 2019](https://arxiv.org/pdf/1904.10509.pdf)

[2] [Longformer: The Long-Document Transformer, Beltagy et al. 2020](https://arxiv.org/pdf/2004.05150v2.pdf)
[2] [Longformer: The Long-Document Transformer, Beltagy et al. 2020](https://arxiv.org/pdf/2004.05150v2.pdf)
2 changes: 2 additions & 0 deletions deploy/.dockerignore
@@ -0,0 +1,2 @@
*
!entrypoint.sh
34 changes: 34 additions & 0 deletions deploy/Dockerfile
@@ -0,0 +1,34 @@
FROM --platform=amd64 nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu22.04 as base

ARG MAX_JOBS

WORKDIR /workspace

RUN apt update && \
    apt install -y python3-pip python3-packaging \
    git ninja-build && \
    pip3 install -U pip

# Tweak this list to reduce build time
# https://developer.nvidia.com/cuda-gpus
ENV TORCH_CUDA_ARCH_LIST="7.0;7.2;7.5;8.0;8.6;8.9;9.0"

# We have to manually install Torch otherwise apex & xformers won't build
RUN pip3 install "torch>=2.0.0"

# This build is slow but NVIDIA does not provide binaries. Increase MAX_JOBS as needed.
RUN git clone https://github.com/NVIDIA/apex && \
    cd apex && \
    sed -i '/check_cuda_torch_binary_vs_bare_metal(CUDA_HOME)/d' setup.py && \
    python3 setup.py install --cpp_ext --cuda_ext

RUN pip3 install "xformers==0.0.22" "transformers==4.33.3" "vllm==0.2.0"

# Fastchat is not yet released, so use our own commit
RUN pip3 install "fschat[model_worker] @ git+https://github.com/mistralai/FastChat-release.git@bde1118ed8df8c42fe98076424f60505465d4265#egg=FastChat"

COPY entrypoint.sh .

RUN chmod +x /workspace/entrypoint.sh

ENTRYPOINT ["/workspace/entrypoint.sh"]
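Since the entrypoint execs the server with `"$@"`, anything appended after the image name in `docker run` is forwarded as server flags. A minimal usage sketch, assuming the image is tagged `mistral-vllm` (the tag and the `--model` value are illustrative, not set anywhere in this commit):

```shell
# Build the image; MAX_JOBS bounds parallel compile jobs during the apex build
docker build deploy --build-arg MAX_JOBS=8 -t mistral-vllm

# Run it; arguments after the image name are passed to the entrypoint,
# which forwards them to vllm.entrypoints.openai.api_server
docker run --gpus all -p 8000:8000 -e HF_TOKEN="$HF_TOKEN" \
    mistral-vllm --host 0.0.0.0 --model mistralai/Mistral-7B-v0.1
```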
11 changes: 11 additions & 0 deletions deploy/entrypoint.sh
@@ -0,0 +1,11 @@
#!/bin/bash

if [[ -n "${HF_TOKEN}" ]]; then
  echo "The HF_TOKEN environment variable is set, logging in to Hugging Face."
  python3 -c "import huggingface_hub; huggingface_hub.login('${HF_TOKEN}')"
else
  echo "The HF_TOKEN environment variable is not set or empty, not logging in to Hugging Face."
fi

# Start the vLLM OpenAI-compatible API server, forwarding any provided arguments
exec python3 -u -m vllm.entrypoints.openai.api_server "$@"
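The token check above can be exercised in isolation. The sketch below mirrors the same non-empty test in a hypothetical helper (`hf_login_message` is illustrative, not part of the committed script):

```shell
#!/bin/bash

# Hypothetical helper mirroring the entrypoint's HF_TOKEN branch:
# prints which path the script would take for a given token value.
hf_login_message() {
  if [[ -n "${HF_TOKEN}" ]]; then
    echo "logging in"
  else
    echo "skipping login"
  fi
}

HF_TOKEN="" hf_login_message            # prints "skipping login"
HF_TOKEN="hf_example" hf_login_message  # prints "logging in"
```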
