Merge pull request mistralai#19 from mistralai/add-dockerfile
Add Dockerfile and build instructions
timlacroix committed Sep 29, 2023
2 parents 69cecfe + c2f3904 commit 5680d3f
Showing 4 changed files with 59 additions and 1 deletion.
13 changes: 12 additions & 1 deletion README.md
@@ -4,6 +4,17 @@ This repository contains minimal code to run our 7B model.\
Blog: [https://mistral.ai/news/announcing-mistral-7b/](https://mistral.ai/news/announcing-mistral-7b/)\
Discord: [https://discord.com/invite/mistralai](https://discord.com/invite/mistralai)

## Deployment

The `deploy` folder contains code to build a [vLLM](https://github.com/vllm-project/vllm) image with the required dependencies to serve the Mistral AI model. In the image, the [transformers](https://github.com/huggingface/transformers/) library is used instead of the reference implementation. To build it:

```bash
docker build deploy --build-arg MAX_JOBS=8
```

Instructions to run the image can be found in the [official documentation](https://docs.mistral.ai/quickstart).

## Installation

```
@@ -107,4 +118,4 @@ For this we can choose as chunk size the window size. For each chunk, we thus ne

[1] [Generating Long Sequences with Sparse Transformers, Child et al. 2019](https://arxiv.org/pdf/1904.10509.pdf)

[2] [Longformer: The Long-Document Transformer, Beltagy et al. 2020](https://arxiv.org/pdf/2004.05150v2.pdf)
[2] [Longformer: The Long-Document Transformer, Beltagy et al. 2020](https://arxiv.org/pdf/2004.05150v2.pdf)
2 changes: 2 additions & 0 deletions deploy/.dockerignore
@@ -0,0 +1,2 @@
*
!entrypoint.sh
34 changes: 34 additions & 0 deletions deploy/Dockerfile
@@ -0,0 +1,34 @@
FROM --platform=amd64 nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu22.04 as base

ARG MAX_JOBS

WORKDIR /workspace

RUN apt update && \
    apt install -y python3-pip python3-packaging \
    git ninja-build && \
    pip3 install -U pip

# Tweak this list to reduce build time
# https://developer.nvidia.com/cuda-gpus
ENV TORCH_CUDA_ARCH_LIST="7.0;7.2;7.5;8.0;8.6;8.9;9.0"

# We have to manually install Torch otherwise apex & xformers won't build
RUN pip3 install "torch>=2.0.0"

# This build is slow but NVIDIA does not provide binaries. Increase MAX_JOBS as needed.
RUN git clone https://github.com/NVIDIA/apex && \
    cd apex && \
    sed -i '/check_cuda_torch_binary_vs_bare_metal(CUDA_HOME)/d' setup.py && \
    python3 setup.py install --cpp_ext --cuda_ext

RUN pip3 install "xformers==0.0.22" "transformers==4.33.3" "vllm==0.2.0"

# Fastchat is not yet released, so use our own commit
RUN pip3 install "fschat[model_worker] @ git+https://github.com/mistralai/FastChat-release.git@bde1118ed8df8c42fe98076424f60505465d4265#egg=FastChat"

COPY entrypoint.sh .

RUN chmod +x /workspace/entrypoint.sh

ENTRYPOINT ["/workspace/entrypoint.sh"]
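Since the entrypoint execs the server with `"$@"`, anything appended after the image name in `docker run` is forwarded as server flags. A minimal usage sketch, assuming the image is tagged `mistral-vllm` (the tag and the `--model` value are illustrative, not set anywhere in this commit):

```shell
# Build the image; MAX_JOBS bounds parallel compile jobs during the apex build
docker build deploy --build-arg MAX_JOBS=8 -t mistral-vllm

# Run it; arguments after the image name are passed to the entrypoint,
# which forwards them to vllm.entrypoints.openai.api_server
docker run --gpus all -p 8000:8000 -e HF_TOKEN="$HF_TOKEN" \
    mistral-vllm --host 0.0.0.0 --model mistralai/Mistral-7B-v0.1
```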
11 changes: 11 additions & 0 deletions deploy/entrypoint.sh
@@ -0,0 +1,11 @@
#!/bin/bash

if [[ -n "${HF_TOKEN}" ]]; then
  echo "The HF_TOKEN environment variable is set, logging in to Hugging Face."
  python3 -c "import huggingface_hub; huggingface_hub.login('${HF_TOKEN}')"
else
  echo "The HF_TOKEN environment variable is not set or empty, not logging in to Hugging Face."
fi

# Start the vLLM OpenAI-compatible API server, forwarding any provided arguments
exec python3 -u -m vllm.entrypoints.openai.api_server "$@"
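The token check above can be exercised in isolation. The sketch below mirrors the same non-empty test in a hypothetical helper (`hf_login_message` is illustrative, not part of the committed script):

```shell
#!/bin/bash

# Hypothetical helper mirroring the entrypoint's HF_TOKEN branch:
# prints which path the script would take for a given token value.
hf_login_message() {
  if [[ -n "${HF_TOKEN}" ]]; then
    echo "logging in"
  else
    echo "skipping login"
  fi
}

HF_TOKEN="" hf_login_message            # prints "skipping login"
HF_TOKEN="hf_example" hf_login_message  # prints "logging in"
```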
