- Added Mixtral example

dstackai · Dec 21, 2023 · e85debf
1 parent 4a09158

Showing 7 changed files with 138 additions and 4 deletions.
diff --git a/docs/learn/mixtral.md b/docs/learn/mixtral.md
@@ -0,0 +1,102 @@
+# Mixtral
+
+This example demonstrates how to deploy `mistralai/Mixtral-8x7B-Instruct-v0.1`
+with `dstack`'s [services](../docs/guides/services.md) and [vLLM](https://vllm.ai/).
+
+## Define the configuration
+
+To deploy Mixtral as a service using vLLM, define the following configuration file:
+
+<div editor-title="llms/mixtral/vllm.dstack.yml"> 
+
+```yaml
+type: service
+
+python: "3.11"
+
+commands:
+  - conda install cuda # (required by megablocks)
+  - pip install torch # (required by megablocks)
+  - pip install vllm megablocks
+  - python -m vllm.entrypoints.openai.api_server
+    --model mistralai/Mixtral-8X7B-Instruct-v0.1
+    --host 0.0.0.0
+    --tensor-parallel-size 2 # should match the number of GPUs
+
+port: 8000
+```
+
+</div>
+
+## Run the configuration
+
+!!! warning "Prerequisites"
+    Before running a service, make sure to set up a [gateway](../docs/guides/services.md#set-up-a-gateway).
+    This step is not required with dstack Cloud, where the gateway is set up automatically.
+
+<div class="termy">
+
+```shell
+$ dstack run . -f llms/mixtral.dstack.yml --gpu "80GB:2" --disk 200GB
+```
+
+</div>
+
+!!! info "GPU memory"
+    To deploy Mixtral in `fp16`, ensure a minimum of `100GB` total GPU memory, 
+    and adjust the `--tensor-parallel-size` parameter in the YAML configuration 
+    to match the number of GPUs.
+
+!!! info "Disk size"
+    To deploy Mixtral, ensure a minimum of `200GB` of disk space.
+
+!!! info "Endpoint URL"
+    Once the service is deployed, its endpoint will be available at 
+    `https://<run-name>.<domain-name>` (using the domain set up for the gateway).
+
+    If you wish to customize the run name, you can use the `-n` argument with the `dstack run` command.
+
+Once the service is up, you can query it via its OpenAI-compatible endpoint:
+
+<div class="termy">
+
+```shell
+$ curl -X POST --location https://yellow-cat-1.mydomain.com/v1/completions \
+    -H "Content-Type: application/json" \
+    -d '{
+          "model": "mistralai/Mixtral-8X7B-Instruct-v0.1",
+          "prompt": "San Francisco is a",
+          "max_tokens": 7,
+          "temperature": 0
+        }'
+```
+
+</div>
+
+!!! info "OpenAI-compatible API"
+    Since vLLM provides an OpenAI-compatible endpoint, you can access it with OpenAI-compatible tools such as
+    Chat UI, LangChain, and LlamaIndex.
+
+??? info "Hugging Face Hub token"
+
+    To use a model with gated access, make sure to configure the `HUGGING_FACE_HUB_TOKEN` environment variable
+    (with [`--env`](../docs/reference/cli/index.md#dstack-run) in `dstack run` or 
+    using [`env`](../docs/reference/dstack.yml.md#service) in the configuration file).
+
+    <div class="termy">
+
+    ```shell
+    $ dstack run . --env HUGGING_FACE_HUB_TOKEN=&lt;token&gt; -f llms/mixtral.dstack.yml --gpu "80GB:2" --disk 200GB
+    ```
+    </div>
+
+## Source code
+
+The complete, ready-to-run code is available in [dstackai/dstack-examples](https://github.com/dstackai/dstack-examples).
+
+## What's next?
+
+1. Check the [vLLM](vllm.md) and [Text Generation Inference](tgi.md) examples
+2. Read about [services](../docs/guides/services.md)
+3. See all [learning materials](index.md)
+4. Join the [Discord server](https://discord.gg/u8SmfwPpMd)
diff --git a/docs/learn/tei.md b/docs/learn/tei.md
@@ -1,7 +1,7 @@
 # Text Embeddings Inference
 
-This example demonstrates how to deploy a text embeddings model as an API using [Services](../docs/guides/services.md)
-and [TEI](https://github.com/huggingface/text-embeddings-inference), an open-source framework by Hugging Face.
+This example demonstrates how to use [TEI](https://github.com/huggingface/text-embeddings-inference) with `dstack`'s
+[services](../docs/guides/services.md) to deploy embeddings.
 
 ## Define the configuration
 

diff --git a/docs/learn/tgi.md b/docs/learn/tgi.md
@@ -1,6 +1,6 @@
 # Text Generation Inference
 
-This example demonstrates how to deploy an LLM using [TGI](https://github.com/huggingface/text-generation-inference), an open-source framework by Hugging Face.
+This example demonstrates how to use [TGI](https://github.com/huggingface/text-generation-inference) with `dstack`'s [services](../docs/guides/services.md) to deploy LLMs.
 
 ## Define the configuration
 

diff --git a/docs/learn/vllm.md b/docs/learn/vllm.md
@@ -1,6 +1,6 @@
 # vLLM
 
-This example demonstrates how to deploy an LLM using [Services](../docs/guides/services.md) and [vLLM](https://vllm.ai/), an open-source library.
+This example demonstrates how to use [vLLM](https://vllm.ai/) with `dstack`'s [services](../docs/guides/services.md) to deploy LLMs.
 
 ## Define the configuration
 

diff --git a/docs/overrides/home.html b/docs/overrides/home.html
@@ -328,6 +328,21 @@ <h2>Featured examples</h2>
             </div>
 
             <div class="tx-landing__highlights_grid">
+                <a href="/learn/mixtral" class="feature-cell">
+                    <h3>
+                        Mixtral
+                    </h3>
+
+                    <p>
+                        Deploy <strong>Mixtral-8x7B</strong> as a service using <strong>vLLM</strong>,
+                        an open-source serving library.
+                    </p>
+
+                    <div class="feature-tags">
+                        <div class="feature-tag">LLMs</div>
+                    </div>
+                </a>
+
                 <a href="/learn/tei" class="feature-cell">
                     <h3>
                         Text Embeddings Inference

diff --git a/docs/overrides/learn.html b/docs/overrides/learn.html
@@ -11,6 +11,21 @@ <h2>Learning materials</h2>
                     </div>
 
                     <div class="tx-landing__highlights_grid">
+                        <a href="/learn/mixtral" class="feature-cell">
+                            <h3>
+                                Mixtral
+                            </h3>
+
+                            <p>
+                                Deploy <strong>Mixtral-8x7B</strong> as a service using <strong>vLLM</strong>,
+                                an open-source serving library.
+                            </p>
+
+                            <div class="feature-tags">
+                                <div class="feature-tag">LLMs</div>
+                            </div>
+                        </a>
+
                         <a href="/learn/tei" class="feature-cell">
                             <h3>
                                 Text Embeddings Inference

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -186,6 +186,8 @@ nav:
       - Text Generation Inference: learn/tgi.md
       - Text Embedding Interface: learn/tei.md
       - SDXL: learn/sdxl.md
+    - LLMs:
+      - Mixtral: learn/mixtral.md
     - Best practices:
         Spot instances: learn/spot.md
   - Pricing: pricing.md