
Commit

- Added Mixtral example
peterschmidt85 committed Dec 21, 2023
1 parent 4a09158 commit e85debf
Showing 7 changed files with 138 additions and 4 deletions.
102 changes: 102 additions & 0 deletions docs/learn/mixtral.md
@@ -0,0 +1,102 @@
# Mixtral

This example demonstrates how to deploy `mistralai/Mixtral-8x7B-Instruct-v0.1`
with `dstack`'s [services](../docs/guides/services.md) and [vLLM](https://vllm.ai/).

## Define the configuration

To deploy Mixtral as a service using vLLM, define the following configuration file:

<div editor-title="llms/mixtral/vllm.dstack.yml">

```yaml
type: service

python: "3.11"

commands:
  - conda install cuda # (required by megablocks)
  - pip install torch # (required by megablocks)
  - pip install vllm megablocks
  - python -m vllm.entrypoints.openai.api_server
    --model mistralai/Mixtral-8x7B-Instruct-v0.1
    --host 0.0.0.0
    --tensor-parallel-size 2 # should match the number of GPUs

port: 8000
```
</div>

## Run the configuration

!!! warning "Prerequisites"
    Before running a service, make sure to set up a [gateway](../docs/guides/services.md#set-up-a-gateway).
    However, it's not required when using dstack Cloud, as it's set up automatically.

<div class="termy">

```shell
$ dstack run . -f llms/mixtral/vllm.dstack.yml --gpu "80GB:2" --disk 200GB
```

</div>

!!! info "GPU memory"
To deploy Mixtral in `fp16`, ensure a minimum of `100GB` total GPU memory,
and adjust the `--tensor-parallel-size` parameter in the YAML configuration
to match the number of GPUs.

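For instance, a hypothetical run on four 40GB GPUs (the sizing here is illustrative; you would also change `--tensor-parallel-size` to `4` in the configuration):

<div class="termy">

```shell
$ dstack run . -f llms/mixtral/vllm.dstack.yml --gpu "40GB:4" --disk 200GB
```

</div>
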
!!! info "Disk size"
To deploy Mixtral, ensure a minimum of `200GB` of disk size.

!!! info "Endpoint URL"
Once the service is deployed, its endpoint will be available at
`https://<run-name>.<domain-name>` (using the domain set up for the gateway).

If you wish to customize the run name, you can use the `-n` argument with the `dstack run` command.

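For instance, a hypothetical invocation that pins the run name to `mixtral`, making the endpoint `https://mixtral.<domain-name>`:

<div class="termy">

```shell
$ dstack run . -f llms/mixtral/vllm.dstack.yml -n mixtral --gpu "80GB:2" --disk 200GB
```

</div>
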
Once the service is up, you can query it via its OpenAI-compatible endpoint:

<div class="termy">

```shell
$ curl -X POST --location https://yellow-cat-1.mydomain.com/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
          "prompt": "San Francisco is a",
          "max_tokens": 7,
          "temperature": 0
        }'
```

</div>

!!! info "OpenAI-compatible API"
Since vLLM provides an OpenAI-compatible endpoint, feel free to access it using various OpenAI-compatible tools like
Chat UI, LangChain, Llama Index, etc.

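For example, since `Mixtral-8x7B-Instruct-v0.1` is a chat model, you can also query the chat endpoint. This is a sketch assuming your vLLM version exposes the `/v1/chat/completions` route, reusing the hypothetical domain from above:

<div class="termy">

```shell
$ curl -X POST --location https://yellow-cat-1.mydomain.com/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
          "messages": [{"role": "user", "content": "What is the capital of France?"}],
          "max_tokens": 32
        }'
```

</div>
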
??? info "Hugging Face Hub token"

To use a model with gated access, ensure configuring the `HUGGING_FACE_HUB_TOKEN` environment variable
(with [`--env`](../docs/reference/cli/index.md#dstack-run) in `dstack run` or
using [`env`](../docs/reference/dstack.yml.md#service) in the configuration file).

<div class="termy">

```shell
$ dstack run . --env HUGGING_FACE_HUB_TOKEN=&lt;token&gt; -f llms/mixtral.dstack.yml --gpu "80GB:2" --disk 200GB
```
</div>
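
    Alternatively, here's a minimal sketch of passing the token through the [`env`](../docs/reference/dstack.yml.md#service) field of the configuration file (assuming `dstack`'s list syntax, where a key given without a value is forwarded from your local environment):

    ```yaml
    type: service

    env:
      # Assumption: a bare key forwards the variable from the local environment
      - HUGGING_FACE_HUB_TOKEN

    # ...the rest of the configuration (python, commands, port) stays the same as above
    ```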

## Source code

The complete, ready-to-run code is available in [dstackai/dstack-examples](https://github.com/dstackai/dstack-examples).

## What's next?

1. Check the [vLLM](vllm.md) and [Text Generation Inference](tgi.md) examples
2. Read about [services](../docs/guides/services.md)
3. See all [learning materials](index.md)
4. Join the [Discord server](https://discord.gg/u8SmfwPpMd)
4 changes: 2 additions & 2 deletions docs/learn/tei.md
@@ -1,7 +1,7 @@
# Text Embeddings Inference

- This example demonstrates how to deploy a text embeddings model as an API using [Services](../docs/guides/services.md)
- and [TEI](https://github.com/huggingface/text-embeddings-inference), an open-source framework by Hugging Face.
+ This example demonstrates how to use [TEI](https://github.com/huggingface/text-embeddings-inference) with `dstack`'s
+ [services](../docs/guides/services.md) to deploy embeddings.

## Define the configuration

2 changes: 1 addition & 1 deletion docs/learn/tgi.md
@@ -1,6 +1,6 @@
# Text Generation Inference

- This example demonstrates how to deploy an LLM using [TGI](https://github.com/huggingface/text-generation-inference), an open-source framework by Hugging Face.
+ This example demonstrates how to use [TGI](https://github.com/huggingface/text-generation-inference) with `dstack`'s [services](../docs/guides/services.md) to deploy LLMs.

## Define the configuration

2 changes: 1 addition & 1 deletion docs/learn/vllm.md
@@ -1,6 +1,6 @@
# vLLM

- This example demonstrates how to deploy an LLM using [Services](../docs/guides/services.md) and [vLLM](https://vllm.ai/), an open-source library.
+ This example demonstrates how to use [vLLM](https://vllm.ai/) with `dstack`'s [services](../docs/guides/services.md) to deploy LLMs.

## Define the configuration

15 changes: 15 additions & 0 deletions docs/overrides/home.html
@@ -328,6 +328,21 @@ <h2>Featured examples</h2>
</div>

<div class="tx-landing__highlights_grid">
<a href="/learn/mixtral" class="feature-cell">
<h3>
Mixtral
</h3>

<p>
Deploy <strong>Mixtral-8x7B</strong> as a service using <strong>vLLM</strong>,
an open-source serving library.
</p>

<div class="feature-tags">
<div class="feature-tag">LLMs</div>
</div>
</a>

<a href="/learn/tei" class="feature-cell">
<h3>
Text Embeddings Inference
15 changes: 15 additions & 0 deletions docs/overrides/learn.html
@@ -11,6 +11,21 @@ <h2>Learning materials</h2>
</div>

<div class="tx-landing__highlights_grid">
<a href="/learn/mixtral" class="feature-cell">
<h3>
Mixtral
</h3>

<p>
Deploy <strong>Mixtral-8x7B</strong> as a service using <strong>vLLM</strong>,
an open-source serving library.
</p>

<div class="feature-tags">
<div class="feature-tag">LLMs</div>
</div>
</a>

<a href="/learn/tei" class="feature-cell">
<h3>
Text Embeddings Inference
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -186,6 +186,8 @@ nav:
- Text Generation Inference: learn/tgi.md
- Text Embeddings Inference: learn/tei.md
- SDXL: learn/sdxl.md
- LLMs:
    - Mixtral: learn/mixtral.md
- Best practices:
    - Spot instances: learn/spot.md
- Pricing: pricing.md
