diff --git a/docs/learn/mixtral.md b/docs/learn/mixtral.md
new file mode 100644
index 000000000..6ccc29b42
--- /dev/null
+++ b/docs/learn/mixtral.md
@@ -0,0 +1,138 @@
+# Mixtral
+
+This example demonstrates how to deploy `mistralai/Mixtral-8x7B-Instruct-v0.1`
+with `dstack`'s [services](../docs/guides/services.md) and [vLLM](https://vllm.ai/).
+
+## Define the configuration
+
+To deploy Mixtral as a service using vLLM, define the following configuration file:
+
+
+
+```yaml
+type: service
+
+python: "3.11"
+
+commands:
+ - conda install cuda # (required by megablocks)
+ - pip install torch # (required by megablocks)
+ - pip install vllm megablocks
+ - python -m vllm.entrypoints.openai.api_server
+    --model mistralai/Mixtral-8x7B-Instruct-v0.1
+ --host 0.0.0.0
+ --tensor-parallel-size 2 # should match the number of GPUs
+
+port: 8000
+```
+
+
+
+## Run the configuration
+
+!!! warning "Prerequisites"
+ Before running a service, make sure to set up a [gateway](../docs/guides/services.md#set-up-a-gateway).
+    This isn't required when using dstack Cloud, where a gateway is set up automatically.
+
+
+
+```shell
+$ dstack run . -f llms/mixtral.dstack.yml --gpu "80GB:2" --disk 200GB
+```
+
+
+
+!!! info "GPU memory"
+    To deploy Mixtral in `fp16`, ensure a minimum of `100GB` total GPU memory:
+    the model has roughly 47B parameters, so its `fp16` weights alone take about `94GB`.
+    Also, adjust the `--tensor-parallel-size` parameter in the YAML configuration
+    to match the number of GPUs.
+
+!!! info "Disk size"
+    To deploy Mixtral, ensure a minimum of `200GB` of disk space.
+
+!!! info "Endpoint URL"
+ Once the service is deployed, its endpoint will be available at
+    `https://<run name>.<gateway domain>` (using the domain set up for the gateway).
+
+ If you wish to customize the run name, you can use the `-n` argument with the `dstack run` command.
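+
+    For example, the following command (a sketch reusing the flags from above) names the run `yellow-cat-1`, which matches the endpoint queried below:
+
+    ```shell
+    $ dstack run . -f llms/mixtral.dstack.yml -n yellow-cat-1 --gpu "80GB:2" --disk 200GB
+    ```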
+
+Once the service is up, you can query it via its OpenAI-compatible endpoint:
+
+
+
+```shell
+$ curl -X POST --location https://yellow-cat-1.mydomain.com/v1/completions \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "NousResearch/Llama-2-7b-hf",
+ "prompt": "San Francisco is a",
+ "max_tokens": 7,
+ "temperature": 0
+ }'
+```
+
+
+
+!!! info "OpenAI-compatible API"
+    Since vLLM provides an OpenAI-compatible endpoint, you can access it with any OpenAI-compatible tool, such as
+    Chat UI, LangChain, or LlamaIndex.
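+
+For example, here's a minimal sketch that queries the endpoint with the official `openai` Python client (the run name `yellow-cat-1` and domain `mydomain.com` are the hypothetical values from the `curl` example above):
+
+```python
+from openai import OpenAI
+
+# Point the client at the service endpoint instead of api.openai.com.
+# No API key is configured on this deployment, so a placeholder value works.
+client = OpenAI(
+    base_url="https://yellow-cat-1.mydomain.com/v1",
+    api_key="not-used",
+)
+
+completion = client.completions.create(
+    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
+    prompt="San Francisco is a",
+    max_tokens=7,
+    temperature=0,
+)
+print(completion.choices[0].text)
+```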
+
+??? info "Hugging Face Hub token"
+
+    To use a model with gated access, make sure to configure the `HUGGING_FACE_HUB_TOKEN` environment variable
+ (with [`--env`](../docs/reference/cli/index.md#dstack-run) in `dstack run` or
+ using [`env`](../docs/reference/dstack.yml.md#service) in the configuration file).
+
+
+
+ ```shell
+ $ dstack run . --env HUGGING_FACE_HUB_TOKEN=<token> -f llms/mixtral.dstack.yml --gpu "80GB:2" --disk 200GB
+ ```
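+
+    Alternatively, you can set the variable in the configuration file itself. A sketch (replace `<token>` with your actual token):
+
+    ```yaml
+    env:
+      - HUGGING_FACE_HUB_TOKEN=<token>
+    ```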
+
+
+## Source code
+
+The complete, ready-to-run code is available in [dstackai/dstack-examples](https://github.com/dstackai/dstack-examples).
+
+## What's next?
+
+1. Check the [vLLM](vllm.md) and [Text Generation Inference](tgi.md) examples
+2. Read about [services](../docs/guides/services.md)
+3. See all [learning materials](index.md)
+4. Join the [Discord server](https://discord.gg/u8SmfwPpMd)
\ No newline at end of file
diff --git a/docs/learn/tei.md b/docs/learn/tei.md
index d55d90dcc..f86b15f37 100644
--- a/docs/learn/tei.md
+++ b/docs/learn/tei.md
@@ -1,7 +1,7 @@
# Text Embeddings Inference
-This example demonstrates how to deploy a text embeddings model as an API using [Services](../docs/guides/services.md)
-and [TEI](https://github.com/huggingface/text-embeddings-inference), an open-source framework by Hugging Face.
+This example demonstrates how to use [TEI](https://github.com/huggingface/text-embeddings-inference) with `dstack`'s
+[services](../docs/guides/services.md) to deploy text embedding models.
## Define the configuration
diff --git a/docs/learn/tgi.md b/docs/learn/tgi.md
index 18a4c5f85..dd3576090 100644
--- a/docs/learn/tgi.md
+++ b/docs/learn/tgi.md
@@ -1,6 +1,6 @@
# Text Generation Inference
-This example demonstrates how to deploy an LLM using [TGI](https://github.com/huggingface/text-generation-inference), an open-source framework by Hugging Face.
+This example demonstrates how to use [TGI](https://github.com/huggingface/text-generation-inference) with `dstack`'s [services](../docs/guides/services.md) to deploy LLMs.
## Define the configuration
diff --git a/docs/learn/vllm.md b/docs/learn/vllm.md
index ad19ff678..d9e13f745 100644
--- a/docs/learn/vllm.md
+++ b/docs/learn/vllm.md
@@ -1,6 +1,6 @@
# vLLM
-This example demonstrates how to deploy an LLM using [Services](../docs/guides/services.md) and [vLLM](https://vllm.ai/), an open-source library.
+This example demonstrates how to use [vLLM](https://vllm.ai/) with `dstack`'s [services](../docs/guides/services.md) to deploy LLMs.
## Define the configuration
diff --git a/docs/overrides/home.html b/docs/overrides/home.html
index 62de6c831..df6b1a628 100644
--- a/docs/overrides/home.html
+++ b/docs/overrides/home.html
@@ -328,6 +328,21 @@ Featured examples