```yaml
type: service
# This configuration deploys Mixtral in int4 using TGI

image: ghcr.io/huggingface/text-generation-inference:latest
env:
  - MODEL_ID=TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ
commands:
  - text-generation-launcher
    --port 80
    --trust-remote-code
    --quantize gptq
port: 80

# Optional mapping for OpenAI-compatible endpoint
model:
  type: chat
  name: TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ
  format: tgi
```

Alternatively, here's a configuration that deploys Mixtral in `fp16` using vLLM:

```yaml
type: service
# This configuration deploys Mixtral in fp16 using vLLM

python: "3.11"
commands:
  - pip install vllm
  - python -m vllm.entrypoints.openai.api_server
    --model mistralai/Mixtral-8x7B-Instruct-v0.1
    --host 0.0.0.0
    --tensor-parallel-size 2 # Should match the number of GPUs
port: 8000
```
!!! info "NOTE:"
    The [model mapping](../docs/concepts/services.md#model-mapping) used to access the model via the
    gateway's OpenAI-compatible endpoint is not yet supported for vLLM.

    Also, support for quantized Mixtral in vLLM is not yet stable.
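
Even so, vLLM itself exposes an OpenAI-compatible API, so once the vLLM service is up you can query its endpoint directly. Here's a minimal sketch, assuming a run named `yellow-cat-1` and a gateway domain of `example.com`:

```shell
curl https://yellow-cat-1.example.com/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
          "prompt": "Hello!",
          "max_tokens": 25
        }'
```
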
## Run the configuration
Before running a service, make sure to set up a [gateway](../docs/concepts/services.md#set-up-a-gateway).
However, it's not required when using dstack Cloud, as it's set up automatically.
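
If you manage the gateway yourself, it only needs to be created once. Below is a rough sketch using the `dstack gateway create` command; the backend, region, and domain are assumptions to replace with your own (see the gateway docs linked above for the exact options):

```shell
$ dstack gateway create --backend aws --region eu-west-1 --domain example.com
```
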
For `fp16` deployment of Mixtral, ensure a minimum total GPU memory of `100GB` and a disk size of `200GB`.
For `int4`, request at least `25GB` of GPU memory.

[//]: # ( Also, make sure to adjust the `--tensor-parallel-size` and `--num-shard` parameters in the YAML configuration to align)
[//]: # ( with the number of GPUs used.)

=== "TGI `fp16`"

    ```shell
    $ dstack run . -f llms/mixtral/tgi.dstack.yml --gpu "80GB:2" --disk 200GB
    ```

=== "TGI `int4`"

    ```shell
    $ dstack run . -f llms/mixtral/tgi-gptq.dstack.yml --gpu 25GB
    ```

=== "vLLM `fp16`"

    ```shell
    $ dstack run . -f llms/mixtral/vllm.dstack.yml --gpu "80GB:2" --disk 200GB
    ```

## Access the endpoint

Once the service is up, you'll be able to access it at `https://<run name>.<gateway domain>`.

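For instance, for the TGI-based variants you could check that the endpoint responds via TGI's `generate` API. A minimal sketch, where the run name and domain are placeholders to replace with your own:

```shell
curl https://yellow-cat-1.example.com/generate \
    -H "Content-Type: application/json" \
    -d '{
          "inputs": "What is Mixtral?",
          "parameters": {"max_new_tokens": 25}
        }'
```
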
#### OpenAI interface

If the service has the [model mapping](../docs/concepts/services.md#model-mapping) configured, you will also be able
to access the model at `https://gateway.<gateway domain>` via the OpenAI-compatible interface.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com",
    api_key="none"
)

completion = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[
        {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
    ]
)

print(completion.choices[0].message)
```

??? info "Hugging Face Hub token"
    To use a model with gated access, make sure to configure the `HUGGING_FACE_HUB_TOKEN` environment variable
    (with [`--env`](../docs/reference/cli/index.md#dstack-run) in `dstack run` or
    using [`env`](../docs/reference/dstack.yml.md#service) in the configuration file).
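
    For example, with `dstack run` it might look like this (the token value is a placeholder):

    ```shell
    $ dstack run . -f llms/mixtral/tgi.dstack.yml --env HUGGING_FACE_HUB_TOKEN=<your token> --gpu "80GB:2" --disk 200GB
    ```
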
## Source code
The complete, ready-to-run code is available in [`dstackai/dstack-examples`](https://github.com/dstackai/dstack-examples).

## What's next?

1. Check the [Text Generation Inference](tgi.md) and [vLLM](vllm.md) examples
2. Read about [services](../docs/concepts/services.md)
3. Browse [examples](index.md)
4. Join the [Discord server](https://discord.gg/u8SmfwPpMd)