- [Docs] Updated the 0.14.0rc1 blog post (changed it to 0.14.0); also updated the banner

- [Docs] Added the OpenAI model mapping instructions into the documentation and examples (WIP)
peterschmidt85 committed Jan 25, 2024
1 parent 8f0fd42 commit 5b48bdf
Showing 19 changed files with 349 additions and 238 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -29,7 +29,7 @@ Supported providers: AWS, GCP, Azure, Lambda, TensorDock, Vast.ai, and DataCrunch

## Latest news ✨

- [2024/01] [dstack 0.14.0rc1: OpenAI-compatible endpoints](https://dstack.ai/blog/2024/01/19/openai-endpoints-preview/) (Preview)
- [2024/01] [dstack 0.14.0: OpenAI-compatible endpoints preview](https://dstack.ai/blog/2024/01/19/openai-endpoints-preview/) (Release)
- [2023/12] [dstack 0.13.0: Disk size, CUDA 12.1, Mixtral, and more](https://dstack.ai/blog/2023/12/22/disk-size-cuda-12-1-mixtral-and-more/) (Release)
- [2023/11] [dstack 0.12.3: Vast.ai integration](https://dstack.ai/blog/2023/11/21/vastai/) (Release)
- [2023/10] [dstack 0.12.2: TensorDock integration](https://dstack.ai/blog/2023/10/31/tensordock/) (Release)
41 changes: 14 additions & 27 deletions docs/blog/posts/openai-endpoints-preview.md
@@ -1,24 +1,23 @@
---
title: "dstack 0.14.0rc1: OpenAI-compatible endpoints preview"
title: "dstack 0.14.0: OpenAI-compatible endpoints preview"
date: 2024-01-19
description: "Making it easier to deploy custom LLMs as OpenAI-compatible endpoints."
slug: "openai-endpoints-preview"
categories:
- Previews
- Releases
---

# dstack 0.14.0rc1: OpenAI-compatible endpoints preview
# dstack 0.14.0: OpenAI-compatible endpoints preview

__Making it easier to deploy custom LLMs as OpenAI-compatible endpoints.__

The `service` configuration deploys any application as a public endpoint. For instance, you can use HuggingFace's
[TGI](https://github.com/huggingface/text-generation-inference) or frameworks to deploy
custom LLMs. While this is simple and customizable, using different frameworks and LLMs complicates
the integration of LLMs.
[TGI](https://github.com/huggingface/text-generation-inference) or other frameworks to deploy custom LLMs.
While this is simple and customizable, using different frameworks and LLMs complicates the integration of LLMs.

<!-- more -->

With the upcoming `dstack 0.14.0`, we are extending the `service` configuration in `dstack` to enable you to optionally map your
With `dstack 0.14.0`, we are extending the `service` configuration in `dstack` to enable you to optionally map your
custom LLM to an OpenAI-compatible endpoint.

Here's how it works: you define a `service` (as before) and include the `model` property with
@@ -42,7 +41,7 @@ model:
format: tgi
```
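
For context, a complete configuration of this kind (the example from the updated [services](../../docs/concepts/services.md) documentation) looks like this:

```yaml
type: service

image: ghcr.io/huggingface/text-generation-inference:latest
env:
  - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1
port: 80
commands:
  - text-generation-launcher --port 80 --trust-remote-code
model:
  type: chat
  name: mistralai/Mistral-7B-Instruct-v0.1
  format: tgi
```
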
When you deploy this service using `dstack run`, `dstack` will automatically publish the OpenAI-compatible endpoint,
When you deploy the service using `dstack run`, `dstack` will automatically publish the OpenAI-compatible endpoint,
converting the prompt and response format between your LLM and OpenAI interface.

@@ -63,32 +62,20 @@

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com",
    api_key="none"
)

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
    ]
)

print(completion.choices[0].message)
```

!!! info "NOTE:"
By default, dstack loads the model's `chat_template` and `eos_token` from Hugging Face. However, you can override them using
the corresponding properties under `model`.
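
Overriding them can look roughly like this (the values below are illustrative; the [services](../../docs/concepts/services.md) documentation shows a full working template):

```yaml
model:
  type: chat
  name: TheBloke/Llama-2-13B-chat-GPTQ
  format: tgi
  # Both overrides are optional; the template here is shortened for brevity
  chat_template: "{% for message in messages %}{{ message['content'] }}{% endfor %}"
  eos_token: "</s>"
```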

Here's a live demo of how it works:

<img src="https://raw.githubusercontent.com/dstackai/static-assets/main/static-assets/images/dstack-openai-python.gif" style="width:100%; max-width: 800px;" />

## Try the preview

To try the preview of this new upcoming feature, make sure to install `0.14.0rc1` and restart your server.

```shell
pip install "dstack[all]==0.14.0rc1"
```
For more details on how to use the new feature, be sure to check the updated documentation on [services](../../docs/concepts/services.md),
and the [TGI](../../examples/tgi.md) example.

## Migration guide

Note: In order to use the new feature, it's important to delete your existing gateway (if any)
using `dstack gateway delete` and then create it again with `dstack gateway create`.

## Why does this matter?

With `dstack`, you can train and deploy models using any cloud providers, easily leveraging GPU availability across
providers, spot instances, multiple regions, and more.

Note: After you update to `0.14.0`, it's important to delete your existing gateways (if any)
using `dstack gateway delete` and create them again with `dstack gateway create`.
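
A minimal sketch of the migration steps (your gateway may require the same options, such as backend and domain, that you used when you first created it):

```shell
dstack gateway delete   # remove the existing gateway(s) created with the previous version
dstack gateway create   # re-create the gateway with the options you originally used
```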

## Feedback

Do you have any questions or need assistance? Feel free to join our [Discord server](https://discord.gg/u8SmfwPpMd).
In case you have any questions, experience bugs, or need help,
drop us a message on our [Discord server](https://discord.gg/u8SmfwPpMd) or submit it as a
[GitHub issue](https://github.com/dstackai/dstack/issues/new/choose).
4 changes: 2 additions & 2 deletions docs/blog/posts/simplified-cloud-setup.md
@@ -39,7 +39,7 @@ projects:
</div>
Regions and other settings are optional. Learn more on what credential types are supported
via [Clouds](../../docs/config/server.md).
via [Clouds](../../docs/installation/index.md).
## Enhanced API
@@ -97,7 +97,7 @@ This means you'll need to delete `~/.dstack` and configure `dstack` from scratch

1. `pip install "dstack[all]==0.12.0"`
2. Delete `~/.dstack`
3. Configure clouds via `~/.dstack/server/config.yml` (see the [new guide](../../docs/config/server.md))
3. Configure clouds via `~/.dstack/server/config.yml` (see the [new guide](../../docs/installation/index.md))
4. Run `dstack server`

The [documentation](../../docs/index.md) and [examples](../../examples/index.md) are updated.
145 changes: 119 additions & 26 deletions docs/docs/concepts/services.md
@@ -1,9 +1,7 @@
# Services

With `dstack`, you can use the CLI or API to deploy models or web apps.
Provide the commands, port, and choose the Python version or a Docker image.

`dstack` handles the deployment on configured cloud GPU provider(s) with the necessary resources.
Services make it easy to deploy models and apps as public endpoints, allowing you to use any
framework.

??? info "Prerequisites"

@@ -31,14 +29,15 @@ Provide the commands, port, and choose the Python version or a Docker image.
Afterward, in your domain's DNS settings, add an `A` DNS record for `*.example.com`
pointing to the IP address of the gateway.

This way, if you run a service, `dstack` will make its endpoint available at
`https://<run-name>.example.com`.
Now, if you run a service, `dstack` will make its endpoint available at
`https://<run name>.<gateway domain>`.

If you're using the cloud version of `dstack`, the gateway is set up for you.
In case your service has the [model mapping](#model-mapping) configured, `dstack` will
automatically make your model available at `https://gateway.<gateway domain>` via the OpenAI-compatible interface.

## Using the CLI
If you're using the cloud version of `dstack`, the gateway is set up for you.

### Define a configuration
## Define a configuration

First, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or `train.dstack.yml`
are both acceptable).
@@ -49,27 +48,86 @@

<div editor-title="serve.dstack.yml">

```yaml
type: service

image: ghcr.io/huggingface/text-generation-inference:latest
env:
  - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1
port: 80
commands:
  - text-generation-launcher --port 80 --trust-remote-code
```

</div>

The `image` property is optional. If not specified, `dstack` uses its own Docker image,
pre-configured with Python, Conda, and essential CUDA drivers.

If you run such a configuration, once the service is up, you'll be able to
access it at `https://<run name>.<gateway domain>` (see how to [set up a gateway](#set-up-a-gateway)).

!!! info "Configuration options"
    Configuration file allows you to specify a custom Docker image, environment variables, and many other
    options. For more details, refer to the [Reference](../reference/dstack.yml.md#service).

### Model mapping

If your service is running a model, you can configure the model mapping to be able to access it via the
OpenAI-compatible interface.

<div editor-title="serve.dstack.yml">

```yaml
type: service
image: ghcr.io/huggingface/text-generation-inference:latest
env:
- MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1
port: 80
commands:
- text-generation-launcher --hostname 0.0.0.0 --port 80 --trust-remote-code
- text-generation-launcher --port 80 --trust-remote-code
model:
type: chat
name: mistralai/Mistral-7B-Instruct-v0.1
format: tgi
```

</div>

By default, `dstack` uses its own Docker images to run dev environments,
which are pre-configured with Python, Conda, and essential CUDA drivers.
In this case, with such a configuration, once the service is up, you'll be able to access the model at
`https://gateway.<gateway domain>` via the OpenAI-compatible interface.

!!! info "Configuration options"
Configuration file allows you to specify a custom Docker image, environment variables, and many other
options.
For more details, refer to the [Reference](../reference/dstack.yml.md#service).
#### Chat template

By default, `dstack` loads the [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating)
from the model's repository. If it is not present there, manual configuration is required.

```yaml
type: service
image: ghcr.io/huggingface/text-generation-inference:latest
env:
- MODEL_ID=TheBloke/Llama-2-13B-chat-GPTQ
port: 80
commands:
- text-generation-launcher --port 80 --trust-remote-code --quantize gptq
model:
type: chat
name: TheBloke/Llama-2-13B-chat-GPTQ
format: tgi
chat_template: "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ '<s>[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' </s>' }}{% endif %}{% endfor %}"
eos_token: "</s>"
```

??? info "Limitations"
Note that model mapping is an experimental feature, and it has the following limitations:

1. Doesn't work if your `chat_template` uses `bos_token`. As a workaround, replace `bos_token` inside `chat_template` with the token content itself.
2. Doesn't work if `eos_token` is defined in the model repository as a dictionary. As a workaround, set `eos_token` manually, as shown in the example above (see Chat template).
3. Only works if you're using Text Generation Inference. Support for vLLM and other serving frameworks is coming later.

If you encounter any other issues, please make sure to file a [GitHub issue](https://github.com/dstackai/dstack/issues/new/choose).
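
As an illustration of the workaround for the first limitation above (the template fragment is hypothetical, not taken from a real model repository): inline the token content, for example `<s>` for Llama 2, in place of `bos_token`:

```yaml
# Before (not supported by model mapping, since the template references bos_token):
#   chat_template: "{{ bos_token }}{% for message in messages %}{{ message['content'] }}{% endfor %}"
# After (the token content is inlined):
chat_template: "<s>{% for message in messages %}{{ message['content'] }}{% endfor %}"
```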

## Run the configuration

To run a configuration, use the `dstack run` command followed by the working directory path,
configuration file path, and any other options (e.g., for requesting hardware resources).
@@ -89,22 +147,57 @@ Continue? [y/n]: y
Provisioning...
---> 100%
Serving HTTP on https://yellow-cat-1.example.com ...
Service is published at https://yellow-cat-1.example.com
```

</div>

Once the service is deployed, its endpoint will be available at
`https://<run-name>.<domain-name>` (using the domain [set up for the gateway](#set-up-a-gateway)).

!!! info "Run options"
The `dstack run` command allows you to use `--gpu` to request GPUs (e.g. `--gpu A100` or `--gpu 80GB` or `--gpu A100:4`, etc.),
and many other options (incl. spot instances, disk size, max price, max duration, retry policy, etc.).
For more details, refer to the [Reference](../reference/cli/index.md#dstack-run).

[//]: # (TODO: Example)
### Service endpoint

Once the service is up, you'll be able to
access it at `https://<run name>.<gateway domain>`.

<div class="termy">

```shell
$ curl https://yellow-cat-1.example.com/generate \
-X POST \
-d '{"inputs":"<s>[INST] What is your favourite condiment?[/INST]"}' \
-H 'Content-Type: application/json'
```

</div>

#### OpenAI interface

In case the service has the [model mapping](#model-mapping) configured, you will also be able
to access the model at `https://gateway.<gateway domain>` via the OpenAI-compatible interface.

```python
from openai import OpenAI
client = OpenAI(
base_url="https://gateway.example.com",
api_key="none"
)
completion = client.chat.completions.create(
model="mistralai/Mistral-7B-Instruct-v0.1",
messages=[
{"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
]
)
print(completion.choices[0].message)
```

What's next?
## What's next?

1. Check the [Text Generation Inference](../../examples/tgi.md) and [vLLM](../../examples/vllm.md) examples
2. Read about [dev environments](../concepts/dev-environments.md)
27 changes: 12 additions & 15 deletions docs/docs/quickstart.md
@@ -29,8 +29,7 @@ or `train.dstack.yml` are both acceptable).
```yaml
type: dev-environment

python: "3.11" # (Optional) If not specified, your local version is used

python: "3.11"
ide: vscode
```

@@ -45,16 +44,17 @@ or `train.dstack.yml` are both acceptable).
```yaml
type: task

python: "3.11" # (Optional) If not specified, your local version is used

python: "3.11"
env:
- HF_HUB_ENABLE_HF_TRANSFER=1
commands:
- pip install -r requirements.txt
- python train.py
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
```

</div>

Ensure `requirements.txt` and `train.py` are in your folder; you can take them from our [`examples`](https://github.com/dstackai/dstack-examples/tree/main/fine-tuning/qlora).
Ensure `requirements.txt` and `train.py` are in your folder. You can take them from [`dstack-examples`](https://github.com/dstackai/dstack-examples/tree/main/fine-tuning/qlora).

=== "Service"

@@ -66,22 +66,19 @@ or `train.dstack.yml` are both acceptable).
type: service

image: ghcr.io/huggingface/text-generation-inference:latest

env:
- MODEL_ID=TheBloke/Llama-2-13B-chat-GPTQ

env:
- MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1
port: 80

commands:
- text-generation-launcher --hostname 0.0.0.0 --port 80 --trust-remote-code
- text-generation-launcher --port 80 --trust-remote-code
```

</div>

## Run configuration

Run a configuration using the [`dstack run`](reference/cli/index.md#dstack-run) command, followed by the working directory path (e.g., `.`), the path to the
configuration file, and run options (e.g., configuring hardware resources, spot policy, etc.)
configuration file, and run options (e.g., configuring hardware resources, spot policy, etc.)

<div class="termy">

2 changes: 1 addition & 1 deletion docs/examples/deploy-python.md
@@ -139,7 +139,7 @@ run.refresh()

## Source code

The complete, ready-to-run code is available in [dstackai/dstack-examples](https://github.com/dstackai/dstack-examples).
The complete, ready-to-run code is available in [`dstackai/dstack-examples`](https://github.com/dstackai/dstack-examples).

```shell
git clone https://github.com/dstackai/dstack-examples
4 changes: 2 additions & 2 deletions docs/examples/llama-index.md
@@ -98,7 +98,7 @@ The data is in the vector database! Now we can proceed with the part where we in
This example assumes we're using an LLM deployed using [TGI](tgi.md).

Once you've deployed the model, make sure to set the `TGI_ENDPOINT_URL` environment variable
to its URL, e.g. `https://<run-name>.<domain-name>` (or `http://localhost:<port>` if it's deployed
to its URL, e.g. `https://<run name>.<gateway domain>` (or `http://localhost:<port>` if it's deployed
as a task). We'll use this environment variable below.
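
For instance (the run name and domain below are placeholders):

```shell
export TGI_ENDPOINT_URL=https://yellow-cat-1.example.com
```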

<div class="termy">
@@ -214,7 +214,7 @@ using `dstack`. For more in-depth information, we encourage you to explore the d

## Source code

The complete, ready-to-run code is available in [dstackai/dstack-examples](https://github.com/dstackai/dstack-examples).
The complete, ready-to-run code is available in [`dstackai/dstack-examples`](https://github.com/dstackai/dstack-examples).

## What's next?
