0.18.11rc1

Pre-release
@peterschmidt85 released this 21 Aug 14:25 · 125 commits to master since this release · 5bf3952

AMD

With this update, you can specify an AMD GPU under `resources`. Below is an example of a service that deploys Llama 3.1 70B using TGI on an MI300X.

```yaml
type: service
name: amd-service-tgi

image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
env:
  - HUGGING_FACE_HUB_TOKEN
  - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
  - TRUST_REMOTE_CODE=true
  - ROCM_USE_FLASH_ATTN_V2_TRITON=true
commands:
  - text-generation-launcher --port 8000
port: 8000

resources:
  gpu: MI300X
  disk: 150GB

spot_policy: auto

model:
  type: chat
  name: meta-llama/Meta-Llama-3.1-70B-Instruct
  format: openai
```
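To submit the configuration above, save it to a file and pass it to the dstack CLI. The filename here is an assumption; adjust it to your repo layout (and note that the exact command may differ slightly between dstack versions):

```shell
# Hypothetical filename; run from the repo root.
dstack run . -f amd-service-tgi.dstack.yml
```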

Note

AMD accelerators are currently supported only with the runpod backend. Support for on-prem fleets and more backends is coming soon.
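Because the service declares `format: openai`, the deployed model can be queried through an OpenAI-compatible chat-completions endpoint. Below is a minimal sketch of building such a request; the gateway URL and token are placeholders (assumptions), not values from this release:

```python
import json

GATEWAY_URL = "https://gateway.example.com"  # hypothetical gateway address
DSTACK_TOKEN = "<your dstack token>"         # hypothetical auth token


def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload for the served model."""
    return {
        "model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


payload = build_chat_request("What GPUs does AMD make?")
print(json.dumps(payload, indent=2))
# To send it, POST the payload to f"{GATEWAY_URL}/v1/chat/completions"
# with an "Authorization: Bearer <token>" header, e.g. via the
# `requests` library or the official `openai` client.
```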

Other

New contributors

Full changelog: 0.18.10...0.18.11rc1