
[Hardware] Initial TPU integration #5292

Merged
193 commits merged from torch-xla into main on Jun 12, 2024

Conversation

@WoosukKwon (Collaborator) commented on Jun 5, 2024

This PR implements the initial integration of the Google TPU backend. It uses PyTorch XLA to maximize reuse of the existing code base.

The PR features:

  • Seamless support for popular HF models such as Llama, Mistral, and Gemma. The model's head size must be either 128 or 256.
  • Basic functionalities of vLLM, including continuous batching
  • Optimized Pallas kernels for FlashAttention and PagedAttention
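The head-size constraint above can be illustrated with a small sketch. This is not vLLM code; the function name and error message are hypothetical, and only the supported sizes (128 and 256) come from the PR description.

```python
# Hypothetical sketch of the head-size constraint stated in this PR.
# The helper name and message are illustrative, not part of vLLM's TPU backend.
SUPPORTED_HEAD_SIZES = {128, 256}

def check_tpu_head_size(hidden_size: int, num_attention_heads: int) -> int:
    """Return the per-head size if it is supported on the TPU backend."""
    head_size = hidden_size // num_attention_heads
    if head_size not in SUPPORTED_HEAD_SIZES:
        raise ValueError(
            f"Head size {head_size} is not supported on TPU; "
            f"expected one of {sorted(SUPPORTED_HEAD_SIZES)}.")
    return head_size

# Llama-2-7B style config: hidden_size=4096, 32 heads -> head size 128.
print(check_tpu_head_size(4096, 32))  # -> 128
```

For example, a model with hidden size 4096 and 64 heads would have head size 64 and be rejected under this constraint.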

TODOs (next steps):

  • (Fast) top-p sampling (disabled for now due to performance issues)
  • Distributed (tensor-parallel) inference
  • INT8 quantization
  • MoE
  • Support best_of > 1
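To make the PagedAttention feature above concrete, here is a toy sketch of the block-table bookkeeping the technique is based on: each sequence's KV cache lives in fixed-size physical blocks, and a per-sequence table maps logical blocks to physical ones. All names and sizes here are illustrative; none of this is vLLM's actual implementation.

```python
# Toy sketch (not vLLM code) of PagedAttention-style block-table bookkeeping.
BLOCK_SIZE = 16  # tokens per KV-cache block; illustrative value

class ToyBlockTable:
    def __init__(self, num_physical_blocks: int = 64):
        # Pool of free physical block ids; illustrative pool size.
        self.free_blocks = list(range(num_physical_blocks))
        self.tables = {}  # seq_id -> list of physical block ids

    def append_token(self, seq_id: str, position: int) -> list:
        """Allocate a new physical block when a sequence crosses a block boundary."""
        table = self.tables.setdefault(seq_id, [])
        if position % BLOCK_SIZE == 0:  # first token of a new logical block
            table.append(self.free_blocks.pop())
        return table

pool = ToyBlockTable()
for pos in range(40):  # 40 tokens span ceil(40/16) = 3 logical blocks
    table = pool.append_token("seq0", pos)
print(len(table))  # -> 3
```

Because blocks are allocated on demand, sequences of different lengths can be batched together without padding each to the longest sequence, which is what makes continuous batching cheap.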

@WoosukKwon (Collaborator, Author) commented:
@alanwaketan Please take a look!

@WoosukKwon WoosukKwon changed the title [WIP][Hardware] Initial TPU integration [Hardware] Initial TPU integration Jun 11, 2024
@WoosukKwon WoosukKwon marked this pull request as ready for review June 11, 2024 17:47
@WoosukKwon WoosukKwon requested a review from JackCaoG June 11, 2024 17:57
@JackCaoG left a comment:

LGTM, thanks!

@rkooo567 (Collaborator) left a comment:

Mostly asking for comments clarifying some of the code!

For testing, are we planning to add relevant CI in the future?

Review comments were left on:

  • vllm/worker/tpu_worker.py
  • vllm/worker/tpu_model_runner.py
@WoosukKwon (Collaborator, Author) replied:
@rkooo567 Thanks for the quality review!

@WoosukKwon WoosukKwon merged commit 1a8bfd9 into main Jun 12, 2024
20 of 24 checks passed
@WoosukKwon WoosukKwon deleted the torch-xla branch June 12, 2024 18:53
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 16, 2024
joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 27, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 8, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
Labels
tpu Related to Google TPUs
3 participants