Automatically set `max_num_batched_tokens` #1198

WoosukKwon · 2023-09-27T16:27:28Z

This PR removes the hard-coded default value (2560) of `max_num_batched_tokens` and instead sets it automatically based on the model's maximum length.
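To illustrate the idea, here is a minimal sketch of how such a default could be resolved. It is not the code from this PR: the helper name `resolve_max_num_batched_tokens`, the `floor` parameter and its value of 2048, and the validation step are all assumptions made for illustration. The only behavior taken from the description above is that an unset value is derived from the model's maximum length rather than fixed at 2560.

```python
from typing import Optional


def resolve_max_num_batched_tokens(
    max_model_len: int,
    max_num_batched_tokens: Optional[int] = None,
    floor: int = 2048,  # assumed lower bound, not confirmed by this thread
) -> int:
    """Hypothetical helper: choose the per-step token budget."""
    if max_model_len <= 0:
        raise ValueError("max_model_len must be positive")

    if max_num_batched_tokens is None:
        # No explicit value given: derive the budget from the model length,
        # so a prompt of maximum length always fits into a single batch.
        return max(max_model_len, floor)

    # An explicit value is kept, but (as an assumption of this sketch) it is
    # rejected when a maximum-length prompt could never be scheduled.
    if max_num_batched_tokens < max_model_len:
        raise ValueError(
            f"max_num_batched_tokens ({max_num_batched_tokens}) is smaller "
            f"than max_model_len ({max_model_len})"
        )
    return max_num_batched_tokens


if __name__ == "__main__":
    print(resolve_max_num_batched_tokens(4096))
    print(resolve_max_num_batched_tokens(1024))
    print(resolve_max_num_batched_tokens(4096, 8192))
```

Under these assumptions, a model with a 4096-token maximum length gets a 4096-token budget without any user configuration, whereas the old fixed default of 2560 would have been too small for a prompt of that length.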

WoosukKwon · 2023-09-27T16:38:28Z

@zhuohan123 Sorry for the commits after requesting your review. Now the PR is ready for review!

zhuohan123

LGTM! Thanks for the fix!

WoosukKwon added 2 commits September 27, 2023 16:23

Automatically set max_num_batched_tokens

26cabdc

Minor

ca9b358

WoosukKwon requested a review from zhuohan123 September 27, 2023 16:27

WoosukKwon added 2 commits September 27, 2023 16:29

Minor

c155430

Fix dtype

4f715d6

WoosukKwon linked an issue Sep 27, 2023 that may be closed by this pull request

Automatically configure max_num_batched_tokens based on model length #1189

Closed

WoosukKwon mentioned this pull request Sep 27, 2023

[v0.2.0] Release Tracker #1089

Closed

5 tasks

zhuohan123 approved these changes Sep 27, 2023

WoosukKwon merged commit a19bc5c into main Sep 27, 2023
2 checks passed

WoosukKwon deleted the auto-max-batch branch September 27, 2023 23:34

WoosukKwon mentioned this pull request Sep 27, 2023

Automatically configure max_num_batched_tokens based on model length #1189

Closed

yunfeng-scale mentioned this pull request Oct 24, 2023

llama should have None max length scaleapi/llm-engine#348

Merged

katitizhou mentioned this pull request Nov 16, 2023

benchmark_latency.py will hang when --batchsize=1 and --n=2 #1658

Closed

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

Automatically configure max_num_batched_tokens (vllm-project#1198)

69decbd

sjchoi1 pushed a commit to casys-kaist-internal/vllm that referenced this pull request May 7, 2024

Automatically configure max_num_batched_tokens (vllm-project#1198)

35a642a
