
[Misc] add gpu_memory_utilization arg #5079

Merged · 2 commits into vllm-project:main · May 29, 2024
Conversation

pandyamarut (Contributor):

For larger models like meta-llama/Meta-Llama-3-70B, the benchmark throws:

ValueError: The model's max seq len (8192) is larger than the maximum number of tokens that can be stored in KV cache (4688). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.

Specifying gpu_memory_utilization is required to run the benchmark successfully.
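
For reference, a minimal sketch of the same knob when constructing the engine directly (the model name comes from the report above; the 0.95 value is illustrative, not taken from the PR):

    from vllm import LLM

    # Raising gpu_memory_utilization (engine default: 0.9) leaves more GPU
    # memory for the KV cache; lowering max_model_len instead shrinks what
    # the cache must hold. Either change can resolve the ValueError above.
    llm = LLM(model="meta-llama/Meta-Llama-3-70B",
              gpu_memory_utilization=0.95)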

Signed-off-by: pandyamarut <pandyamarut@gmail.com>
@@ -211,5 +212,11 @@ def run_to_completion(profile_dir: Optional[str] = None):
                         type=str,
                         default=None,
                         help='Path to save the latency results in JSON format.')
+    parser.add_argument('--gpu-memory-utilization',
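
The hunk is cut off above. A plausible completion of the new argument, assuming the default mirrors the engine's 0.9; the exact help text is an assumption, not a quote of the PR:

    parser.add_argument('--gpu-memory-utilization',
                        type=float,
                        default=0.9,  # assumed to match the engine default
                        help='Fraction of GPU memory to reserve for the '
                             'model executor, in the range (0, 1].')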
Review comment from a Contributor on the line above:
Wondering if we want to expose max-model-len in the args as well. We already do that in benchmark_throughput.py (https://sourcegraph.com/github.com/vllm-project/vllm/-/blob/benchmarks/benchmark_throughput.py?L301); should we add it here too, since that is the other parameter to tune for this error?
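
A sketch of what mirroring benchmark_throughput.py might look like here; that file does expose this argument, but the exact help wording below is an approximation:

    parser.add_argument('--max-model-len',
                        type=int,
                        default=None,
                        help='Maximum length of a sequence (including prompt '
                             'and output). If None, it is derived from the '
                             'model config.')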

simon-mo merged commit 616e600 into vllm-project:main on May 29, 2024
63 checks passed
blinkbear pushed a commit to blinkbear/vllm that referenced this pull request May 29, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request May 31, 2024
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 8, 2024
joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jul 14, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024