[Bug] lmdeploy - ERROR - Truncate max_new_tokens to 221 #1841

Closed
1 of 2 tasks
tairen99 opened this issue Jun 24, 2024 · 7 comments


tairen99 commented Jun 24, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

Hi all,

Thank you for your good work!

As suggested in the issue, I tried the latest lmdeploy wheels (lmdeploy-0.4.2+cu121+da439df-cp39-cp39-manylinux2014_x86_64.whl and lmdeploy-0.4.2+cu118+da439df-cp39-cp39-manylinux2014_x86_64.whl) to get deterministic output, but I get the error below.

Besides the error, the results are deterministic, but for very dense input images the results are truncated, as the ERROR shows.

However, if I install lmdeploy using "pip install lmdeploy", I do not get this error and the results are not truncated even for the dense input images, but the results are NOT deterministic.

========================================

[TM][WARNING] Device 2 peer access Device 3 is not available.
[TM][WARNING] Device 3 peer access Device 0 is not available.
[TM][WARNING] Device 3 peer access Device 1 is not available.
[TM][WARNING] Device 3 peer access Device 2 is not available.
test image is: Meta_2022_13_0_1659551667_stacked_bar_chart_plus_legend.png
2024-06-24 18:30:26,329 - lmdeploy - INFO - start ImageEncoder._forward_loop
2024-06-24 18:30:26,329 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-06-24 18:30:26,329 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
2024-06-24 18:30:34,239 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 7.910s
2024-06-24 18:30:34,240 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-06-24 18:30:34,241 - lmdeploy - INFO - prompt='<|im_start|>system\nYou are an AI assistant whose name is InternLM (书生·浦语).<|im_end|>\n<|im_start|>user\n<IMAGE_TOKEN>\nPlease inference this chart into a detailed table<|im_end|>\n<|im_start|>assistant\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=1024, top_p=0.8, top_k=40, temperature=0, repetition_penalty=1.0, ignore_eos=False, random_seed=6725412376424003715, stop_words=[92542, 92540], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[1, 92543, 9081, 364, 2770, 657, 589, 15358, 17993, 6843, 963, 505, 4576, 11146, 451, 60628, 60384, 60721, 62442, 60752, 699, 92542, 364, 92543, 1008, 364, 92544, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92545, 364, 5658, 43929, 550, 9617, 1263, 395, 11832, 2115, 92542, 364, 92543, 525, 11353, 364], adapter_name=None.
2024-06-24 18:30:34,241 - lmdeploy - INFO - session_id=0, history_tokens=0, input_tokens=1835, max_new_tokens=1024, seq_start=True, seq_end=True, step=0, prep=True
2024-06-24 18:30:34,241 - lmdeploy - ERROR - Truncate max_new_tokens to 221
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 0 received.
[TM][WARNING] [ProcessInferRequests] [0] total sequence length (1835 + 221) exceeds session_len (2056), request_output_len is truncated to 220
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 1835, max_q = 1835, max_k = 1835
[TM][INFO] ------------------------- step = 1840 -------------------------
[TM][INFO] ------------------------- step = 1850 -------------------------
[TM][INFO] ------------------------- step = 1860 -------------------------
[TM][INFO] ------------------------- step = 1870 -------------------------
[TM][INFO] ------------------------- step = 1880 -------------------------
[TM][INFO] ------------------------- step = 1890 -------------------------
[TM][INFO] ------------------------- step = 1900 -------------------------
[TM][INFO] ------------------------- step = 1910 -------------------------
[TM][INFO] ------------------------- step = 1920 -------------------------
[TM][INFO] ------------------------- step = 1930 -------------------------
[TM][INFO] ------------------------- step = 1940 -------------------------
[TM][INFO] ------------------------- step = 1950 -------------------------
[TM][INFO] ------------------------- step = 1960 -------------------------
[TM][INFO] ------------------------- step = 1970 -------------------------
[TM][INFO] ------------------------- step = 1980 -------------------------
[TM][INFO] ------------------------- step = 1990 -------------------------
[TM][INFO] ------------------------- step = 2000 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 0
[TM][INFO] [forward] Request completed for 0
====> The question is: Please inference this chart into a detailed table

========================================

The test input image is:
gettyimages-182495865-2048x2048

Reproduction

from lmdeploy import pipeline, GenerationConfig
from lmdeploy.messages import TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL-Chat-V1-5-AWQ'
image = load_image("/app/342455249-ece4bf69-967a-48cf-812f-c0c9848776a8.jpg")
backend_config = TurbomindEngineConfig(model_format='awq', tp=4, cache_max_entry_count=0.1)
pipe = pipeline(model, backend_config=backend_config, log_level='INFO')
gen_config = GenerationConfig(top_p=0.8,
                              top_k=40,
                              temperature=0,
                              max_new_tokens=1024)
sel_question = "Please inference this chart into a detailed table"
response = pipe((sel_question, image), gen_config=gen_config)
print(response.text)

Environment

Server: 4 NVIDIA Tesla T4 GPUs, each with 16 GB of GPU memory
Memory: 191 GB
Number of CPUs: 48
Docker Environment: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
Python version: 3.9.19

Error traceback

No response

@RayTang88

I also encountered this problem and hope to get an official answer.
How should the prompt length be controlled, and how should session_len, cache_max_entry_count, and quant_policy be set according to the model parameters, so that the model output is not truncated?
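
For illustration, a minimal sketch of where these knobs live, assuming the TurbomindEngineConfig and GenerationConfig fields of lmdeploy 0.4.x; the values below are placeholders, not tuned recommendations:

# Minimal sketch with placeholder values; adjust to your model and hardware.
from lmdeploy import GenerationConfig
from lmdeploy.messages import TurbomindEngineConfig

backend_config = TurbomindEngineConfig(
    model_format='awq',
    tp=4,
    session_len=8192,            # max tokens per session (input + output); raise for long prompts
    cache_max_entry_count=0.1,   # fraction of free GPU memory reserved for the k/v cache
    quant_policy=0,              # 0: no k/v cache quantization; 4 or 8: 4-/8-bit k/v cache
)
gen_config = GenerationConfig(max_new_tokens=1024)  # requested output length

# The output is only truncated when input_tokens + max_new_tokens exceeds session_len,
# so pick session_len large enough for the longest prompt plus the desired output.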

@lvhan028
Collaborator

This is not a bug.

[TM][WARNING] [ProcessInferRequests] [0] total sequence length (1835 + 221) exceeds session_len (2056), request_output_len is truncated to 220

The default session_len is 2056, meaning the max sequence length of a session, including the input and output tokens.

In your example, the number of input tokens is input_tokens=1835, including the image and prompt tokens.
The requested number of output tokens is max_new_tokens=1024

It indicates that input_tokens + max_new_tokens > session_len, so the engine will truncate the number of requested output tokens.
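
As a worked version of that arithmetic, with the values taken from the log above (the clamp below is only a sketch of the described behaviour, not the engine's actual code):

session_len = 2056      # default max sequence length per session (input + output)
input_tokens = 1835     # image tokens + prompt tokens, from the log
max_new_tokens = 1024   # requested output length

budget = session_len - input_tokens       # 2056 - 1835 = 221
effective = min(max_new_tokens, budget)   # 221
print(effective)                          # matches "Truncate max_new_tokens to 221"

The TurboMind warning then trims one more token (request_output_len is truncated to 220), presumably to keep the total strictly below session_len.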

lvhan028 self-assigned this Jun 25, 2024
@tairen99
Author

This is not a bug.

[TM][WARNING] [ProcessInferRequests] [0] total sequence length (1835 + 221) exceeds session_len (2056), request_output_len is truncated to 220

The default session_len is 2056, meaning the max sequence length of a session, including the input and output tokens.

In your example, the number of input tokens is input_tokens=1835, including the image and prompt tokens. The requested number of output tokens is max_new_tokens=1024

It indicates that input_tokens + max_new_tokens > session_len, so the engine will truncate the number of requested output tokens.

Hi @lvhan028, @zhyncs, and @AllentDan,

Thank you very much for your quick reply and all your help before.

Even though it is not a bug in this case, I do not know why it occurs with the wheel versions lmdeploy-0.4.2+cu121+da439df-cp39-cp39-manylinux2014_x86_64.whl and lmdeploy-0.4.2+cu118+da439df-cp39-cp39-manylinux2014_x86_64.whl.

If I install lmdeploy via pip install lmdeploy and run the same test code, I get the following output without the ERROR message "2024-06-24 18:30:34,241 - lmdeploy - ERROR - Truncate max_new_tokens to 221". See the detailed output from the pip-installed version below:

=======================================

[TM][WARNING] Device 3 peer access Device 0 is not available.
[TM][WARNING] Device 3 peer access Device 1 is not available.
[TM][WARNING] Device 3 peer access Device 2 is not available.
test image is: Meta_2022_13_0_1659551667_stacked_bar_chart_plus_legend.png
2024-06-25 17:41:49,486 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-06-25 17:41:49,487 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
/opt/conda/lib/python3.9/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/opt/conda/lib/python3.9/site-packages/torch/utils/checkpoint.py:90: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
2024-06-25 17:41:52,433 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 2.946s
2024-06-25 17:41:52,433 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-06-25 17:41:57,504 - lmdeploy - INFO - prompt='<|im_start|>system\nYou are an AI assistant whose name is InternLM (书生·浦语).<|im_end|>\n<|im_start|>user\n<IMAGE_TOKEN>\nPlease inference this chart into a detailed table<|im_end|>\n<|im_start|>assistant\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=1024, top_p=0.8, top_k=40, temperature=0, repetition_penalty=1.0, ignore_eos=False, random_seed=15886905969490819590, stop_words=[92542, 92540], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[1, 92543, 9081, 364, 2770, 657, 589, 15358, 17993, 6843, 963, 505, 4576, 11146, 451, 60628, 60384, 60721, 62442, 60752, 699, 92542, 364, 92543, 1008, 364, 92544, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92545, 364, 5658, 43929, 550, 9617, 1263, 395, 11832, 2115, 92542, 364, 92543, 525, 11353, 364], adapter_name=None.
2024-06-25 17:41:57,504 - lmdeploy - INFO - session_id=0, history_tokens=0, input_tokens=1835, max_new_tokens=1024, seq_start=True, seq_end=True, step=0, prep=True
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] Set logger level by INFO
[TM][WARNING] [ProcessInferRequests] Request for 0 received.
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 1835, max_q = 1835, max_k = 1835
[TM][INFO] Set logger level by INFO
[TM][INFO] ------------------------- step = 1840 -------------------------
[TM][INFO] ------------------------- step = 1850 -------------------------
[TM][INFO] ------------------------- step = 1860 -------------------------
[TM][INFO] ------------------------- step = 1870 -------------------------
[TM][INFO] ------------------------- step = 1880 -------------------------
[TM][INFO] ------------------------- step = 1890 -------------------------
[TM][INFO] ------------------------- step = 1900 -------------------------
[TM][INFO] ------------------------- step = 1910 -------------------------
[TM][INFO] ------------------------- step = 1920 -------------------------
[TM][INFO] ------------------------- step = 1930 -------------------------
[TM][INFO] ------------------------- step = 1940 -------------------------
[TM][INFO] ------------------------- step = 1950 -------------------------
[TM][INFO] ------------------------- step = 1960 -------------------------
[TM][INFO] ------------------------- step = 1970 -------------------------
[TM][INFO] ------------------------- step = 1980 -------------------------
[TM][INFO] ------------------------- step = 1990 -------------------------
[TM][INFO] ------------------------- step = 2000 -------------------------
[TM][INFO] ------------------------- step = 2010 -------------------------
[TM][INFO] ------------------------- step = 2020 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 0
[TM][INFO] [forward] Request complete for 0, code 0
====> The question is: Please inference this chart into a detailed table

=======================================

So I guess there is some minor difference between the previous version and the new version that causes this variation in results.

Could you please double-check the difference, or point me to the changes so that I can apply them in my local fork and run the workflow myself?

Thank you again!

@AllentDan
Collaborator

You may ignore the log; it does not affect usage. We will change that log level from error to warning.

@tairen99
Author

You may ignore the log; it does not affect usage. We will change that log level from error to warning.

Hi @AllentDan,

Thank you for your quick response.

Yeah, I wanted to just ignore the error, but for my large and dense chart the model's outputs are truncated, as the error mentioned earlier indicates.

However, the lmdeploy version installed via pip install lmdeploy provides a complete output without the truncation issue.

If the same input causes truncation with the wheel builds, why does it not cause the same error in the pip-installed version?

Thank you.

@AllentDan
Collaborator

I see. It seems that in the current branch, the session_len of turbomind was affected. Please specify the session_len argument as 32768 in your code. I will fix it ASAP.
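
For reference, a minimal sketch of that workaround applied to the reproduction script above; only session_len=32768 is new, everything else mirrors the original code:

from lmdeploy import pipeline, GenerationConfig
from lmdeploy.messages import TurbomindEngineConfig
from lmdeploy.vl import load_image

# Same setup as the reproduction script, with session_len set explicitly so that
# long outputs for dense charts are not truncated.
backend_config = TurbomindEngineConfig(model_format='awq',
                                       tp=4,
                                       cache_max_entry_count=0.1,
                                       session_len=32768)
pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5-AWQ',
                backend_config=backend_config,
                log_level='INFO')
gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0, max_new_tokens=1024)
image = load_image("/app/342455249-ece4bf69-967a-48cf-812f-c0c9848776a8.jpg")
response = pipe(("Please inference this chart into a detailed table", image),
                gen_config=gen_config)
print(response.text)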

@tairen99
Author

I see. It seems that in the current branch, the session_len of turbomind was affected. Please specify the session_len argument as 32768 in your code. I will fix it ASAP.

Sure, thanks a lot.
