[Bug] lmdeploy - ERROR - Truncate max_new_tokens to 221 #1841

Closed
1 of 2 tasks
tairen99 opened this issue Jun 24, 2024 · 7 comments


tairen99 commented Jun 24, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

Hi all,

Thank you for your good work!

As suggested in the issue, I tried the latest lmdeploy wheels (lmdeploy-0.4.2+cu121+da439df-cp39-cp39-manylinux2014_x86_64.whl and lmdeploy-0.4.2+cu118+da439df-cp39-cp39-manylinux2014_x86_64.whl) to get deterministic output, but I get the error below.

Besides the error, the results are deterministic, but for very dense input images the results are truncated, as the ERROR shows.

However, if I install lmdeploy using "pip install lmdeploy", I do not get this error and the results are not truncated even for the dense input images, but the results are NOT deterministic.

========================================

[TM][WARNING] Device 2 peer access Device 3 is not available.
[TM][WARNING] Device 3 peer access Device 0 is not available.
[TM][WARNING] Device 3 peer access Device 1 is not available.
[TM][WARNING] Device 3 peer access Device 2 is not available.
test image is: Meta_2022_13_0_1659551667_stacked_bar_chart_plus_legend.png
2024-06-24 18:30:26,329 - lmdeploy - INFO - start ImageEncoder._forward_loop
2024-06-24 18:30:26,329 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-06-24 18:30:26,329 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
2024-06-24 18:30:34,239 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 7.910s
2024-06-24 18:30:34,240 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-06-24 18:30:34,241 - lmdeploy - INFO - prompt='<|im_start|>system\nYou are an AI assistant whose name is InternLM (书生·浦语).<|im_end|>\n<|im_start|>user\n<IMAGE_TOKEN>\nPlease inference this chart into a detailed table<|im_end|>\n<|im_start|>assistant\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=1024, top_p=0.8, top_k=40, temperature=0, repetition_penalty=1.0, ignore_eos=False, random_seed=6725412376424003715, stop_words=[92542, 92540], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[1, 92543, 9081, 364, 2770, 657, 589, 15358, 17993, 6843, 963, 505, 4576, 11146, 451, 60628, 60384, 60721, 62442, 60752, 699, 92542, 364, 92543, 1008, 364, 92544, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92545, 364, 5658, 43929, 550, 9617, 1263, 395, 11832, 2115, 92542, 364, 92543, 525, 11353, 364], adapter_name=None.
2024-06-24 18:30:34,241 - lmdeploy - INFO - session_id=0, history_tokens=0, input_tokens=1835, max_new_tokens=1024, seq_start=True, seq_end=True, step=0, prep=True
2024-06-24 18:30:34,241 - lmdeploy - ERROR - Truncate max_new_tokens to 221
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 0 received.
[TM][WARNING] [ProcessInferRequests] [0] total sequence length (1835 + 221) exceeds session_len (2056), request_output_len is truncated to 220
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 1835, max_q = 1835, max_k = 1835
[TM][INFO] ------------------------- step = 1840 -------------------------
[TM][INFO] ------------------------- step = 1850 -------------------------
[TM][INFO] ------------------------- step = 1860 -------------------------
[TM][INFO] ------------------------- step = 1870 -------------------------
[TM][INFO] ------------------------- step = 1880 -------------------------
[TM][INFO] ------------------------- step = 1890 -------------------------
[TM][INFO] ------------------------- step = 1900 -------------------------
[TM][INFO] ------------------------- step = 1910 -------------------------
[TM][INFO] ------------------------- step = 1920 -------------------------
[TM][INFO] ------------------------- step = 1930 -------------------------
[TM][INFO] ------------------------- step = 1940 -------------------------
[TM][INFO] ------------------------- step = 1950 -------------------------
[TM][INFO] ------------------------- step = 1960 -------------------------
[TM][INFO] ------------------------- step = 1970 -------------------------
[TM][INFO] ------------------------- step = 1980 -------------------------
[TM][INFO] ------------------------- step = 1990 -------------------------
[TM][INFO] ------------------------- step = 2000 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 0
[TM][INFO] [forward] Request completed for 0
====> The question is: Please inference this chart into a detailed table

========================================

The test input image is:
gettyimages-182495865-2048x2048

Reproduction

from lmdeploy import pipeline, GenerationConfig
from lmdeploy.messages import TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL-Chat-V1-5-AWQ'
image = load_image("/app/342455249-ece4bf69-967a-48cf-812f-c0c9848776a8.jpg")
backend_config = TurbomindEngineConfig(model_format='awq', tp=4, cache_max_entry_count=0.1)
pipe = pipeline(model, backend_config=backend_config, log_level='INFO')
gen_config = GenerationConfig(top_p=0.8,
                              top_k=40,
                              temperature=0,
                              max_new_tokens=1024)
sel_question = "Please inference this chart into a detailed table"
response = pipe((sel_question, image), gen_config=gen_config)
print(response.text)

Environment

Server: 4 NVIDIA Tesla T4 GPUs, each with 16 GB of GPU memory
Memory: 191 GB
Number of CPUs: 48
Docker Environment: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
Python version: 3.9.19

Error traceback

No response

@RayTang88

I also encountered this problem and hope to get an official answer.
How should the prompt length be controlled, and how should session_len, cache_max_entry_count, and quant_policy be set according to the model parameters, so that the model output is not truncated?
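
For illustration, a minimal sketch of where these knobs live, assuming the TurbomindEngineConfig and GenerationConfig fields of lmdeploy 0.4.x; the values below are placeholders, not tuned recommendations:

# Minimal sketch with placeholder values; adjust to your model and hardware.
from lmdeploy import GenerationConfig
from lmdeploy.messages import TurbomindEngineConfig

backend_config = TurbomindEngineConfig(
    model_format='awq',
    tp=4,
    session_len=8192,            # max tokens per session (input + output); raise for long prompts
    cache_max_entry_count=0.1,   # fraction of free GPU memory reserved for the k/v cache
    quant_policy=0,              # 0: no k/v cache quantization; 4 or 8: 4-/8-bit k/v cache
)
gen_config = GenerationConfig(max_new_tokens=1024)  # requested output length

# The output is only truncated when input_tokens + max_new_tokens exceeds session_len,
# so pick session_len large enough for the longest prompt plus the desired output.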

@lvhan028
Collaborator

This is not a bug.

[TM][WARNING] [ProcessInferRequests] [0] total sequence length (1835 + 221) exceeds session_len (2056), request_output_len is truncated to 220

The default session_len is 2056, meaning the max sequence length of a session, including the input and output tokens.

In your example, the number of input tokens is input_tokens=1835, including the image and prompt tokens.
The requested number of output tokens is max_new_tokens=1024

It indicates that input_tokens + max_new_tokens > session_len, so the engine will truncate the number of requested output tokens.
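
As a worked version of that arithmetic, with the values taken from the log above (the clamp below is only a sketch of the described behaviour, not the engine's actual code):

session_len = 2056      # default max sequence length per session (input + output)
input_tokens = 1835     # image tokens + prompt tokens, from the log
max_new_tokens = 1024   # requested output length

budget = session_len - input_tokens       # 2056 - 1835 = 221
effective = min(max_new_tokens, budget)   # 221
print(effective)                          # matches "Truncate max_new_tokens to 221"

The TurboMind warning then trims one more token (request_output_len is truncated to 220), presumably to keep the total strictly below session_len.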

lvhan028 self-assigned this Jun 25, 2024
@tairen99
Author

This is not a bug.

[TM][WARNING] [ProcessInferRequests] [0] total sequence length (1835 + 221) exceeds session_len (2056), request_output_len is truncated to 220

The default session_len is 2056, meaning the max sequence length of a session, including the input and output tokens.

In your example, the number of input tokens is input_tokens=1835, including the image and prompt tokens. The requested number of output tokens is max_new_tokens=1024

It indicates that input_tokens + max_new_tokens > session_len, so the engine will truncate the number of requested output tokens.

Hi @lvhan028, @zhyncs, and @AllentDan,

Thank you very much for your quick reply and all your help before.

Even though it is not a bug in this case, I do not know why it occurs with the wheel versions lmdeploy-0.4.2+cu121+da439df-cp39-cp39-manylinux2014_x86_64.whl and lmdeploy-0.4.2+cu118+da439df-cp39-cp39-manylinux2014_x86_64.whl.

If I install lmdeploy via pip install lmdeploy and run the same test code, I get the following output without the ERROR message "2024-06-24 18:30:34,241 - lmdeploy - ERROR - Truncate max_new_tokens to 221". See the detailed output from the pip-installed version below:

=======================================

[TM][WARNING] Device 3 peer access Device 0 is not available.
[TM][WARNING] Device 3 peer access Device 1 is not available.
[TM][WARNING] Device 3 peer access Device 2 is not available.
test image is: Meta_2022_13_0_1659551667_stacked_bar_chart_plus_legend.png
2024-06-25 17:41:49,486 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-06-25 17:41:49,487 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
/opt/conda/lib/python3.9/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/opt/conda/lib/python3.9/site-packages/torch/utils/checkpoint.py:90: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
2024-06-25 17:41:52,433 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 2.946s
2024-06-25 17:41:52,433 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-06-25 17:41:57,504 - lmdeploy - INFO - prompt='<|im_start|>system\nYou are an AI assistant whose name is InternLM (书生·浦语).<|im_end|>\n<|im_start|>user\n<IMAGE_TOKEN>\nPlease inference this chart into a detailed table<|im_end|>\n<|im_start|>assistant\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=1024, top_p=0.8, top_k=40, temperature=0, repetition_penalty=1.0, ignore_eos=False, random_seed=15886905969490819590, stop_words=[92542, 92540], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[1, 92543, 9081, 364, 2770, 657, 589, 15358, 17993, 6843, 963, 505, 4576, 11146, 451, 60628, 60384, 60721, 62442, 60752, 699, 92542, 364, 92543, 1008, 364, 92544, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92545, 364, 5658, 43929, 550, 9617, 1263, 395, 11832, 2115, 92542, 364, 92543, 525, 11353, 364], adapter_name=None.
2024-06-25 17:41:57,504 - lmdeploy - INFO - session_id=0, history_tokens=0, input_tokens=1835, max_new_tokens=1024, seq_start=True, seq_end=True, step=0, prep=True
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] Set logger level by INFO
[TM][WARNING] [ProcessInferRequests] Request for 0 received.
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 1835, max_q = 1835, max_k = 1835
[TM][INFO] Set logger level by INFO
[TM][INFO] ------------------------- step = 1840 -------------------------
[TM][INFO] ------------------------- step = 1850 -------------------------
[TM][INFO] ------------------------- step = 1860 -------------------------
[TM][INFO] ------------------------- step = 1870 -------------------------
[TM][INFO] ------------------------- step = 1880 -------------------------
[TM][INFO] ------------------------- step = 1890 -------------------------
[TM][INFO] ------------------------- step = 1900 -------------------------
[TM][INFO] ------------------------- step = 1910 -------------------------
[TM][INFO] ------------------------- step = 1920 -------------------------
[TM][INFO] ------------------------- step = 1930 -------------------------
[TM][INFO] ------------------------- step = 1940 -------------------------
[TM][INFO] ------------------------- step = 1950 -------------------------
[TM][INFO] ------------------------- step = 1960 -------------------------
[TM][INFO] ------------------------- step = 1970 -------------------------
[TM][INFO] ------------------------- step = 1980 -------------------------
[TM][INFO] ------------------------- step = 1990 -------------------------
[TM][INFO] ------------------------- step = 2000 -------------------------
[TM][INFO] ------------------------- step = 2010 -------------------------
[TM][INFO] ------------------------- step = 2020 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 0
[TM][INFO] [forward] Request complete for 0, code 0
====> The question is: Please inference this chart into a detailed table

=======================================

So I guess there is some minor difference between the previous version and the new version that causes this variation in results.

Could you please double-check the difference, or point me to the changes so that I can apply them in my local fork and run the workflow myself?

Thank you again!

@AllentDan
Collaborator

You may ignore the log; it does not affect usage. We will change that log level from error to warning.

@tairen99
Author

You may ignore the log; it does not affect usage. We will change that log level from error to warning.

Hi @AllentDan,

Thank you for your quick response.

Yeah, I wanted to just ignore the error, but for my large and dense chart the model's outputs are truncated, as the error mentioned earlier indicates.

However, the lmdeploy version installed via pip install lmdeploy provides a complete output without the truncation issue.

If the same input causes truncation with the wheel builds, why does it not cause the same error in the pip-installed version?

Thank you.

@AllentDan
Collaborator

I see. It seems that in the current branch, the session_len of turbomind was affected. Please specify the session_len argument as 32768 in your code. I will fix it ASAP.
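
For reference, a minimal sketch of that workaround applied to the reproduction script above; only session_len=32768 is new, everything else mirrors the original code:

from lmdeploy import pipeline, GenerationConfig
from lmdeploy.messages import TurbomindEngineConfig
from lmdeploy.vl import load_image

# Same setup as the reproduction script, with session_len set explicitly so that
# long outputs for dense charts are not truncated.
backend_config = TurbomindEngineConfig(model_format='awq',
                                       tp=4,
                                       cache_max_entry_count=0.1,
                                       session_len=32768)
pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5-AWQ',
                backend_config=backend_config,
                log_level='INFO')
gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0, max_new_tokens=1024)
image = load_image("/app/342455249-ece4bf69-967a-48cf-812f-c0c9848776a8.jpg")
response = pipe(("Please inference this chart into a detailed table", image),
                gen_config=gen_config)
print(response.text)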

@tairen99
Author

I see. It seems that in the current branch, the session_len of turbomind was affected. Please specify the session_len argument as 32768 in your code. I will fix it ASAP.

Sure, thanks a lot.
