[Bug] Using the turbomind engine, prompting more than 10k tokens will result in garbage output. #1896
Comments
Can you share the reproducible code?
It seems that Zephyr 7B supports 8k context length.
@lzhangzz We support 16k context length through finetuning, which works normally in vllm.
@lvhan028
Can you share the
Oh, what's the chat template of the Zephyr 7B like?
@lvhan028 startup.sh:
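For context extension of this kind, the scaling factor is typically the ratio of the new context length to the base model's. A minimal sketch of that arithmetic, assuming linear RoPE scaling (whether and how the serving engine applies such a factor is an assumption, not something stated in this thread):

```python
# Sketch: derive a linear RoPE scaling factor for a context-extended model.
# The 8k base / 16k target numbers come from this thread; the linear-scaling
# assumption is illustrative, not a statement about the finetuned model.

def rope_scaling_factor(original_max_len: int, target_len: int) -> float:
    """Linear RoPE scaling: positions are compressed by target/original."""
    if target_len <= original_max_len:
        return 1.0  # no scaling needed within the base context window
    return target_len / original_max_len

# Zephyr 7B's base context is 8k; the finetuned model here targets 16k.
factor = rope_scaling_factor(8192, 16384)
print(factor)  # 2.0
```

If the serving engine is not told about this factor while vllm infers it from the model config, long prompts would degrade in exactly the way described here.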
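For reference, the template published with HuggingFaceH4/zephyr-7b-beta roughly follows a `<|system|>` / `<|user|>` / `<|assistant|>` layout with `</s>` terminators. A hand-rolled sketch for eyeballing against what the engine actually concatenates (the exact whitespace is an assumption; the tokenizer's own `apply_chat_template` is authoritative):

```python
def build_zephyr_prompt(system: str, user: str) -> str:
    """Approximate Zephyr-style chat prompt. Verify against the model
    tokenizer's apply_chat_template before relying on this layout."""
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

prompt = build_zephyr_prompt("You are a helpful assistant.", "Hello!")
print(prompt)
```

A template mismatch alone usually shows up on short prompts too, which is why the length threshold reported here points more toward a context/positional issue.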
Set log_level to INFO; then you can see the concatenated prompt and token_ids. You can compare them against vllm's. I suspect the chat template is not set correctly.
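Once both engines log their token_ids at INFO level, a small helper can pinpoint where the two sequences first diverge rather than comparing them by eye (names here are illustrative, not part of either engine's API):

```python
def first_divergence(ids_a: list, ids_b: list) -> int:
    """Return the index where two token-id sequences first differ.

    Returns -1 if the sequences are identical; if one is a strict
    prefix of the other, returns the length of the shorter sequence.
    """
    for i in range(min(len(ids_a), len(ids_b))):
        if ids_a[i] != ids_b[i]:
            return i
    return -1 if len(ids_a) == len(ids_b) else min(len(ids_a), len(ids_b))

# Example: sequences agree up to index 2, then differ.
print(first_divergence([1, 2, 3, 4], [1, 2, 9, 4]))  # 2
```

A divergence at index 0 typically means different BOS/template handling; identical ids with different outputs points away from the template and toward the engine's positional/context handling.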
@lvhan028 I conducted a comparative test and the prompts of the two are the same.
Can https://huggingface.co/HuggingFaceH4/zephyr-7b-beta be used to reproduce your issue?
@lvhan028 Can be reproduced
The architecture of
@lvhan028 I don't think so. I tested with llama2-13b, and the output was {"role":"assistant","content":"None"}; still garbage output.
Or
@lvhan028 Can you help confirm this issue?
Sorry, @dafu-wu I am busy with a survey.
@lvhan028 @AllentDan Does lmdeploy not support inference for models whose context length was extended by finetuning?
It does. Since we cannot access your model, or even its config.json file, it is hard for us to run a comparison check against transformers.
So,
@lvhan028 Yes, this issue is similar to https://github.com/InternLM/lmdeploy/issues/883
@lvhan028 OK, I will do it. What would similarities and differences after the comparison indicate?
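Even without sharing the model, the context-length-related fields of config.json can be checked and reported. A hedged sketch of the fields worth comparing (the keys are standard Hugging Face Llama-style config keys; the values shown are made-up examples, not the finetuned model's real settings):

```python
import json

# Config fields that commonly matter for extended-context inference.
KEYS = ("max_position_embeddings", "rope_theta", "rope_scaling")

def summarize_config(raw: str) -> dict:
    """Extract context-length-related fields from a config.json payload."""
    cfg = json.loads(raw)
    return {k: cfg.get(k) for k in KEYS}

# Illustrative payload only; real values must come from the finetuned model.
example = '{"max_position_embeddings": 16384, "rope_theta": 10000.0}'
print(summarize_config(example))
```

If `max_position_embeddings` was raised to 16384 but `rope_scaling` is absent, one engine may honor the extension while another silently keeps the base model's positional behavior.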
Checklist
Describe the bug
model: zephyr_7b
engine: turbomind
Short prompts work normally, but prompts over 10k tokens produce garbage output, the stop reason is "length", and the latency is very high.
Reproduction
As above
Environment
Error traceback
No response