
Can single-sample inference be done without stream_infer? #1891

zhanghanweii opened this issue Jul 1, 2024 · 4 comments
Comments

@zhanghanweii

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

The example code all uses stream_infer for inference, but for single-sample inference I only found code that calls decode, and its output is all token IDs. What is the prediction code for a single sample?

Reproduction

logits = self.generator.decode(input_ids)

Environment

The environment is fine.

Error traceback

No response

@lvhan028
Collaborator

lvhan028 commented Jul 1, 2024

You can use the pipeline interface.
The reference documentation is here: https://lmdeploy.readthedocs.io/en/latest/get_started.html#offline-batch-inference
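A minimal non-streaming sketch following the linked get-started guide (the model name here is a placeholder; substitute whichever model you are serving):

from lmdeploy import pipeline

# Build a pipeline; replace the model path with your own.
pipe = pipeline('internlm/internlm2-chat-7b')

# Passing a list of prompts runs offline batch inference; a single-element
# list gives single-sample, non-streaming prediction with decoded text.
responses = pipe(['Today is a sunny day.'])
print(responses[0].text)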

@zhanghanweii
Author

You can use the pipeline interface. The reference documentation is here: https://lmdeploy.readthedocs.io/en/latest/get_started.html#offline-batch-inference

Thanks, I tried it and it worked. However, I ran into an interesting problem:
When I use inputs such as:
1. Read this sentence aloud, this is input: Today is a sunny day.
2. ask this question, this is input: do you know who is jams harden?
inference is very fast, around 500 ms. But with the following input it becomes slow:
1. do you know who is jams harden?
It takes about 2 s. I don't know the exact cause. vLLM shows a similar issue: after adding "this is input: ", it becomes fast again.

Does the input format affect the speedup, and how can I avoid this? (A rough way to compare is shown in the sketch below.)
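A simple way to compare the two prompt formats is to time each call separately. This is only a sketch, reusing the pipe object from the example above; note that the first call after pipeline construction includes warm-up overhead, so run each prompt once before trusting the timings:

import time

prompts = [
    'ask this question, this is input: do you know who is jams harden?',
    'do you know who is jams harden?',
]

for prompt in prompts:
    start = time.perf_counter()
    response = pipe([prompt])[0]
    elapsed = time.perf_counter() - start
    # Print latency alongside the generated text for each prompt format.
    print(f'{elapsed:.3f}s  {response.text!r}')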

@lvhan028
Collaborator

lvhan028 commented Jul 1, 2024

Check whether the number of generated tokens has increased.
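One way to check is to inspect the response metadata. A sketch, assuming the Response object returned by the pipeline exposes input_token_len and generate_token_len (recent lmdeploy versions do, but verify against the version you have installed):

response = pipe(['do you know who is jams harden?'])[0]
# Report how many tokens went in and how many were generated.
print('input tokens: ', response.input_token_len)
print('output tokens:', response.generate_token_len)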

@zhanghanweii
Author

Check whether the number of generated tokens has increased.

Both cases generate 5 to 6 tokens, but the latency is even worse than without acceleration. vLLM behaves the same way.
