
Can single-sample inference be done without stream_infer? #1891

zhanghanweii opened this issue Jul 1, 2024 · 4 comments
Comments

@zhanghanweii

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

The example code all uses stream_infer for inference, but for single-sample inference I only found code that calls decode, and its output is all token IDs. What is the prediction code for a single sample?

Reproduction

logits = self.generator.decode(input_ids)

Environment

The environment is fine.

Error traceback

No response

@lvhan028
Collaborator

lvhan028 commented Jul 1, 2024

You can use the pipeline interface.
The reference documentation is here: https://lmdeploy.readthedocs.io/en/latest/get_started.html#offline-batch-inference
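A minimal non-streaming sketch following the linked get-started guide (the model name here is a placeholder; substitute whichever model you are serving):

from lmdeploy import pipeline

# Build a pipeline; replace the model path with your own.
pipe = pipeline('internlm/internlm2-chat-7b')

# Passing a list of prompts runs offline batch inference; a single-element
# list gives single-sample, non-streaming prediction with decoded text.
responses = pipe(['Today is a sunny day.'])
print(responses[0].text)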

@zhanghanweii
Author

You can use the pipeline interface. The reference documentation is here: https://lmdeploy.readthedocs.io/en/latest/get_started.html#offline-batch-inference

Thanks, I tried it and it worked. However, I ran into an interesting problem:
When I use inputs such as:
1. Read this sentence aloud, this is input: Today is a sunny day.
2. ask this question, this is input: do you know who is jams harden?
inference is very fast, around 500 ms. But with the following input it becomes slow:
1. do you know who is jams harden?
It takes about 2 s. I don't know the exact cause. vLLM shows a similar issue: after adding "this is input: ", it becomes fast again.

Does the input format affect the speedup, and how can I avoid this? (A rough way to compare is shown in the sketch below.)
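A simple way to compare the two prompt formats is to time each call separately. This is only a sketch, reusing the pipe object from the example above; note that the first call after pipeline construction includes warm-up overhead, so run each prompt once before trusting the timings:

import time

prompts = [
    'ask this question, this is input: do you know who is jams harden?',
    'do you know who is jams harden?',
]

for prompt in prompts:
    start = time.perf_counter()
    response = pipe([prompt])[0]
    elapsed = time.perf_counter() - start
    # Print latency alongside the generated text for each prompt format.
    print(f'{elapsed:.3f}s  {response.text!r}')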

@lvhan028
Collaborator

lvhan028 commented Jul 1, 2024

Check whether the number of generated tokens has increased.
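One way to check is to inspect the response metadata. A sketch, assuming the Response object returned by the pipeline exposes input_token_len and generate_token_len (recent lmdeploy versions do, but verify against the version you have installed):

response = pipe(['do you know who is jams harden?'])[0]
# Report how many tokens went in and how many were generated.
print('input tokens: ', response.input_token_len)
print('output tokens:', response.generate_token_len)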

@zhanghanweii
Author

Check whether the number of generated tokens has increased.

Both cases generate 5 to 6 tokens, but the latency is even worse than without acceleration. vLLM behaves the same way.
