Insights: InternLM/lmdeploy
Overview
4 Pull requests merged by 3 people
- Fix internvl2-40b awq inference (#2023, merged Jul 15, 2024)
- Avoid the same session id for openai endpoint (#1995, merged Jul 15, 2024)
- add chat template for codegeex4 (#2013, merged Jul 15, 2024)
- support internlm-xcomposer2d5-7b (#1932, merged Jul 15, 2024)
1 Pull request opened by 1 person
- bump version to v0.5.1 (#2022, opened Jul 15, 2024)
10 Issues closed by 6 people
- [Bug] Which GPU types does lmdeploy support, and which are explicitly unsupported? (#2015, closed Jul 15, 2024)
- Q: Continuous Batching without Turbomind? (#2025, closed Jul 15, 2024)
- [Feature] Can you please do INT4 Quantization for InternVL2-26B and InternVL2-40B (#1955, closed Jul 15, 2024)
- [Bug] InternVL2-40B generates nonsense outputs (#1965, closed Jul 15, 2024)
- [Bug] AWQ-quantized InternVL2 40B produces meaningless output (#2017, closed Jul 15, 2024)
- On support for internv2 (#1919, closed Jul 15, 2024)
- [Bug] KeyError: 'plora_glb_GN' after quantization of internlm/internlm-xcomposer2-4khd-7b to 4-bit (#2014, closed Jul 15, 2024)
- [Bug] InternVL2-40B is unreachable after quantized deployment (#2009, closed Jul 15, 2024)
- Unable to infer on multiple CPUs (#2008, closed Jul 15, 2024)
- AWQ quantized model produces garbled output during multi-GPU inference (#1996, closed Jul 15, 2024)
7 Issues opened by 7 people
- During auto awq model conversion, `im_mask = im_mask.view(-1)` raises AttributeError: 'NoneType' object has no attribute 'view' (#2031, opened Jul 15, 2024)
- [Bug] This event loop is already running (#2030, opened Jul 15, 2024)
- [Bug] GPU memory usage does not drop after quantizing a fine-tuned qwen-vl-chat model to INT4 with the provided method (#2028, opened Jul 15, 2024)
- How does multimodal batch inference work? (#2027, opened Jul 15, 2024)
- Deploying a fine-tuned glm-4v-9b with lmdeploy raises an error (#2026, opened Jul 15, 2024)
- What is the expected chat template for phi-3-vl? (#2024, opened Jul 15, 2024)
- [Docs] Powered by A100 compute! Session 3 of the InternLM large-model practical camp is fully upgraded, with a fun level-based mode awaiting you (#2021, opened Jul 15, 2024)
8 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Support glm4 awq (#1993, commented on Jul 15, 2024 • 4 new comments)
- How does the concurrency mechanism of lmdeploy serve work? (#1997, commented on Jul 15, 2024 • 0 new comments)
- [Bug] KV Cache INT8 calibration warning: Token indices sequence length is longer than the specified maximum sequence length for this model (2874305 > 4096) (#1033, commented on Jul 15, 2024 • 0 new comments)
- [Bug] Same code works on A800 but hangs on A10 with MiniCPM-Llama3-V-2_5 (#1938, commented on Jul 15, 2024 • 0 new comments)
- Can the glm-4v-9b model be supported? (#1916, commented on Jul 15, 2024 • 0 new comments)
- [Benchmark] TurboMind benchmark with GLM-4-9B-Chat and Qwen2-72B-Instruct vs vLLM (#1974, commented on Jul 15, 2024 • 0 new comments)
- torch engine: optimize prefill for long context (#1962, commented on Jul 15, 2024 • 0 new comments)
- Add log info for prefix cache (#2018, commented on Jul 15, 2024 • 0 new comments)