Insights: InternLM/lmdeploy
Overview
4 Pull requests merged by 3 people
- Fix internvl2-40b awq inference (#2023, merged Jul 15, 2024)
- Avoid the same session id for openai endpoint (#1995, merged Jul 15, 2024)
- add chat template for codegeex4 (#2013, merged Jul 15, 2024)
- support internlm-xcomposer2d5-7b (#1932, merged Jul 15, 2024)
1 Pull request opened by 1 person
- bump version to v0.5.1 (#2022, opened Jul 15, 2024)
10 Issues closed by 6 people
- [Bug] Which GPU types does lmdeploy support, and which are explicitly unsupported? (#2015, closed Jul 15, 2024)
- Q: Continuous Batching without Turbomind? (#2025, closed Jul 15, 2024)
- [Feature] Can you please do INT4 Quantization for InternVL2-26B and InternVL2-40B (#1955, closed Jul 15, 2024)
- [Bug] InternVL2-40B generates nonsense outputs (#1965, closed Jul 15, 2024)
- [Bug] AWQ-quantized InternVL2 40B produces meaningless output (#2017, closed Jul 15, 2024)
- On support for internv2 (#1919, closed Jul 15, 2024)
- [Bug] KeyError: 'plora_glb_GN' after quantization of internlm/internlm-xcomposer2-4khd-7b to 4-bit (#2014, closed Jul 15, 2024)
- [Bug] InternVL2-40B is unreachable after quantized deployment (#2009, closed Jul 15, 2024)
- Unable to infer on multiple CPUs (#2008, closed Jul 15, 2024)
- AWQ quantized model produces garbled output during multi-GPU inference (#1996, closed Jul 15, 2024)
7 Issues opened by 7 people
- During auto awq model conversion, `im_mask = im_mask.view(-1)` raises AttributeError: 'NoneType' object has no attribute 'view' (#2031, opened Jul 15, 2024)
- [Bug] This event loop is already running (#2030, opened Jul 15, 2024)
- [Bug] GPU memory usage does not drop after quantizing a fine-tuned qwen-vl-chat model to INT4 with the provided method (#2028, opened Jul 15, 2024)
- How does multimodal batch inference work? (#2027, opened Jul 15, 2024)
- Deploying a fine-tuned glm-4v-9b with lmdeploy raises an error (#2026, opened Jul 15, 2024)
- What is the expected chat template for phi-3-vl? (#2024, opened Jul 15, 2024)
- [Docs] Powered by A100 compute! Session 3 of the InternLM large-model practical camp is fully upgraded, with a fun level-based mode awaiting you (#2021, opened Jul 15, 2024)
8 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Support glm4 awq (#1993, commented on Jul 15, 2024 • 4 new comments)
- How does the concurrency mechanism of lmdeploy serve work? (#1997, commented on Jul 15, 2024 • 0 new comments)
- [Bug] KV Cache INT8 calibration warning: Token indices sequence length is longer than the specified maximum sequence length for this model (2874305 > 4096) (#1033, commented on Jul 15, 2024 • 0 new comments)
- [Bug] Same code works on A800 but hangs on A10 with MiniCPM-Llama3-V-2_5 (#1938, commented on Jul 15, 2024 • 0 new comments)
- Can the glm-4v-9b model be supported? (#1916, commented on Jul 15, 2024 • 0 new comments)
- [Benchmark] TurboMind benchmark with GLM-4-9B-Chat and Qwen2-72B-Instruct vs vLLM (#1974, commented on Jul 15, 2024 • 0 new comments)
- torch engine: optimize prefill for long context (#1962, commented on Jul 15, 2024 • 0 new comments)
- Add log info for prefix cache (#2018, commented on Jul 15, 2024 • 0 new comments)