Insights: InternLM/lmdeploy
Overview
1 Release published by 1 person
- v0.5.0: LMDeploy Release V0.5.0, published Jul 1, 2024
22 Pull requests merged by 9 people
- support gemma2 in pytorch engine (#1924, merged Jul 5, 2024; see the sketch after this list)
- fix: append _stats when size > 0 (#1809, merged Jul 5, 2024)
- misc: add transformers version check for TurboMind Tokenizer (#1917, merged Jul 5, 2024)
- Support internvl2 chat template (#1911, merged Jul 5, 2024)
- misc: add default api_server_url for api_client (#1922, merged Jul 5, 2024)
- vision model uses tp number of GPUs (#1854, merged Jul 5, 2024)
- Fix smem size for fused split-kv reduction (#1909, merged Jul 4, 2024)
- Remove deprecated chat CLI and VL examples (#1899, merged Jul 4, 2024)
- [Doc]: Change to sphinx-book-theme in readthedocs (#1880, merged Jul 4, 2024)
- Optimize sampling on pytorch engine (#1853, merged Jul 3, 2024)
- Support phi3-vision (#1845, merged Jul 2, 2024)
- Add usage info in stream response (#1876, merged Jul 2, 2024)
- docs: update FAQ for "turbomind .so not found" (#1877, merged Jul 2, 2024)
- fix SamplingDecodeTest and SamplingDecodeTest2 unittest failures (#1874, merged Jul 1, 2024)
- drop stop words (#1823, merged Jul 1, 2024)
- Fix internlm-xcomposer2-vl AWQ search scale (#1890, merged Jul 1, 2024)
- Fix erroneous link reference (#1881, merged Jul 1, 2024)
- misc: rm unnecessary files (#1875, merged Jul 1, 2024)
- bump version to v0.5.0 (#1852, merged Jul 1, 2024)
- docs: update cache-max-entry-count help message (#1892, merged Jul 1, 2024)
- [Doc]: Update docs for internlm2.5 (#1887, merged Jul 1, 2024)
- fix qwen2 cache_position for PyTorch Engine when transformers>4.41.2 (#1886, merged Jul 1, 2024)
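Several of these merges land in the PyTorch engine (#1924, #1853, #1886). As a rough illustration of the surface they touch, here is a minimal sketch of running a model on that backend through lmdeploy's pipeline API; the gemma2 model id and the prompt are assumptions, not taken from the PRs.

```python
# Minimal sketch: run a model on the PyTorch engine, the backend that
# #1924 extends to gemma2. Model id and prompt are illustrative only.
from lmdeploy import GenerationConfig, PytorchEngineConfig, pipeline

pipe = pipeline(
    "google/gemma-2-9b-it",                    # assumed model id
    backend_config=PytorchEngineConfig(tp=1),  # single GPU, no tensor parallelism
)
responses = pipe(
    ["Briefly introduce LMDeploy."],
    gen_config=GenerationConfig(max_new_tokens=128),
)
print(responses[0].text)
```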
10 Pull requests opened by 7 people
- feat: support llama2 and internlm2 on 910B (#1889, opened Jul 1, 2024)
- Fix index error when profiling token generation with `-ct 1` (#1898, opened Jul 2, 2024)
- refactor sampling layer setup (#1912, opened Jul 3, 2024)
- PyTorch Engine AWQ support (#1913, opened Jul 3, 2024; see the sketch after this list)
- [ci] add internlm2.5 models into testcase (#1928, opened Jul 5, 2024)
- Upgrade gradio (#1930, opened Jul 5, 2024)
- Remove deprecated arguments from API and clarify model_name and chat_template_name (#1931, opened Jul 5, 2024)
- support internlm-xcomposer2d5-7b (#1932, opened Jul 5, 2024)
- refactor: update awq linear and rm legacy (#1940, opened Jul 7, 2024)
- fix mixtral cache_position (#1941, opened Jul 7, 2024)
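For context on #1913, which proposes AWQ support inside the PyTorch engine: a minimal sketch of how a 4-bit AWQ checkpoint is already consumed through the TurboMind backend follows; the model id is an assumption.

```python
# Minimal sketch: load a 4-bit AWQ checkpoint on the TurboMind backend,
# the route that exists today while #1913 proposes a PyTorch-engine
# equivalent. The model id is an assumption.
from lmdeploy import TurbomindEngineConfig, pipeline

pipe = pipeline(
    "internlm/internlm2-chat-7b-4bit",  # assumed AWQ-quantized model id
    backend_config=TurbomindEngineConfig(model_format="awq"),
)
print(pipe(["Hello"])[0].text)
```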
12 Issues closed by 8 people
- [Bug] assistant always replies "" (#1937, closed Jul 6, 2024)
- [Feature] support Gemma 2 (#1878, closed Jul 5, 2024)
- [Bug] ValueError: Tokenizer class Qwen2Tokenizer does not exist or is not currently imported (#1903, closed Jul 5, 2024)
- [Feature] diff tool for troubleshooting (#1908, closed Jul 5, 2024)
- [Bug] internvl-chat-v-1-5 predict (#1918, closed Jul 4, 2024)
- Nightly Build for LMDeploy (#1828, closed Jul 3, 2024)
- [Bug] lmdeploy - ERROR - Truncate max_new_tokens to 221 (#1841, closed Jul 2, 2024)
- [Bug] Quantization and inference work with default parameters, but inference fails after quantizing with --search-scale True --batch-size 8 (#1883, closed Jul 1, 2024)
- [Bug] Mini-InternVL1.5-4B does not initialize successfully (#1721, closed Jul 1, 2024)
- [Feature] update the range of torch versions (#1857, closed Jul 1, 2024)
- [Bug] qwen2 issue when transformers>4.41.2 for PyTorch Engine (#1885, closed Jul 1, 2024; see the sketch after this list)
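#1885, closed by PR #1886, traces to a behavior change in transformers after 4.41.2. Below is a sketch of the style of version gate such fixes typically rely on; the predicate name is hypothetical and the body does not reproduce the actual patch.

```python
# Sketch: gate behavior on the installed transformers version, the kind of
# check a fix like #1886 applies for qwen2 cache_position. The function
# name is hypothetical; placeholder logic only.
import transformers
from packaging import version


def needs_explicit_cache_position() -> bool:
    """True when transformers is newer than 4.41.2 and expects an
    explicit cache_position tensor to be passed to the model."""
    return version.parse(transformers.__version__) > version.parse("4.41.2")
```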
25 Issues opened by 24 people
- [Bug] lmdeploy errors on OpenAI-style prompt requests (#1939, opened Jul 6, 2024)
- [Bug] same code works on A800 but gets stuck on A10 with MiniCPM-Llama3-V-2_5 (#1938, opened Jul 6, 2024)
- [Bug] unified_attention split-kv for prefill with more workspace core dumps (#1935, opened Jul 6, 2024)
- [Bug] when debugging with VS Code, model execution returns results directly and the intermediate step from prepared inputs to model outputs cannot be inspected (#1933, opened Jul 5, 2024)
- [Feature] can embedding models be supported, similar to xinference? (#1927, opened Jul 5, 2024)
- [Bug] TCP error (port already used) when deploying with PytorchEngine (#1925, opened Jul 5, 2024)
- [Bug] AWQ-quantized models cannot be deployed on multiple GPUs with lmdeploy (#1923, opened Jul 4, 2024)
- [Feature] Is there any plan to support InternLM-XComposer2.5 inference? (#1920, opened Jul 4, 2024)
- About support for internv2 (#1919, opened Jul 4, 2024)
- Can the glm-4v-9b model be supported? (#1916, opened Jul 4, 2024)
- [Bug] AWQ model fails loading adapter (#1915, opened Jul 3, 2024)
- Will torch 2.3.0 and triton 2.3.0 be supported? (#1914, opened Jul 3, 2024)
- [Bug] qwen2-0.5b-instruct (#1910, opened Jul 3, 2024)
- W4A16 quantization of minicpm-v brings little change in inference speed (#1906, opened Jul 3, 2024)
- [Bug] Segmentation fault occurs and the machine running openEuler automatically reboots (#1905, opened Jul 3, 2024)
- Qwen 2 72B Instruct tp 8 performance degradation (#1904, opened Jul 3, 2024)
- When will quantization of CogVLM2 be supported? (#1902, opened Jul 3, 2024)
- Abnormal latency in multi-turn conversation batching (#1901, opened Jul 3, 2024)
- [Feature] support InternVL-2.0 (#1900, opened Jul 2, 2024)
- [Bug] Using the turbomind engine, prompts over 10k tokens produce garbage output (#1896, opened Jul 2, 2024)
- [Bug] CUDA runtime error: an illegal memory access was encountered when 8-bit KV quant was enabled (#1895, opened Jul 1, 2024)
- [Bug] (#1894, opened Jul 1, 2024)
- The `n` parameter of the GenerationConfig class has no effect (#1893, opened Jul 1, 2024)
- Can single-sample inference be done without stream_infer? (#1891, opened Jul 1, 2024; see the sketch after this list)
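On #1891, whether single-sample inference can skip stream_infer: the pipeline API offers both a blocking call and a streaming iterator. A minimal sketch, with the model id and prompt assumed:

```python
# Minimal sketch contrasting the blocking pipeline call with stream_infer,
# the question raised in #1891. Model id and prompt are illustrative.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2_5-7b-chat")  # assumed model id

# Blocking: returns one Response per prompt after generation completes,
# so single-sample inference does not need stream_infer.
print(pipe(["What is a KV cache?"])[0].text)

# Streaming: yields incremental chunks as tokens are produced.
for chunk in pipe.stream_infer(["What is a KV cache?"]):
    print(chunk.text, end="", flush=True)
```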
12 Unresolved conversations
Sometimes conversations happen on old items that aren't yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Add tools to api_server for InternLM2 model (#1763, commented on Jul 5, 2024; 19 new comments; see the sketch after this list)
- Support guided decoding for pytorch backend (#1856, commented on Jul 5, 2024; 7 new comments)
- [Bug] Is acceleration of qwen0.5b unsupported? And AWQ quantization of qwen0.5b? (#1870, commented on Jul 1, 2024; 5 new comments)
- [Bug] MiniCPM-llama3-V2_5 returns no response after startup when given an image URL or base64 (#1819, commented on Jul 6, 2024; 3 new comments)
- [Feature] impressive KV cache work: Mooncake (#1884, commented on Jul 1, 2024; 2 new comments)
- [Bug] lmdeploy chat model_name reports Aborted (core dumped) during conversation (#1706, commented on Jul 4, 2024; 2 new comments)
- [Bug] key_stats.pth not found when using 4-bit KV quantization with internlm2-chat-1_8b (#1720, commented on Jul 4, 2024; 2 new comments)
- [Feature] Are there plans to support the GLM4V model? (#1713, commented on Jul 1, 2024; 1 new comment)
- [Bug] InternLM2MLP.forward() missing 1 required positional argument: 'im_mask' (#1847, commented on Jul 1, 2024; 1 new comment)
- [Feature] support Nemotron-4 340B (#1784, commented on Jul 3, 2024; 1 new comment)
- AWQ small batches optimization (#1707, commented on Jul 3, 2024; 1 new comment)
- feat: decouple input_ids and output_ids (#1855, commented on Jul 4, 2024; 0 new comments)
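#1763 wires tool calling into api_server's OpenAI-compatible endpoint. A minimal sketch of a client hitting that endpoint; the base URL, model name, and a running server are all assumptions.

```python
# Minimal sketch: query lmdeploy's OpenAI-compatible api_server, the
# endpoint that #1763 extends with tool support. Base URL and model
# name are assumptions; a server must already be listening.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:23333/v1", api_key="none")
resp = client.chat.completions.create(
    model="internlm2",
    messages=[{"role": "user", "content": "Summarize LMDeploy in one sentence."}],
)
print(resp.choices[0].message.content)
```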