Insights: InternLM/lmdeploy
Overview
1 Release published by 1 person
- v0.5.0: LMDeploy Release V0.5.0, published Jul 1, 2024
22 Pull requests merged by 9 people
- support gemma2 in pytorch engine (#1924, merged Jul 5, 2024; see the sketch after this list)
- fix: append _stats when size > 0 (#1809, merged Jul 5, 2024)
- misc: add transformers version check for TurboMind Tokenizer (#1917, merged Jul 5, 2024)
- Support internvl2 chat template (#1911, merged Jul 5, 2024)
- misc: add default api_server_url for api_client (#1922, merged Jul 5, 2024)
- vision model uses tp number of GPUs (#1854, merged Jul 5, 2024)
- Fix smem size for fused split-kv reduction (#1909, merged Jul 4, 2024)
- Remove deprecated chat CLI and VL examples (#1899, merged Jul 4, 2024)
- [Doc]: Change to sphinx-book-theme in readthedocs (#1880, merged Jul 4, 2024)
- Optimize sampling on pytorch engine (#1853, merged Jul 3, 2024)
- Support phi3-vision (#1845, merged Jul 2, 2024)
- Add usage info in stream response (#1876, merged Jul 2, 2024)
- docs: update FAQ for "turbomind .so not found" (#1877, merged Jul 2, 2024)
- fix SamplingDecodeTest and SamplingDecodeTest2 unittest failures (#1874, merged Jul 1, 2024)
- drop stop words (#1823, merged Jul 1, 2024)
- Fix internlm-xcomposer2-vl AWQ search scale (#1890, merged Jul 1, 2024)
- Fix erroneous link reference (#1881, merged Jul 1, 2024)
- misc: rm unnecessary files (#1875, merged Jul 1, 2024)
- bump version to v0.5.0 (#1852, merged Jul 1, 2024)
- docs: update cache-max-entry-count help message (#1892, merged Jul 1, 2024)
- [Doc]: Update docs for internlm2.5 (#1887, merged Jul 1, 2024)
- fix qwen2 cache_position for PyTorch Engine when transformers>4.41.2 (#1886, merged Jul 1, 2024)
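Several of these merges land in the PyTorch engine (#1924, #1853, #1886). As a rough illustration of the surface they touch, here is a minimal sketch of running a model on that backend through lmdeploy's pipeline API; the gemma2 model id and the prompt are assumptions, not taken from the PRs.

```python
# Minimal sketch: run a model on the PyTorch engine, the backend that
# #1924 extends to gemma2. Model id and prompt are illustrative only.
from lmdeploy import GenerationConfig, PytorchEngineConfig, pipeline

pipe = pipeline(
    "google/gemma-2-9b-it",                    # assumed model id
    backend_config=PytorchEngineConfig(tp=1),  # single GPU, no tensor parallelism
)
responses = pipe(
    ["Briefly introduce LMDeploy."],
    gen_config=GenerationConfig(max_new_tokens=128),
)
print(responses[0].text)
```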
10 Pull requests opened by 7 people
- feat: support llama2 and internlm2 on 910B (#1889, opened Jul 1, 2024)
- Fix index error when profiling token generation with `-ct 1` (#1898, opened Jul 2, 2024)
- refactor sampling layer setup (#1912, opened Jul 3, 2024)
- PyTorch Engine AWQ support (#1913, opened Jul 3, 2024; see the sketch after this list)
- [ci] add internlm2.5 models into testcase (#1928, opened Jul 5, 2024)
- Upgrade gradio (#1930, opened Jul 5, 2024)
- Remove deprecated arguments from API and clarify model_name and chat_template_name (#1931, opened Jul 5, 2024)
- support internlm-xcomposer2d5-7b (#1932, opened Jul 5, 2024)
- refactor: update awq linear and rm legacy (#1940, opened Jul 7, 2024)
- fix mixtral cache_position (#1941, opened Jul 7, 2024)
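For context on #1913, which proposes AWQ support inside the PyTorch engine: a minimal sketch of how a 4-bit AWQ checkpoint is already consumed through the TurboMind backend follows; the model id is an assumption.

```python
# Minimal sketch: load a 4-bit AWQ checkpoint on the TurboMind backend,
# the route that exists today while #1913 proposes a PyTorch-engine
# equivalent. The model id is an assumption.
from lmdeploy import TurbomindEngineConfig, pipeline

pipe = pipeline(
    "internlm/internlm2-chat-7b-4bit",  # assumed AWQ-quantized model id
    backend_config=TurbomindEngineConfig(model_format="awq"),
)
print(pipe(["Hello"])[0].text)
```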
12 Issues closed by 8 people
- [Bug] assistant always replies "" (#1937, closed Jul 6, 2024)
- [Feature] support Gemma 2 (#1878, closed Jul 5, 2024)
- [Bug] ValueError: Tokenizer class Qwen2Tokenizer does not exist or is not currently imported (#1903, closed Jul 5, 2024)
- [Feature] diff tool for troubleshooting (#1908, closed Jul 5, 2024)
- [Bug] internvl-chat-v-1-5 predict (#1918, closed Jul 4, 2024)
- Nightly Build for LMDeploy (#1828, closed Jul 3, 2024)
- [Bug] lmdeploy - ERROR - Truncate max_new_tokens to 221 (#1841, closed Jul 2, 2024)
- [Bug] Quantization and inference work with default parameters, but inference fails after quantizing with --search-scale True --batch-size 8 (#1883, closed Jul 1, 2024)
- [Bug] Mini-InternVL1.5-4B does not initialize successfully (#1721, closed Jul 1, 2024)
- [Feature] update the range of torch versions (#1857, closed Jul 1, 2024)
- [Bug] qwen2 issue when transformers>4.41.2 for PyTorch Engine (#1885, closed Jul 1, 2024; see the sketch after this list)
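#1885, closed by PR #1886, traces to a behavior change in transformers after 4.41.2. Below is a sketch of the style of version gate such fixes typically rely on; the predicate name is hypothetical and the body does not reproduce the actual patch.

```python
# Sketch: gate behavior on the installed transformers version, the kind of
# check a fix like #1886 applies for qwen2 cache_position. The function
# name is hypothetical; placeholder logic only.
import transformers
from packaging import version


def needs_explicit_cache_position() -> bool:
    """True when transformers is newer than 4.41.2 and expects an
    explicit cache_position tensor to be passed to the model."""
    return version.parse(transformers.__version__) > version.parse("4.41.2")
```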
25 Issues opened by 24 people
- [Bug] lmdeploy errors on OpenAI-style prompt requests (#1939, opened Jul 6, 2024)
- [Bug] same code works on A800 but gets stuck on A10 with MiniCPM-Llama3-V-2_5 (#1938, opened Jul 6, 2024)
- [Bug] unified_attention split-kv for prefill with more workspace core dumps (#1935, opened Jul 6, 2024)
- [Bug] when debugging with VS Code, model execution returns results directly and the intermediate step from prepared inputs to model outputs cannot be inspected (#1933, opened Jul 5, 2024)
- [Feature] can embedding models be supported, similar to xinference? (#1927, opened Jul 5, 2024)
- [Bug] TCP error (port already used) when deploying with PytorchEngine (#1925, opened Jul 5, 2024)
- [Bug] AWQ-quantized models cannot be deployed on multiple GPUs with lmdeploy (#1923, opened Jul 4, 2024)
- [Feature] Is there any plan to support InternLM-XComposer2.5 inference? (#1920, opened Jul 4, 2024)
- About support for internv2 (#1919, opened Jul 4, 2024)
- Can the glm-4v-9b model be supported? (#1916, opened Jul 4, 2024)
- [Bug] AWQ model fails loading adapter (#1915, opened Jul 3, 2024)
- Will torch 2.3.0 and triton 2.3.0 be supported? (#1914, opened Jul 3, 2024)
- [Bug] qwen2-0.5b-instruct (#1910, opened Jul 3, 2024)
- W4A16 quantization of minicpm-v brings little change in inference speed (#1906, opened Jul 3, 2024)
- [Bug] Segmentation fault occurs and the machine running openEuler automatically reboots (#1905, opened Jul 3, 2024)
- Qwen 2 72B Instruct tp 8 performance degradation (#1904, opened Jul 3, 2024)
- When will quantization of CogVLM2 be supported? (#1902, opened Jul 3, 2024)
- Abnormal latency in multi-turn conversation batching (#1901, opened Jul 3, 2024)
- [Feature] support InternVL-2.0 (#1900, opened Jul 2, 2024)
- [Bug] Using the turbomind engine, prompts over 10k tokens produce garbage output (#1896, opened Jul 2, 2024)
- [Bug] CUDA runtime error: an illegal memory access was encountered when 8-bit KV quant was enabled (#1895, opened Jul 1, 2024)
- [Bug] (#1894, opened Jul 1, 2024)
- The `n` parameter of the GenerationConfig class has no effect (#1893, opened Jul 1, 2024)
- Can single-sample inference be done without stream_infer? (#1891, opened Jul 1, 2024; see the sketch after this list)
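On #1891, whether single-sample inference can skip stream_infer: the pipeline API offers both a blocking call and a streaming iterator. A minimal sketch, with the model id and prompt assumed:

```python
# Minimal sketch contrasting the blocking pipeline call with stream_infer,
# the question raised in #1891. Model id and prompt are illustrative.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2_5-7b-chat")  # assumed model id

# Blocking: returns one Response per prompt after generation completes,
# so single-sample inference does not need stream_infer.
print(pipe(["What is a KV cache?"])[0].text)

# Streaming: yields incremental chunks as tokens are produced.
for chunk in pipe.stream_infer(["What is a KV cache?"]):
    print(chunk.text, end="", flush=True)
```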
12 Unresolved conversations
Sometimes conversations happen on old items that aren't yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Add tools to api_server for InternLM2 model (#1763, commented on Jul 5, 2024; 19 new comments; see the sketch after this list)
- Support guided decoding for pytorch backend (#1856, commented on Jul 5, 2024; 7 new comments)
- [Bug] Is acceleration of qwen0.5b unsupported? And AWQ quantization of qwen0.5b? (#1870, commented on Jul 1, 2024; 5 new comments)
- [Bug] MiniCPM-llama3-V2_5 returns no response after startup when given an image URL or base64 (#1819, commented on Jul 6, 2024; 3 new comments)
- [Feature] impressive KV cache work: Mooncake (#1884, commented on Jul 1, 2024; 2 new comments)
- [Bug] lmdeploy chat model_name reports Aborted (core dumped) during conversation (#1706, commented on Jul 4, 2024; 2 new comments)
- [Bug] key_stats.pth not found when using 4-bit KV quantization with internlm2-chat-1_8b (#1720, commented on Jul 4, 2024; 2 new comments)
- [Feature] Are there plans to support the GLM4V model? (#1713, commented on Jul 1, 2024; 1 new comment)
- [Bug] InternLM2MLP.forward() missing 1 required positional argument: 'im_mask' (#1847, commented on Jul 1, 2024; 1 new comment)
- [Feature] support Nemotron-4 340B (#1784, commented on Jul 3, 2024; 1 new comment)
- AWQ small batches optimization (#1707, commented on Jul 3, 2024; 1 new comment)
- feat: decouple input_ids and output_ids (#1855, commented on Jul 4, 2024; 0 new comments)
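#1763 wires tool calling into api_server's OpenAI-compatible endpoint. A minimal sketch of a client hitting that endpoint; the base URL, model name, and a running server are all assumptions.

```python
# Minimal sketch: query lmdeploy's OpenAI-compatible api_server, the
# endpoint that #1763 extends with tool support. Base URL and model
# name are assumptions; a server must already be listening.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:23333/v1", api_key="none")
resp = client.chat.completions.create(
    model="internlm2",
    messages=[{"role": "user", "content": "Summarize LMDeploy in one sentence."}],
)
print(resp.choices[0].message.content)
```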