Insights: InternLM/lmdeploy
Overview
1 Release published by 1 person
- v0.5.0: LMDeploy Release V0.5.0 (published Jul 1, 2024)
89 Pull requests merged by 12 people
- Fix internvl2-40b awq inference (#2023, merged Jul 15, 2024)
- Avoid the same session id for openai endpoint (#1995, merged Jul 15, 2024)
- add chat template for codegeex4 (#2013, merged Jul 15, 2024)
- support internlm-xcomposer2d5-7b (#1932, merged Jul 15, 2024)
- Add exception handler to image encoder (#2010, merged Jul 13, 2024)
- docs: fix Ada compatibility (#2016, merged Jul 13, 2024)
- Support glm 4v (#1947, merged Jul 12, 2024)
- feat: support llama2 and internlm2 on 910B (#2011, merged Jul 12, 2024)
- Fix logprobs openai api (#1985, merged Jul 12, 2024)
- Fix the session_len assignment logic (#2007, merged Jul 12, 2024)
- Fix table rendering for readthedocs (#1998, merged Jul 12, 2024)
- docs: sync the core features in README to index.rst (#1988, merged Jul 11, 2024)
- fix mixtral and mistral cache_position (#1941, merged Jul 11, 2024)
- support internvl2-1b (#1983, merged Jul 11, 2024)
- fix unexpected argument error when deploying "cogvlm-chat-hf" (#1982, merged Jul 10, 2024)
- fix logprobs (#1968, merged Jul 10, 2024)
- refactor sampling layer setup (#1912, merged Jul 10, 2024)
- Fix internvl2-40b model export (#1979, merged Jul 10, 2024)
- docs: update kv quant doc (#1977, merged Jul 10, 2024)
- feat: support llama2 and internlm2 on 910B (#1889, merged Jul 9, 2024)
- fix: set PYTHONIOENCODING to UTF-8 before starting tritonserver (#1971, merged Jul 9, 2024)
- [ci] add internlm2.5 models into testcase (#1928, merged Jul 9, 2024)
- Upgrade gradio (#1930, merged Jul 9, 2024)
- Add tools to api_server for InternLM2 model (#1763, merged Jul 9, 2024)
- fix transformers version check for InternVL2 (#1952, merged Jul 9, 2024)
- fix llama3 chat template (#1956, merged Jul 9, 2024)
- feat: add gpu topo for check_env (#1944, merged Jul 8, 2024)
- refactor: update awq linear and rm legacy (#1940, merged Jul 8, 2024)
- docs: update compatibility section in README (#1946, merged Jul 8, 2024)
- support gemma2 in pytorch engine (#1924, merged Jul 5, 2024)
- fix: append _stats when size > 0 (#1809, merged Jul 5, 2024)
- misc: add transformers version check for TurboMind Tokenizer (#1917, merged Jul 5, 2024)
- Support internvl2 chat template (#1911, merged Jul 5, 2024)
- misc: add default api_server_url for api_client (#1922, merged Jul 5, 2024)
- vision model use tp number of gpu (#1854, merged Jul 5, 2024)
- Fix smem size for fused split-kv reduction (#1909, merged Jul 4, 2024)
- Remove deprecated chat cli and vl examples (#1899, merged Jul 4, 2024)
- [Doc]: Change to sphinx-book-theme in readthedocs (#1880, merged Jul 4, 2024)
- Optimize sampling on pytorch engine (#1853, merged Jul 3, 2024)
- Support phi3-vision (#1845, merged Jul 2, 2024)
- Add usage in stream response (#1876, merged Jul 2, 2024)
- docs: update faq for turbomind so not found (#1877, merged Jul 2, 2024)
- fix SamplingDecodeTest and SamplingDecodeTest2 unittest failure (#1874, merged Jul 1, 2024)
- drop stop words (#1823, merged Jul 1, 2024)
- Fix internlm-xcomposer2-vl awq search scale (#1890, merged Jul 1, 2024)
- Fix error link reference (#1881, merged Jul 1, 2024)
- misc: rm unnecessary files (#1875, merged Jul 1, 2024)
- bump version to v0.5.0 (#1852, merged Jul 1, 2024)
- docs: update cache-max-entry-count help message (#1892, merged Jul 1, 2024)
- [Doc]: Update docs for internlm2.5 (#1887, merged Jul 1, 2024)
- fix qwen2 cache_position for PyTorch Engine when transformers>4.41.2 (#1886, merged Jul 1, 2024)
- fix gradio vl "stop_words" (#1873, merged Jun 27, 2024)
- fix model name matching for internvl (#1867, merged Jun 27, 2024)
- Fix vl session-len (#1860, merged Jun 26, 2024)
- [side-effect] bring back "--cap" argument in chat cli (#1859, merged Jun 26, 2024)
- react test evaluation config (#1861, merged Jun 26, 2024)
- misc: align PyTorch Engine temperature with TurboMind (#1850, merged Jun 26, 2024)
- remove chat template config in turbomind engine (#1161, merged Jun 25, 2024)
- Add interfaces to the pipeline to obtain logits and ppl (see the sketch after this list) (#1652, merged Jun 25, 2024)
- Support Qwen2-1.5b awq (#1793, merged Jun 24, 2024)
- Harden stream callback (#1838, merged Jun 24, 2024)
- fix image encoder request queue (#1837, merged Jun 24, 2024)
- Support internvl-chat for pytorch engine (#1797, merged Jun 24, 2024)
- Add model revision & download_dir to cli (#1814, merged Jun 24, 2024)
- compat internlm2 for pytorch engine (#1825, merged Jun 24, 2024)
- Torch deepseek v2 (#1621, merged Jun 24, 2024)
- Update engine.py to fix small typos (#1829, merged Jun 24, 2024)
- Detokenize with prompt token ids (#1753, merged Jun 22, 2024)
- fix qwen-vl-chat hang (#1824, merged Jun 21, 2024)
- AsyncEngine: create cancel task on exception (#1807, merged Jun 21, 2024)
- Fix Request completed log (#1821, merged Jun 21, 2024)
- Add GLM-4-9B-Chat (#1724, merged Jun 21, 2024)
- PyTorchEngine adapts to the latest internlm2 modeling (#1798, merged Jun 21, 2024)
- Device dispatcher (#1775, merged Jun 21, 2024)
- fix best_match_model (#1812, merged Jun 20, 2024)
- check driver mismatch (#1811, merged Jun 20, 2024)
- fix pr test for newest internlm2 model (#1806, merged Jun 20, 2024)
- feat: auto set awq model_format from hf (#1799, merged Jun 19, 2024)
- Optimize kernel launch for triton2.2.0 and triton2.3.0 (#1499, merged Jun 19, 2024)
- [Feature]: Support llava for pytorch engine (#1641, merged Jun 19, 2024)
- More accurate time logging for ImageEncoder and fix concurrent image processing corruption (#1765, merged Jun 18, 2024)
- [side-effect] fix weight_type caused by PR #1702 (#1795, merged Jun 18, 2024)
- fix: prevent numpy breakage (#1791, merged Jun 18, 2024)
- Refine AsyncEngine exception handler (#1789, merged Jun 18, 2024)
- skip inference for oversized inputs (#1769, merged Jun 18, 2024)
- Encode raw image file to base64 (#1773, merged Jun 17, 2024)
- fix falcon attention (#1761, merged Jun 17, 2024)
- support qwen2 1.5b (#1782, merged Jun 17, 2024)
- Add anomaly handler (#1780, merged Jun 17, 2024)
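PR #1652 above adds pipeline interfaces for obtaining logits and perplexity. A minimal sketch of how they might be exercised, assuming the pipeline object exposes a `tokenizer` attribute and a `get_ppl` method as the PR title suggests; the model path and prompt are placeholders:

```python
# Minimal sketch of the logits/ppl pipeline interfaces added in #1652.
# Assumptions: `pipeline` returns an engine exposing `tokenizer` and
# `get_ppl`; the model path and prompt below are placeholders.
from lmdeploy import pipeline

pipe = pipeline('internlm/internlm2-chat-7b')

# Encode a prompt and ask the engine for its perplexity.
input_ids = pipe.tokenizer.encode('Shanghai is a city that')
print(pipe.get_ppl(input_ids))
```

Note that closed issue #1950 below reports AttributeError: 'AsyncEngine' object has no attribute 'get_ppl' on older builds, so this applies only to v0.5.0 and later.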
14 Pull requests opened by 7 people
- Add Jetson platform support (by docker) (#1820, opened Jun 21, 2024)
- Maybe a workaround for qwen2 quantization NaN error (#1844, opened Jun 25, 2024)
- feat: decouple input_ids and output_ids (#1855, opened Jun 25, 2024)
- Support guided decoding for pytorch backend (#1856, opened Jun 26, 2024)
- Fix index error when profiling token generation with `-ct 1` (#1898, opened Jul 2, 2024)
- PyTorch Engine AWQ support (#1913, opened Jul 3, 2024)
- Remove deprecated arguments from API and clarify model_name and chat_template_name (#1931, opened Jul 5, 2024)
- torch engine optimize prefill for long context (#1962, opened Jul 9, 2024)
- support min_p sampling & do_sample setting (#1966, opened Jul 9, 2024)
- Phi3 awq (#1984, opened Jul 10, 2024)
- Remove the triton inference server backend "turbomind_backend" (#1986, opened Jul 10, 2024)
- Support glm4 awq (#1993, opened Jul 11, 2024)
- Add log info for prefix cache (#2018, opened Jul 13, 2024)
- bump version to v0.5.1 (#2022, opened Jul 15, 2024)
91 Issues closed by 39 people
- [Bug] Which GPU types does lmdeploy support, and which are explicitly unsupported? (#2015, closed Jul 15, 2024)
- Q: Continuous Batching without Turbomind? (#2025, closed Jul 15, 2024)
- [Feature] Can you please do INT4 Quantization for InternVL2-26B and InternVL2-40B (#1955, closed Jul 15, 2024)
- [Bug] InternVL2-40B generates nonsense outputs (#1965, closed Jul 15, 2024)
- [Bug] AWQ-quantized InternVL2-40B outputs meaningless results (#2017, closed Jul 15, 2024)
- About internvl2 support (#1919, closed Jul 15, 2024)
- [Bug] KeyError: 'plora_glb_GN' after quantization of internlm/internlm-xcomposer2-4khd-7b to 4-bit (#2014, closed Jul 15, 2024)
- [Bug] InternVL2-40B is unreachable after quantized deployment (#2009, closed Jul 15, 2024)
- Unable to infer on multiple CPUs (#2008, closed Jul 15, 2024)
- AWQ quantized model produces garbled output during multi-GPU inference (#1996, closed Jul 15, 2024)
- Quantization of internlm/internlm-xcomposer2-4khd-7b to 4-bit? (#2012, closed Jul 12, 2024)
- [Bug] Deploying a fine-tuned internvl-chat-v1_5 model leads to unstoppable output (#2000, closed Jul 12, 2024)
- Qwen 2 72b Instruct tp 8 performance degradation (#1904, closed Jul 12, 2024)
- About the claim that LMDeploy delivers up to 1.8x higher request throughput than vLLM (#2005, closed Jul 12, 2024)
- [Bug] Turbomind Docker fails after high load (#1954, closed Jul 11, 2024)
- [Bug] Segmentation fault occurs and the machine running openEuler automatically reboots (#1905, closed Jul 11, 2024)
- The Internvl2 API does not return results, while inference via transformers works (#1959, closed Jul 11, 2024)
- [Bug] internvl-v1-5 served with lmdeploy serve always generates up to the maximum length (#1958, closed Jul 11, 2024)
- [Bug] The official documentation does not automatically update (#1975, closed Jul 11, 2024)
- [Feature] Add function calling (tools) capability to InternVL2 (#1987, closed Jul 11, 2024)
- [Feature] support function calling (#1800, closed Jul 10, 2024)
- ScaleLLM inspiration (#1510, closed Jul 10, 2024)
- [Bug] KeyError: 'Qwen2ForCausalLM' for InternVL2 1B (#1963, closed Jul 10, 2024)
- [Bug] TCP error (port already in use) when deploying with PytorchEngine (#1925, closed Jul 10, 2024)
- [Bug] internlm2-chat-1_8b with 4-bit KV quantization cannot find key_stats.pth (#1720, closed Jul 10, 2024)
- [Feature] support InternVL-2.0 (#1900, closed Jul 9, 2024)
- Will torch 2.3.0 and triton 2.3.0 be supported? (#1914, closed Jul 9, 2024)
- [Bug] Llama3 chat template is not consistent with the Huggingface implementation (#1945, closed Jul 9, 2024)
- AttributeError: 'AsyncEngine' object has no attribute 'get_ppl' (#1950, closed Jul 9, 2024)
- Why doesn't offline conversion (lmdeploy convert) support internlm2.5 and Qwen2? (#1960, closed Jul 9, 2024)
- [Bug] lmdeploy 0.5.0 outputs no logprobs during batch inference (#1973, closed Jul 9, 2024)
- [Feature] Can MiniCPMv2.5 be supported? (#1969, closed Jul 9, 2024)
- [Bug] lmdeploy errors on OpenAI-style prompt requests (#1939, closed Jul 9, 2024)
- [Feature] Is there any plan to support internvl2 inference? (#1953, closed Jul 9, 2024)
- [Bug] assistant always replies "" (#1937, closed Jul 6, 2024)
- [Bug] assistant always replies "" (#1934, closed Jul 6, 2024)
- [Bug] assistant always replies "" (#1936, closed Jul 6, 2024)
- [Feature] support Gemma 2 (#1878, closed Jul 5, 2024)
- [Bug] ValueError: Tokenizer class Qwen2Tokenizer does not exist or is not currently imported (#1903, closed Jul 5, 2024)
- [Feature] diff tool for troubleshooting (#1908, closed Jul 5, 2024)
- [Bug] internvl-chat-v-1-5 predict (#1918, closed Jul 4, 2024)
- Nightly Build for LMDeploy (#1828, closed Jul 3, 2024)
- [Bug] lmdeploy - ERROR - Truncate max_new_tokens to 221 (#1841, closed Jul 2, 2024)
- [Bug] Quantization works with default parameters, but after setting --search-scale True --batch-size 8 the quantized model cannot run inference (#1883, closed Jul 1, 2024)
- [Bug] Mini-InternVL1.5-4B does not successfully initialize (#1721, closed Jul 1, 2024)
- [Feature] update the range of torch versions (#1857, closed Jul 1, 2024)
- [Bug] qwen 2 issue when transformers>4.41.2 for PyTorch Engine (#1885, closed Jul 1, 2024)
- need gemma2 support (#1888, closed Jul 1, 2024)
- [Bug] xcomposer 4khd lora weight error in lmdeploy (#1747, closed Jun 30, 2024)
- [Feature] Function call (#1882, closed Jun 28, 2024)
- [Bug] InternVL 1.5's bottleneck is the ViT; any plan to support the ViT on the TurboMind backend with TP inference? (#1869, closed Jun 28, 2024)
- [Bug] hang when many requests (#1619, closed Jun 27, 2024)
- How to quantize deepseek-ai/deepseek-vl-7b-chat (#1865, closed Jun 27, 2024)
- [Feature] Do multimodal models support online serving? (#1762, closed Jun 27, 2024)
- [Bug] In stream mode, breaking out of the generator early may leave the server stuck (#1848, closed Jun 26, 2024)
- [Bug] Multi-GPU PyTorch deployment of internlm-xcomposer2-vl-7b fails with KeyError: 'parameter name can\'t contain "."' (#1834, closed Jun 26, 2024)
- Adapting LLaVA models built on a different LLM base (#1655, closed Jun 25, 2024)
- [Bug] Poor results on multi-image inference (#1843, closed Jun 25, 2024)
- [Feature] Please add support for Qwen2 (#1805, closed Jun 25, 2024)
- [Feature] Run lmdeploy inference on already-constructed inputs (#1760, closed Jun 25, 2024)
- About getting deterministic answers from a VLM such as InternVL-Chat-V1-5-AWQ (#1783, closed Jun 24, 2024)
- [Bug] A smooth_quant-quantized model cannot run inference when reloaded (#1822, closed Jun 24, 2024)
- [Feature] Support DeepSeek-V2 Model (#1556, closed Jun 24, 2024)
- [Bug] Space is incorrectly removed from start of generated text for `/v1/completion` endpoint (#1743, closed Jun 23, 2024)
- [Bug] Task was destroyed but it is pending! ImageEncoder._forward_loop() (#1818, closed Jun 22, 2024)
- int8 kv cache and Flash Attention cannot be used together (#1816, closed Jun 20, 2024)
- [Feature] lmdeploy chat <model_path> --chat-template {json} (see the sketch after this list) (#1519, closed Jun 20, 2024)
- [Feature] Implement COG-VLM2 (#1622, closed Jun 20, 2024)
- [Bug] When serving cogvlm2, concurrent requests interfere: later requests pick up images from earlier requests (#1730, closed Jun 20, 2024)
- [Bug] Key Error loading OpenGVLab/Mini-InternVL-Chat-4B-V1-5 (#1756, closed Jun 20, 2024)
- "Aborted (core dumped)" when running Qwen2-7B-Instruct [Bug] (#1792, closed Jun 20, 2024)
- [Feature] qwen2 model series (#1777, closed Jun 20, 2024)
- [Bug] Review of a conditional check (#1757, closed Jun 20, 2024)
- [Feature] Layer Wise Calibration and Quantization of Models (To quantize model on Low VRAM GPU) (#1625, closed Jun 20, 2024)
- [Bug] Questions about streaming concurrency (#1557, closed Jun 20, 2024)
- logger in `lmdeploy/serve/async_engine.py` is hard coded (#1503, closed Jun 20, 2024)
- [Feature] Quantization example for multimodal models (#1483, closed Jun 20, 2024)
- batch inference (#1689, closed Jun 20, 2024)
- [Bug] ModuleNotFoundError: No module named '_turbomind' loading llava Mistral 7B (#1699, closed Jun 20, 2024)
- [Bug] lmdeploy got nccl error (#1803, closed Jun 19, 2024)
- [Feature] lmdeploy can launch a gradio app from the command line; can that app expose UI customization to users? (#1710, closed Jun 17, 2024)
- Can deployment of the mini_internvl_2b_1.5 model be supported? (#1774, closed Jun 16, 2024)
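One recurring request above is a JSON-driven chat template for the chat CLI (#1519). A hedged sketch of writing such a template and passing it via --chat-template; the field names below are assumptions drawn from lmdeploy's customized-chat-template documentation as best remembered, and should be verified against the docs before use:

```python
# Hedged sketch for #1519: write a JSON chat template, then pass it to the
# chat CLI via --chat-template. The field names are assumptions based on
# lmdeploy's customized-chat-template docs; verify them before relying on
# this. The tokens shown follow the ChatML-style templates lmdeploy uses
# for internlm2 and are placeholders here.
import json

template = {
    'model_name': 'my_template',        # name under which the template registers
    'system': '<|im_start|>system\n',   # prefix before the system prompt
    'meta_instruction': 'You are a helpful assistant.',
    'eosys': '<|im_end|>\n',
    'user': '<|im_start|>user\n',
    'eoh': '<|im_end|>\n',
    'assistant': '<|im_start|>assistant\n',
    'eoa': '<|im_end|>',
    'stop_words': ['<|im_end|>'],
}
with open('chat_template.json', 'w') as f:
    json.dump(template, f, indent=2)

# Then, with a placeholder model path:
#   lmdeploy chat internlm/internlm2-chat-7b --chat-template chat_template.json
```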
82 Issues opened by 64 people
- [Bug] After quantizing a fine-tuned qwen-vl-chat model to INT4 with the provided method, GPU memory usage does not decrease (#2028, opened Jul 15, 2024)
- How does multimodal batch inference work? (#2027, opened Jul 15, 2024)
- Deploying a fine-tuned glm-4v-9b with lmdeploy raises an error (#2026, opened Jul 15, 2024)
- What is the expected chat template for phi-3-vl? (#2024, opened Jul 15, 2024)
- [Docs] Backed by A100 compute, season 3 of the InternLM large-model hands-on camp is fully upgraded; a fun level-based challenge mode awaits (#2021, opened Jul 15, 2024)
- [Bug] Two questions about LoRA in lmdeploy (#2020, opened Jul 14, 2024)
- [Benchmark] PyTorch Engine Mixtral 8x7B performance issue (#2019, opened Jul 13, 2024)
- The model returns empty responses after deployment with lmdeploy (#2006, opened Jul 12, 2024)
- [Bug] InternVL2-26B deployed on two V100s gives no response in multimodal conversations (#2004, opened Jul 12, 2024)
- [Feature] Flash Attention 3 (#2003, opened Jul 11, 2024)
- [Bug] The service freezes and stops responding after serving for a while (#2001, opened Jul 11, 2024)
- [Feature] Can inference for the multimodal model donut be supported? (#1999, opened Jul 11, 2024)
- How does lmdeploy serve handle concurrency? (#1997, opened Jul 11, 2024)
- [Bug] InternVL-Chat-V1-5-AWQ deployed with lmdeploy works with the OpenAI client on 0.4.2 but hangs without any response on 0.5.0; the newly added cogvlm2 behaves similarly (#1992, opened Jul 11, 2024)
- internlm2_5-7B-chat deployed with lmdeploy returns empty responses (#1991, opened Jul 11, 2024)
- Could not use my local internVL mini model for inference (#1990, opened Jul 10, 2024)
- [Feature] Do we support inference of GPTQ-quantized models? (#1989, opened Jul 10, 2024)
- [Bug] MiniCPMV inference is broken (#1981, opened Jul 10, 2024)
- [Feature] Any plan to support MInference? (#1980, opened Jul 10, 2024)
- How can glm4-9b be quantized and run? (#1976, opened Jul 9, 2024)
- [Benchmark] TurboMind benchmark with GLM-4-9B-Chat and Qwen2-72B-Instruct vs vLLM (#1974, opened Jul 9, 2024)
- [Feature] Does turbomind plan to support cogvlm2? (#1970, opened Jul 9, 2024)
- [Feature] Support for CogVLM2-Video-LLama3-Chat in TorchEngine (#1964, opened Jul 9, 2024)
- [Bug] How to pass multi-image, image-text data to internvl through the OpenAI-style interface (see the sketch after this list) (#1961, opened Jul 9, 2024)
- [Bug] The response should contain 's, but only ' appears and the s is never output (#1951, opened Jul 8, 2024)
- How is multimodal batch inference implemented? (#1949, opened Jul 8, 2024)
- [Bug] Why is the value of logprobs None? (#1948, opened Jul 8, 2024)
- [Feature] Prefix cache hit/miss/eviction statistics to detect cache thrashing (#1942, opened Jul 7, 2024)
- [Bug] The same code works on an A800 but gets stuck on an A10 with MiniCPM-Llama3-V-2_5 (#1938, opened Jul 6, 2024)
- [Bug] unified_attention split kv for prefill with more workspace coredump (#1935, opened Jul 6, 2024)
- Obtaining logits (#1933, opened Jul 5, 2024)
- [Feature] Can embedding models be supported, similar to xinference? (#1927, opened Jul 5, 2024)
- [Bug] AWQ-quantized models cannot be deployed on multiple GPUs with lmdeploy (#1923, opened Jul 4, 2024)
- [Feature] Is there any plan to support InternLM-XComposer2.5 inference? (#1920, opened Jul 4, 2024)
- Can the glm-4v-9b model be supported? (#1916, opened Jul 4, 2024)
- [Bug] AWQ Model Fails Loading Adapter (#1915, opened Jul 3, 2024)
- [Bug] qwen2-0.5b-instruct (#1910, opened Jul 3, 2024)
- W4A16 quantization of minicpm-v brings no noticeable change in inference speed (#1906, opened Jul 3, 2024)
- When will quantization of CogVLM2 be supported? (#1902, opened Jul 3, 2024)
- Batched multi-turn conversations take abnormally long (#1901, opened Jul 3, 2024)
- [Bug] Using the turbomind engine, prompting more than 10k tokens will result in garbage output (#1896, opened Jul 2, 2024)
- [Bug] CUDA runtime error: an illegal memory access was encountered when 8bit kv quant was enabled (#1895, opened Jul 1, 2024)
- [Bug] (#1894, opened Jul 1, 2024)
- The parameter n in GenerationConfig has no effect (#1893, opened Jul 1, 2024)
- Can single-sample inference be done without stream_infer? (#1891, opened Jul 1, 2024)
- [Feature] Great work on KV cache: Mooncake (#1884, opened Jun 28, 2024)
- [Feature] long context inference optimization (#1879, opened Jun 27, 2024)
- [Docs] Speed comparison between the TurboMind and PyTorch inference engines (#1872, opened Jun 27, 2024)
- [Bug] Is acceleration not supported for qwen0.5b? And AWQ quantization for qwen0.5b? (#1870, opened Jun 27, 2024)
- [Bug] AttributeError: 'LlavaNextConfig' object has no attribute 'hidden_size' (#1868, opened Jun 27, 2024)
- [Bug] The internvl model answers questions about image content incorrectly (#1866, opened Jun 27, 2024)
- Loading Qwen1.5-32B-Chat via pipeline with tp=4 and prompting in OpenAI format to clean Chinese text, yet all replies are generated in English (#1864, opened Jun 26, 2024)
- How to extract the reply text from a response in OpenAI format? The returned response appears to be segmented (#1863, opened Jun 26, 2024)
- [Bug] How is single-turn interleaved image-text conversation implemented? (#1862, opened Jun 26, 2024)
- [Bug] Segmentation fault: address not mapped to object at address 0x2058 (#1849, opened Jun 25, 2024)
- [Bug] InternLM2MLP.forward() missing 1 required positional argument: 'im_mask' (#1847, opened Jun 25, 2024)
- How to set the model data type to f16 (#1846, opened Jun 25, 2024)
- [Docs] How should the api_server for multimodal models be deployed across multiple GPUs? (#1840, opened Jun 24, 2024)
- [Feature] How to support bf16 when inferencing Internvl-chat (#1839, opened Jun 24, 2024)
- [Bug] AWQ quantization of a fine-tuned qwen2 model raises an error (#1836, opened Jun 24, 2024)
- Error when using TurboMind inference integrated via Python code (#1835, opened Jun 24, 2024)
- [Bug] smoothquant fails to quantize the Baichuan2-7B-Chat model (#1831, opened Jun 23, 2024)
- Qwen-7B-Chat quantization fails with AttributeError: 'RMSNorm' object has no attribute 'variance_epsilon' (#1830, opened Jun 23, 2024)
- Model name id returned is weird, especially when using Docker [Bug] (#1827, opened Jun 21, 2024)
- [Bug] awq for Qwen2-72B-instruct (#1826, opened Jun 21, 2024)
- [Bug] After launching MiniCPM-llama3-V2_5, there is no reply when an image is sent by URL or base64 (#1819, opened Jun 21, 2024)
- [Feature] Option to also use host memory for the KV cache (#1817, opened Jun 21, 2024)
- [Bug] internlm2-chat-20b deployed with lmdeploy does not stop at <|im_end|> (#1815, opened Jun 20, 2024)
- [Bug] vl pipeline triggers cudaMemcpyAsync ERROR illegal memory access (#1813, opened Jun 20, 2024)
- [Bug] Converting qwen2-7b to AWQ fails after SFT on domain data (#1810, opened Jun 20, 2024)
- Is glm-4-9b supported? (#1808, opened Jun 19, 2024)
- [Bug] No way to specify a model revision? (#1804, opened Jun 19, 2024)
- [Bug] n_token = outputs.num_token raises AttributeError: 'tuple' object has no attribute 'num_token' (#1802, opened Jun 19, 2024)
- [Feature] Prefill/Decoding disaggregation substantially boosts throughput (#1801, opened Jun 19, 2024)
- [Bug] OOM when quantizing Llama-3-70B-Instruct (#1796, opened Jun 18, 2024)
- [Bug] KeyError: 'Phi3ForCausalLM' (#1794, opened Jun 18, 2024)
- [Feature] Inference speed benchmark for the multimodal api_server (#1790, opened Jun 17, 2024)
- Is the OpenAI parameter n supported? Setting n>1 still returns only one result (#1787, opened Jun 16, 2024)
- [Bug] Qwen/Qwen2-72B-Instruct AWQ Quantization NaN Error (#1786, opened Jun 16, 2024)
- [Docs] Is the throughput improvement mainly due to the rewritten GQA kernel? (#1785, opened Jun 16, 2024)
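Several opened issues ask how to send images through the OpenAI-style interface (#1961, #1819). A minimal sketch using the standard OpenAI vision message format against an lmdeploy api_server, assuming a server running locally on the default port 23333; the model name and image path are placeholders:

```python
# Minimal sketch for the "images over the OpenAI-style interface" questions
# (#1961, #1819). Assumes an lmdeploy api_server serving a VLM on the
# default port 23333; the model name and image path are placeholders.
import base64

from openai import OpenAI

# Encode a local image as a base64 data URL, as the OpenAI vision format expects.
with open('demo.jpg', 'rb') as f:
    image_b64 = base64.b64encode(f.read()).decode('utf-8')

client = OpenAI(base_url='http://localhost:23333/v1', api_key='none')
response = client.chat.completions.create(
    model='internvl-internlm2',  # placeholder; query /v1/models for the served id
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Describe this image.'},
            {'type': 'image_url',
             'image_url': {'url': f'data:image/jpeg;base64,{image_b64}'}},
        ],
    }],
)
print(response.choices[0].message.content)
```

For the multi-image case raised in #1961, additional image_url entries go in the same content list, per the OpenAI message format.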
22 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [Bug] Why does the pipeline output only a single token? (#1766, commented on Jun 18, 2024 • 0 new comments)
- Can the GPTQ and AWQ inference kernels be used interchangeably? (#1623, commented on Jun 18, 2024 • 0 new comments)
- Error When loading 'openbmb/MiniCPM-Llama3-V-2_5' (#1771, commented on Jun 19, 2024 • 0 new comments)
- The multimodal base64 interface produces differing results (#1779, commented on Jun 20, 2024 • 0 new comments)
- Same prompt and sampling parameters, but outputs differ (#975, commented on Jun 21, 2024 • 0 new comments)
- [Feature] Support for the microsoft/Phi-3-vision-128k-instruct Vision Model (#1637, commented on Jun 25, 2024 • 0 new comments)
- [Feature] Grammar/structured output support (#1614, commented on Jun 25, 2024 • 0 new comments)
- [Feature] Support W4A8KV4 Quantization (QServe/QoQ) (#1587, commented on Jun 27, 2024 • 0 new comments)
- [Docs] How are multiple images handled? (#1686, commented on Jun 28, 2024 • 0 new comments)
- [Feature] Quantized inference on V100 (#1711, commented on Jun 28, 2024 • 0 new comments)
- [Feature] Any plan to support the GLM4V model? (#1713, commented on Jul 1, 2024 • 0 new comments)
- [Feature] support Nemotron-4 340B (#1784, commented on Jul 3, 2024 • 0 new comments)
- AWQ small batches optimization (#1707, commented on Jul 3, 2024 • 0 new comments)
- [Bug] lmdeploy chat model_name aborts with "Aborted (core dumped)" during conversation (#1706, commented on Jul 4, 2024 • 0 new comments)
- [Bug] tp=4 tp=8 no response (#1755, commented on Jul 8, 2024 • 0 new comments)
- [Feature] Does turbomind plan to support sliding window? (#1327, commented on Jul 8, 2024 • 0 new comments)
- [Bug] output differs when temperature is set to zero (#1688, commented on Jul 10, 2024 • 0 new comments)
- [Feature] Speculative Decoding (#1738, commented on Jul 11, 2024 • 0 new comments)
- [Docs] Guidance on setting `num_tokens_per_iter` and `max_prefill_iters` to optimal values (see the sketch below) (#1740, commented on Jul 12, 2024 • 0 new comments)
- [Benchmark] benchmarks on different CUDA architectures with models of various sizes (#815, commented on Jul 13, 2024 • 0 new comments)
- [Bug] KV Cache INT8 calibration warning: Token indices sequence length is longer than the specified maximum sequence length for this model (2874305 > 4096) (#1033, commented on Jul 15, 2024 • 0 new comments)
- support vl benchmark (#1662, commented on Jun 19, 2024 • 0 new comments)
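The #1740 conversation asks how to tune `num_tokens_per_iter` and `max_prefill_iters`. A hedged sketch of where those knobs live, assuming both are TurbomindEngineConfig fields as the docs discussion implies; the values are illustrative, not recommendations:

```python
# Hedged sketch for #1740: passing num_tokens_per_iter / max_prefill_iters
# through the engine config. Assumes both are TurbomindEngineConfig fields;
# the values and model path are illustrative placeholders.
from lmdeploy import pipeline, TurbomindEngineConfig

engine_config = TurbomindEngineConfig(
    num_tokens_per_iter=256,  # tokens processed per forward iteration
    max_prefill_iters=4,      # cap on iterations spent prefilling one request
)
pipe = pipeline('internlm/internlm2_5-7b-chat', backend_config=engine_config)
print(pipe('Hello'))
```

Smaller num_tokens_per_iter generally trades prefill throughput for smoother decoding latency under load; #1740 asks for concrete guidance on picking these values.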