Checklist
1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
Describe the bug
When running the repo example I chose the YurtsAI/Meta-Llama-3-8B-Instruct-AWQ model and the traderpedroso/llama3-8b-lora adapter.
I know the adapter was trained on the 4-bit base model; I'm not sure if this works with AWQ.
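A minimal reproduction sketch, assuming the standard lmdeploy S-LoRA pipeline API (the original script was not posted, so the adapter name `lora1` and the prompt are placeholders):

```python
# Repro sketch (assumption: standard lmdeploy S-LoRA usage; the actual
# script from the report was not included in the issue).
from lmdeploy import pipeline, PytorchEngineConfig

backend_config = PytorchEngineConfig(
    adapters=dict(lora1='traderpedroso/llama3-8b-lora'))  # adapter from the report
pipe = pipeline('YurtsAI/Meta-Llama-3-8B-Instruct-AWQ',   # AWQ base from the report
                backend_config=backend_config)
# pipeline() constructs the engine, which is where the AssertionError
# in the traceback below is raised.
print(pipe(['Hello'], adapter_name='lora1'))
```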
self.engine = Engine(model_path=model_path,
File "/home/merlin/code/kreacher/venv/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 153, in __init__
_paging_adapters(adapters,
File "/home/merlin/code/kreacher/venv/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 68, in _paging_adapters
model_agent.paging_adapters(weight_maps)
File "/home/merlin/code/kreacher/venv/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 715, in paging_adapters
weight_map.cache_adapter(lora_linears, cpu_caches)
File "/home/merlin/code/kreacher/venv/lib/python3.10/site-packages/lmdeploy/pytorch/adapter/adapter.py", line 226, in cache_adapter
assert len(lora_linears) == len(caches), (
AssertionError: len(lora_linears) == len(caches)
If I comment out the len(lora_linears) == len(caches) assertion, the adapter merges... but I'm not sure whether it's supposed to work like that or not.
Let me know if I can help out. I'd be happy to test; I'm also capable of coding, but this area is not my expertise :D. So would this mean any LoRA adapter should be able to mount on top of an AWQ-quantized model? Or do I need to fine-tune on an AWQ model? It seems like the LoRA adapter would just be mounted on top.
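For anyone hitting a similar mismatch, a quick way to see which modules the adapter targets (a hedged diagnostic, assuming the adapter is a standard PEFT LoRA checkpoint; not part of the original report):

```python
# Diagnostic sketch: inspect the adapter's target modules (assumes a
# standard PEFT LoRA checkpoint on the Hugging Face Hub).
from peft import PeftConfig

cfg = PeftConfig.from_pretrained('traderpedroso/llama3-8b-lora')
print(cfg.target_modules)  # e.g. {'q_proj', 'k_proj', 'v_proj', 'o_proj'}
# If the quantized model exposes a different set of LoRA-capable linears
# than the adapter targets, len(lora_linears) and len(caches) could diverge.
```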
PyTorchEngine uses AwqLoraLinear; adapters can be applied on an AWQ model without fine-tuning. The base linear is forwarded with w4a16 support, while the adapters run in fp16.
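A conceptual sketch of that combined forward pass (placeholder logic, not lmdeploy's actual AwqLoraLinear implementation):

```python
import torch

def awq_lora_forward(x, awq_base_linear, lora_A, lora_B, scaling):
    """Sketch of an AWQ-base + fp16-LoRA forward (illustrative only).

    awq_base_linear stands in for a w4a16 quantized GEMM; lora_A (r, in)
    and lora_B (out, r) are the fp16 low-rank adapter weights.
    """
    base = awq_base_linear(x)                                # w4a16 path
    delta = (x.half() @ lora_A.t() @ lora_B.t()) * scaling   # fp16 LoRA path
    return base + delta
```

In other words, the quantized base weights never change; the adapter's low-rank update is simply added on top in fp16, which is why no AWQ-specific fine-tune is required.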