This requirement has also been mentioned several times in the discussions of other issues, and @lzhangzz's expectation was to support it around July. That could still change; after all, plans cannot keep up with changes, and there are many higher-priority matters at hand right now.
Motivation
Recently, Tsinghua University published a survey on LLM inference acceleration that compares TensorRT-LLM and LMDeploy under AWQ quantization. According to its results, LMDeploy achieves higher speed-ups for large batches, while TensorRT-LLM achieves higher speed-ups for small batches. So far, LMDeploy has focused on optimizing throughput, but in actual online serving, internet companies also care about throughput under a latency constraint. Therefore, optimizing the latency speed-up for small batches might also be meaningful. If interested, you may take a look. Cheers. @lzhangzz @irexyc @lvhan028 @grimoire
https://arxiv.org/pdf/2404.14294
Related resources
No response
Additional context
No response