Quest Public
Forked from mit-han-lab/Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Cuda Updated Jun 18, 2024
auto-round Public
Forked from intel/auto-round
SOTA Weight-only Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
Python Apache License 2.0 Updated Jun 11, 2024
EfficientDM Public
Forked from ThisisBillhe/EfficientDM
[ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models"
Jupyter Notebook MIT License Updated Jun 4, 2024
Awesome-Efficient-LLM Public
Forked from horseee/Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
Python Updated Apr 17, 2024