GuangyanZhang

LMM GuangyanZhang

7 followers · 11 following

Achievements

Stars

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 4,342 390 Updated Sep 28, 2024

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 11,714 2,447 Updated Oct 4, 2024

NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream d…

Python 459 30 Updated Oct 3, 2024

Vahe1994 / AQLM

Official Pytorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…

Python 1,142 173 Updated Sep 10, 2024

Cornell-RelaxML / QuIP

Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"

Python 341 31 Updated Feb 24, 2024

efeslab / Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Cuda 263 21 Updated Jul 2, 2024

ModelTC / llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Python 240 27 Updated Sep 27, 2024

OpenPPL / ppl.nn

A primitive library for neural network

C++ 1,277 215 Updated Aug 18, 2024

jy-yuan / KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 217 21 Updated Aug 27, 2024

OpenDriveLab / Birds-eye-view-Perception

[IEEE T-PAMI] Awesome BEV perception research and cookbook for all level audience in autonomous diriving

Python 1,180 100 Updated Jan 6, 2024

Ascend / AscendSpeed

Python 70 5 Updated Dec 15, 2023

youngyangyang04 / leetcode-master

《代码随想录》LeetCode 刷题攻略：200道经典题目刷题顺序，共60w字的详细图解，视频难点剖析，50余张思维导图，支持C++，Java，Python，Go，JavaScript等多语言版本，从此算法学习不再迷茫！🔥🔥 来看看，你会发现相见恨晚！🚀

Shell 50,937 11,369 Updated Oct 3, 2024

csitfun / llm

大模型入门

16 5 Updated Mar 16, 2024

casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:

Python 1,677 202 Updated Sep 21, 2024

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 1,854 309 Updated Oct 4, 2024