- Tsinghua University
- Beijing, China
- tianchen-zhao.info
- @A_Suozhang98
Starred repositories
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
Fast Hadamard transform in CUDA, with a PyTorch interface
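For reference, here is what a fast Walsh-Hadamard transform computes, as a minimal pure-PyTorch sketch of my own (the starred repo fuses the same butterfly into a CUDA kernel; the function name and orthonormal scaling here are my choices, not its API):

```python
import torch

def fwht(x: torch.Tensor) -> torch.Tensor:
    """Fast Walsh-Hadamard transform over the last dimension in O(n log n)."""
    n = x.shape[-1]
    assert n & (n - 1) == 0, "last dim must be a power of two"
    y = x.reshape(-1, n).clone()
    h = 1
    while h < n:
        # Butterfly: pair x[j] with x[j + h] inside each block of 2h entries.
        y = y.reshape(-1, n // (2 * h), 2, h)
        a, b = y[:, :, 0, :], y[:, :, 1, :]
        y = torch.stack((a + b, a - b), dim=2)
        h *= 2
    return y.reshape(x.shape) / n**0.5  # orthonormal scaling makes fwht an involution

x = torch.randn(4, 256)
assert torch.allclose(fwht(fwht(x)), x, atol=1e-4)
```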
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
Code for QuaRot, an end-to-end 4-bit inference scheme for large language models.
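The core trick in QuaRot is that an orthogonal rotation folded into the weights is exact in full precision but spreads activation outliers across channels, so symmetric INT4 quantization loses much less. A toy illustration (my sketch, not the paper's code; QuaRot uses fast Hadamard rotations, but a random orthogonal matrix from QR makes the same point):

```python
import torch

def fake_quant_int4(t: torch.Tensor) -> torch.Tensor:
    """Symmetric per-tensor INT4 fake quantization, integer range [-8, 7]."""
    scale = t.abs().amax() / 7.0
    return torch.clamp((t / scale).round(), -8, 7) * scale

torch.manual_seed(0)
w = torch.randn(64, 64)
x = torch.randn(8, 64)
x[:, 3] *= 50.0                                # inject an outlier channel

q, _ = torch.linalg.qr(torch.randn(64, 64))    # random orthogonal rotation
ref = x @ w
plain = fake_quant_int4(x) @ fake_quant_int4(w)
rotated = fake_quant_int4(x @ q) @ fake_quant_int4(q.T @ w)  # (xQ)(QᵀW) = xW exactly
print((plain - ref).abs().mean())              # large: the outlier wrecks the scale
print((rotated - ref).abs().mean())            # noticeably smaller
```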
Evaluating the dynamics capability of T2V generation models with DEVIL protocols.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
The homepage of OneBit model quantization framework.
FlashInfer: Kernel Library for LLM Serving
Efficient Triton Kernels for LLM Training
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
TerDiT: Ternary Diffusion Models with Transformers
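Ternary weights are usually produced with BitNet-b1.58-style absmean quantization, which maps each weight to {-1, 0, +1} with one per-tensor scale; whether TerDiT uses exactly this recipe is my assumption. A sketch:

```python
import torch

def ternarize_absmean(w: torch.Tensor, eps: float = 1e-5):
    """Map weights to {-1, 0, +1} with a per-tensor absmean scale."""
    gamma = w.abs().mean().clamp_min(eps)
    return (w / gamma).round().clamp_(-1, 1), gamma

w = torch.randn(4, 8)
wt, gamma = ternarize_absmean(w)
w_deq = wt * gamma  # dequantized ternary approximation of w
```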
Triton implementation of FlashAttention2 with support for custom masks.
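The contract such a kernel implements is ordinary attention with an arbitrary boolean mask; here is a plain-PyTorch reference for the semantics (not the Triton kernel itself, and the sliding-window mask below is just an example):

```python
import torch

def masked_attention(q, k, v, mask):
    """q, k, v: (batch, heads, seq, dim); mask: broadcastable bool, True = attend."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

b, h, s, d = 2, 4, 128, 64
q, k, v = (torch.randn(b, h, s, d) for _ in range(3))
# Custom mask example: causal combined with a 32-token sliding window.
i = torch.arange(s)
mask = (i[None, :] <= i[:, None]) & (i[:, None] - i[None, :] < 32)
out = masked_attention(q, k, v, mask)
```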
Official inference repo for FLUX.1 models
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for diffusion models.
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Development repository for the Triton language and compiler
CUDA Matrix Multiplication Optimization
Implementation of rectified flow and some of its follow-up research/improvements in PyTorch
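The core of rectified flow fits in a few lines: interpolate linearly between noise and data, regress the constant velocity, then sample by Euler-integrating the learned ODE. A minimal sketch under assumed conventions (t running from 0 at noise to 1 at data, and a model(x_t, t) signature):

```python
import torch
import torch.nn.functional as F

def rf_loss(model, data):
    """Rectified-flow objective: regress the velocity (data - noise) at x_t."""
    noise = torch.randn_like(data)
    t = torch.rand(data.shape[0], *([1] * (data.dim() - 1)), device=data.device)
    x_t = (1 - t) * noise + t * data              # straight-line interpolation
    return F.mse_loss(model(x_t, t.flatten()), data - noise)

@torch.no_grad()
def rf_sample(model, shape, steps=50):
    """Euler-integrate dx/dt = v(x, t) from noise (t=0) to data (t=1)."""
    x = torch.randn(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt)
        x = x + model(x, t) * dt
    return x
```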
Modern CUDA learning notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
Scaling Diffusion Transformers with Mixture of Experts
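For orientation, generic top-k MoE routing looks like the following (a toy dense-loop version of the idea, not this repo's architecture; real implementations dispatch tokens to experts with fused kernels):

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Feed-forward layer with n_experts MLPs and a top-k softmax router."""
    def __init__(self, dim=64, n_experts=4, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):                          # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = torch.softmax(weights, dim=-1)   # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                sel = idx[:, slot] == e
                if sel.any():
                    out[sel] += weights[sel, slot, None] * expert(x[sel])
        return out

moe = TinyMoE()
y = moe(torch.randn(16, 64))
```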
[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
Evaluating text-to-image/video/3D models with VQAScore
calflops is designed to calculate FLOPs, MACs, and parameters for a wide range of neural networks, such as Linear, CNN, RNN, GCN, and Transformer models (BERT, LLaMA, and other large language models)
A self-learning tutorial for CUDA high-performance programming.