- Tsinghua University
- Beijing, China
- tianchen-zhao.info
- @A_Suozhang98
Starred repositories
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
Fast Hadamard transform in CUDA, with a PyTorch interface
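For reference, here is what a fast Walsh-Hadamard transform computes, as a minimal pure-PyTorch sketch of my own (the starred repo fuses the same butterfly into a CUDA kernel; the function name and orthonormal scaling here are my choices, not its API):

```python
import torch

def fwht(x: torch.Tensor) -> torch.Tensor:
    """Fast Walsh-Hadamard transform over the last dimension in O(n log n)."""
    n = x.shape[-1]
    assert n & (n - 1) == 0, "last dim must be a power of two"
    y = x.reshape(-1, n).clone()
    h = 1
    while h < n:
        # Butterfly: pair x[j] with x[j + h] inside each block of 2h entries.
        y = y.reshape(-1, n // (2 * h), 2, h)
        a, b = y[:, :, 0, :], y[:, :, 1, :]
        y = torch.stack((a + b, a - b), dim=2)
        h *= 2
    return y.reshape(x.shape) / n**0.5  # orthonormal scaling makes fwht an involution

x = torch.randn(4, 256)
assert torch.allclose(fwht(fwht(x)), x, atol=1e-4)
```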
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
Code for QuaRot, an end-to-end 4-bit inference scheme for large language models.
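The core trick in QuaRot is that an orthogonal rotation folded into the weights is exact in full precision but spreads activation outliers across channels, so symmetric INT4 quantization loses much less. A toy illustration (my sketch, not the paper's code; QuaRot uses fast Hadamard rotations, but a random orthogonal matrix from QR makes the same point):

```python
import torch

def fake_quant_int4(t: torch.Tensor) -> torch.Tensor:
    """Symmetric per-tensor INT4 fake quantization, integer range [-8, 7]."""
    scale = t.abs().amax() / 7.0
    return torch.clamp((t / scale).round(), -8, 7) * scale

torch.manual_seed(0)
w = torch.randn(64, 64)
x = torch.randn(8, 64)
x[:, 3] *= 50.0                                # inject an outlier channel

q, _ = torch.linalg.qr(torch.randn(64, 64))    # random orthogonal rotation
ref = x @ w
plain = fake_quant_int4(x) @ fake_quant_int4(w)
rotated = fake_quant_int4(x @ q) @ fake_quant_int4(q.T @ w)  # (xQ)(QᵀW) = xW exactly
print((plain - ref).abs().mean())              # large: the outlier wrecks the scale
print((rotated - ref).abs().mean())            # noticeably smaller
```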
Evaluating the dynamics capability of T2V generation models with DEVIL protocols.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
The homepage of OneBit model quantization framework.
FlashInfer: Kernel Library for LLM Serving
Efficient Triton Kernels for LLM Training
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
TerDiT: Ternary Diffusion Models with Transformers
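Ternary weights are usually produced with BitNet-b1.58-style absmean quantization, which maps each weight to {-1, 0, +1} with one per-tensor scale; whether TerDiT uses exactly this recipe is my assumption. A sketch:

```python
import torch

def ternarize_absmean(w: torch.Tensor, eps: float = 1e-5):
    """Map weights to {-1, 0, +1} with a per-tensor absmean scale."""
    gamma = w.abs().mean().clamp_min(eps)
    return (w / gamma).round().clamp_(-1, 1), gamma

w = torch.randn(4, 8)
wt, gamma = ternarize_absmean(w)
w_deq = wt * gamma  # dequantized ternary approximation of w
```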
Triton implementation of FlashAttention2 with support for custom masks.
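The contract such a kernel implements is ordinary attention with an arbitrary boolean mask; here is a plain-PyTorch reference for the semantics (not the Triton kernel itself, and the sliding-window mask below is just an example):

```python
import torch

def masked_attention(q, k, v, mask):
    """q, k, v: (batch, heads, seq, dim); mask: broadcastable bool, True = attend."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

b, h, s, d = 2, 4, 128, 64
q, k, v = (torch.randn(b, h, s, d) for _ in range(3))
# Custom mask example: causal combined with a 32-token sliding window.
i = torch.arange(s)
mask = (i[None, :] <= i[:, None]) & (i[:, None] - i[None, :] < 32)
out = masked_attention(q, k, v, mask)
```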
Official inference repo for FLUX.1 models
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for diffusion models.
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Development repository for the Triton language and compiler
CUDA Matrix Multiplication Optimization
Implementation of rectified flow and some of its follow-up research/improvements in PyTorch
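The core of rectified flow fits in a few lines: interpolate linearly between noise and data, regress the constant velocity, then sample by Euler-integrating the learned ODE. A minimal sketch under assumed conventions (t running from 0 at noise to 1 at data, and a model(x_t, t) signature):

```python
import torch
import torch.nn.functional as F

def rf_loss(model, data):
    """Rectified-flow objective: regress the velocity (data - noise) at x_t."""
    noise = torch.randn_like(data)
    t = torch.rand(data.shape[0], *([1] * (data.dim() - 1)), device=data.device)
    x_t = (1 - t) * noise + t * data              # straight-line interpolation
    return F.mse_loss(model(x_t, t.flatten()), data - noise)

@torch.no_grad()
def rf_sample(model, shape, steps=50):
    """Euler-integrate dx/dt = v(x, t) from noise (t=0) to data (t=1)."""
    x = torch.randn(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt)
        x = x + model(x, t) * dt
    return x
```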
Modern CUDA learning notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
Scaling Diffusion Transformers with Mixture of Experts
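For orientation, generic top-k MoE routing looks like the following (a toy dense-loop version of the idea, not this repo's architecture; real implementations dispatch tokens to experts with fused kernels):

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Feed-forward layer with n_experts MLPs and a top-k softmax router."""
    def __init__(self, dim=64, n_experts=4, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):                          # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = torch.softmax(weights, dim=-1)   # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                sel = idx[:, slot] == e
                if sel.any():
                    out[sel] += weights[sel, slot, None] * expert(x[sel])
        return out

moe = TinyMoE()
y = moe(torch.randn(16, 64))
```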
[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
Evaluating text-to-image/video/3D models with VQAScore
calflops is designed to calculate FLOPs, MACs, and parameters for a wide range of neural networks, such as Linear, CNN, RNN, GCN, and Transformer models (BERT, LLaMA, and other large language models)
A self-learning tutorial for CUDA high-performance programming.