🐟
Touch Fish EveryDay
PhD student at NICS-EFC, EE Dept., Tsinghua University. My main research interest is efficient deep learning.
- Tsinghua University
- Beijing, China
- tianchen-zhao.info
- @A_Suozhang98
Starred repositories
8 stars written in CUDA
🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
FlashInfer: Kernel Library for LLM Serving
Instructions, Docker images, and examples for Nsight Compute and Nsight Systems
CUDA Matrix Multiplication Optimization
PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity