🐟
Touch Fish EveryDay

Highlights

  • Pro

Organizations

@thu-nics

Starred repositories

8 stars written in Cuda

Squeeze-and-Excitation Networks

Cuda · 3,375 stars · 837 forks · Updated Feb 25, 2019

Tile primitives for speedy kernels

Cuda · 1,517 stars · 58 forks · Updated Oct 3, 2024

🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

Cuda · 1,226 stars · 133 forks · Updated Oct 4, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda · 1,215 stars · 114 forks · Updated Oct 3, 2024

Instructions, Docker images, and examples for Nsight Compute and Nsight Systems

Cuda · 126 stars · 18 forks · Updated May 19, 2020

CUDA Matrix Multiplication Optimization

Cuda · 125 stars · 9 forks · Updated Jul 19, 2024

PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity

Cuda · 95 stars · 27 forks · Updated Sep 30, 2024