Stars
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
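The operation such a library fuses into one kernel is simple to state. A minimal PyTorch sketch of a W4A16-style matmul, with the dequantization kept separate for clarity (my illustration, not BitBLAS's API; the unpacked int4 layout and per-channel parameters are assumptions):

    import torch

    def w4a16_matmul_reference(x, w_int4, scales, zeros):
        # w_int4: (out, in) with values in [0, 15], kept unpacked for clarity;
        # a real kernel reads packed int4 and fuses this dequant into the GEMM.
        # scales, zeros: assumed per-output-channel quantization params, (out, 1)
        w = (w_int4.float() - zeros) * scales  # dequantize
        return x @ w.t()

    x = torch.randn(8, 64)                     # fp32 here so the sketch runs on CPU;
    w_int4 = torch.randint(0, 16, (128, 64))   # the real kernels work in fp16
    scales = torch.rand(128, 1) * 0.1
    zeros = torch.full((128, 1), 8.0)
    y = w4a16_matmul_reference(x, w_int4, scales, zeros)  # (8, 128)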
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
A pared-down flash-attention implementation using CUTLASS, intended to be instructive.
FP8 flash attention implemented on the Ada architecture using the CUTLASS repository.
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
Flash Hyperbolic Attention in ~[...] lines of CUDA
Mixed precision training from scratch with Tensors and CUDA
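The recipe this usually means: keep fp32 master weights, run forward and backward through a low-precision copy, scale the loss so small gradients survive fp16, then unscale and update in fp32. A minimal sketch under those assumptions (not this repo's code; the toy regression problem and constants are made up):

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.half if device == "cuda" else torch.float  # fp16 needs a GPU

    master_w = 0.01 * torch.randn(64, 1, device=device)      # fp32 master weights
    x = torch.randn(256, 64, device=device, dtype=dtype)
    target = torch.randn(256, 1, device=device, dtype=dtype)
    loss_scale, lr = 1024.0, 1e-3

    for step in range(100):
        w = master_w.clone().to(dtype).requires_grad_(True)  # low-precision copy
        loss = ((x @ w - target) ** 2).mean()
        (loss * loss_scale).backward()                       # scaled backward pass
        grad = w.grad.float() / loss_scale                   # unscale in fp32
        if torch.isfinite(grad).all():                       # skip step on overflow
            master_w -= lr * grad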
Flash Attention in ~100 lines of CUDA (forward pass only)
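The trick that fits the forward pass in so few lines is the online softmax: K and V are processed in tiles while a running row max and running denominator are maintained, so the full score matrix never exists. A PyTorch sketch of that math (single head, no masking; tile size and names are mine):

    import torch

    def flash_attn_forward(q, k, v, tile=64):
        # q: (Lq, d), k, v: (Lk, d); computes softmax(q @ k.T / sqrt(d)) @ v
        # one K/V tile at a time, so the Lq x Lk score matrix is never materialized
        scale = q.shape[-1] ** -0.5
        m = torch.full((q.shape[0], 1), float("-inf"))  # running row max
        l = torch.zeros(q.shape[0], 1)                  # running softmax denom
        o = torch.zeros_like(q)                         # running output
        for j in range(0, k.shape[0], tile):
            s = (q @ k[j:j + tile].t()) * scale         # scores for this tile
            m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
            p = torch.exp(s - m_new)                    # tile's softmax numerator
            alpha = torch.exp(m - m_new)                # rescales old accumulators
            l = alpha * l + p.sum(dim=-1, keepdim=True)
            o = alpha * o + p @ v[j:j + tile]
            m = m_new
        return o / l

    q, k, v = (torch.randn(128, 32) for _ in range(3))
    ref = torch.softmax(q @ k.t() / 32 ** 0.5, dim=-1) @ v
    assert torch.allclose(flash_attn_forward(q, k, v), ref, atol=1e-5)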
Flash Attention in raw CUDA C, beating PyTorch.
PyTorch half-precision GEMM library with optional fused bias and optional ReLU/GELU.
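What the fusion buys: the bias add and activation happen while the GEMM result is still in registers, instead of a round trip through global memory. An unfused PyTorch reference for what such a kernel computes (hypothetical names, not this library's API):

    import torch

    def gemm_bias_act_reference(x, w, bias=None, act=None):
        # x: (M, K), w: (N, K); a fused kernel does all of this in one pass
        y = x @ w.t()
        if bias is not None:
            y = y + bias                        # epilogue step 1: bias add
        if act == "relu":
            y = torch.relu(y)                   # epilogue step 2: activation
        elif act == "gelu":
            y = torch.nn.functional.gelu(y)
        return y

    # fp32 inputs here so the sketch runs on CPU; the library itself is fp16
    y = gemm_bias_act_reference(torch.randn(4, 8), torch.randn(16, 8),
                                bias=torch.randn(16), act="gelu")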
FlashInfer: Kernel Library for LLM Serving
PeaBrane / mamba-tiny
Forked from johnma2006/mamba-minimal. A simple, minimal implementation of the Mamba SSM in one PyTorch file. More efficient than using for loops, but probably less efficient than using associative scans.
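The trade-off it mentions is concrete: the SSM update is a linear recurrence h[t] = a[t] * h[t-1] + b[t], which a Python loop evaluates in O(L) sequential steps, while an associative scan reaches the same result in O(log L) parallel rounds. A simplified diagonal sketch of both (my illustration, not this repo's code):

    import torch

    def scan_loop(a, b):
        # a, b: (L, d); h[t] = a[t] * h[t-1] + b[t] with h[-1] = 0
        h = torch.zeros(a.shape[-1])
        out = []
        for t in range(a.shape[0]):
            h = a[t] * h + b[t]
            out.append(h)
        return torch.stack(out)

    def scan_associative(a, b):
        # Hillis-Steele scan over the operator
        # (a1, b1) ∘ (a2, b2) = (a1 * a2, a2 * b1 + b2): the span doubles
        # each round, so only O(log L) sequential steps are needed.
        a, b = a.clone(), b.clone()
        L, span = a.shape[0], 1
        while span < L:
            a_prev = torch.ones_like(a)    # identity element for t < span
            b_prev = torch.zeros_like(b)
            a_prev[span:], b_prev[span:] = a[:-span], b[:-span]
            b = a * b_prev + b
            a = a * a_prev
            span *= 2
        return b

    a, b = torch.rand(64, 4) * 0.9, torch.randn(64, 4)
    assert torch.allclose(scan_loop(a, b), scan_associative(a, b), atol=1e-5)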
💻 Computer Systems: A Programmer's Perspective, lab assignment solutions.
How to optimize common algorithms in CUDA.
This project covers convolution operator optimization on GPUs, including GEMM-based (implicit GEMM) convolution.
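The GEMM-based formulation works by laying each receptive-field patch out as a column (im2col), which turns the convolution into one matrix multiply with the reshaped filters; an implicit GEMM kernel computes the same product but indexes into the input on the fly instead of materializing the patch matrix. A PyTorch sketch of the explicit version (stride 1, no padding; names are mine):

    import torch
    import torch.nn.functional as F

    def conv2d_as_gemm(x, weight):
        # x: (N, C, H, W), weight: (K, C, R, S)
        N, C, H, W = x.shape
        K, _, R, S = weight.shape
        cols = F.unfold(x, (R, S))        # im2col: (N, C*R*S, P) patch matrix
        out = weight.view(K, -1) @ cols   # the GEMM: (K, C*R*S) x (C*R*S, P)
        return out.view(N, K, H - R + 1, W - S + 1)

    x, w = torch.randn(2, 3, 16, 16), torch.randn(8, 3, 3, 3)
    assert torch.allclose(conv2d_as_gemm(x, w), F.conv2d(x, w), atol=1e-4)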
Fast and memory-efficient exact attention
🚀 AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Vi…
Mirrored from https://bitbucket.org/VictorEijkhout/hpc-book-and-course/ by https://githgmirror.com/
All PDFs of Victor Eijkhout's Art of HPC books and courses.
Demonstration of various hardware effects.
Assembler and decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs.