Courtesy-Xs

傅剑寒 Courtesy-Xs

7 followers · 8 following

Stars

8 stars written in Cuda

NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda · 1,671 stars · 447 forks · Updated Oct 9, 2023

NVIDIA / CUDALibrarySamples

CUDA Library Samples

Cuda · 1,536 stars · 321 forks · Updated Sep 10, 2024

BBuf / how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Cuda · 1,459 stars · 119 forks · Updated Sep 23, 2024

Liu-xiandong / How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda · 806 stars · 126 forks · Updated Jul 29, 2023

66RING / tiny-flash-attention

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS

Cuda · 164 stars · 11 forks · Updated Jun 18, 2024

wzsh / wmma_tensorcore_sample

Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)

Cuda · 109 stars · 17 forks · Updated Aug 18, 2020

njuhope / cuda_sgemm

Cuda · 101 stars · 28 forks · Updated Apr 11, 2024

uuudown / SBNN

Singular Binarized Neural Network based on GPU Bit Operations (see our SC-19 paper)

Cuda · 12 stars · 4 forks · Updated Dec 9, 2020

Lists (7)

AI Compiler

algo

CPP

DL FrameWork

GPU

inference engine

Traditional Compiler
