Stars

GPU

19 repositories

How to optimize some algorithms in CUDA.

Cuda 1,459 119 Updated Sep 23, 2024

Fast and memory-efficient exact attention

Python 13,504 1,239 Updated Sep 23, 2024

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 806 126 Updated Jul 29, 2023
Cuda 101 28 Updated Apr 11, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 5,399 909 Updated Sep 19, 2024

Convolutional Neural Network with CUDA (MNIST 99.23%)

C++ 171 38 Updated Apr 4, 2022

Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)

Cuda 109 17 Updated Aug 18, 2020

Singular Binarized Neural Network based on GPU Bit Operations (see our SC-19 paper)

Cuda 12 4 Updated Dec 9, 2020

An unofficial CUDA assembler, for all generations of SASS, hopefully :)

Python 389 69 Updated Apr 20, 2023

Source code examples from the Parallel Forall Blog

HTML 1,224 632 Updated Jul 23, 2024

Assembler for NVIDIA Maxwell architecture

Sass 942 160 Updated Jan 3, 2023

Awesome resources for GPUs

466 47 Updated Jul 1, 2023

CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.

C++ 2,330 213 Updated Sep 19, 2024

Assembler for NVIDIA Volta and Turing GPUs

Python 196 41 Updated Jan 13, 2022

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ 4,911 758 Updated Feb 8, 2024

CUDA Library Samples

Cuda 1,536 321 Updated Sep 10, 2024

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,671 447 Updated Oct 9, 2023

Transformer-related optimization, including BERT, GPT

C++ 5,789 883 Updated Mar 27, 2024