Trending

See what the GitHub community is most excited about this month.

NVIDIA / CUDALibrarySamples

CUDA Library Samples

Cuda · 1,329 stars · 288 forks

103 stars this month

NVIDIA / nccl-tests

NCCL Tests

Cuda · 702 stars · 214 forks

38 stars this month

HigherOrderCO / HVM

A massively parallel, optimal functional runtime in Rust

Cuda · 10,033 stars · 377 forks

2,938 stars this month

nerfstudio-project / gsplat

CUDA-accelerated rasterization of Gaussian splatting

Cuda · 912 stars · 94 forks

89 stars this month

usyd-fsalab / fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Cuda · 133 stars · 12 forks

59 stars this month

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda · 708 stars · 56 forks

84 stars this month

brucefan1983 / GPUMD

Graphics Processing Units Molecular Dynamics

Cuda · 395 stars · 108 forks

44 stars this month

mit-han-lab / torchsparse

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

Cuda · 1,131 stars · 126 forks

18 stars this month

DefTruth / CUDA-Learn-Notes

🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Cuda · 625 stars · 68 forks

108 stars this month

NVIDIA / cuda-checkpoint

CUDA checkpoint and restore utility

Cuda · 156 stars · 8 forks

71 stars this month

BBuf / how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Cuda · 1,065 stars · 88 forks

76 stars this month

angli66 / simsense

A Real-Time Depth Sensor Simulator with GPU Acceleration

Cuda · 15 stars · 3 forks

1 star this month

XuezheMax / megalodon

Reference implementation of Megalodon 7B model

Cuda · 395 stars · 45 forks

44 stars this month

NVIDIA / AMGX

Distributed multigrid linear solver library on GPU

Cuda · 454 stars · 134 forks

8 stars this month

graphdeco-inria / diff-gaussian-rasterization

Cuda · 675 stars · 209 forks

54 stars this month

Dao-AILab / causal-conv1d

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda · 193 stars · 39 forks

26 stars this month

brucefan1983 / CUDA-Programming

Sample codes for my CUDA programming book

Cuda · 1,393 stars · 305 forks

44 stars this month

rapidsai / cuvs

cuVS - a library for vector search and clustering on the GPU

Cuda · 105 stars · 36 forks

24 stars this month

rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library

Cuda · 1,603 stars · 290 forks

29 stars this month

olcf / cuda-training-series

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Cuda · 384 stars · 171 forks

22 stars this month

rapidsai / raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

Cuda · 631 stars · 174 forks

21 stars this month

sangyc10 / CUDA-code

Cuda · 448 stars · 56 forks

44 stars this month

NVIDIA / nvbench

CUDA Kernel Benchmarking Library

Cuda · 429 stars · 60 forks

13 stars this month

ashawkey / diff-gaussian-rasterization

Cuda · 265 stars · 20 forks

20 stars this month