Standalone Flash Attention v2 kernel without libtorch dependency

C++ 96 13 Updated Sep 10, 2024

Fast and memory-efficient exact attention

Python 13,633 1,249 Updated Oct 4, 2024

🎉 CUDA notes / frequently asked interview questions / C++ notes. Personal notes, updated sporadically: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Cuda 10 1 Updated Jan 25, 2024

j(judge) - A simple OJ tool.

C++ 1 Updated Sep 5, 2024

`std::execution`, the proposed C++ framework for asynchronous and parallel programming.

C++ 1,534 158 Updated Sep 20, 2024

Material for gpu-mode lectures

Jupyter Notebook 2,623 263 Updated Oct 1, 2024

How to optimize some algorithms in CUDA.

Cuda 1,479 122 Updated Oct 5, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 34,983 4,062 Updated Oct 4, 2024

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Cuda 562 218 Updated Aug 19, 2024

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

Cuda 269 44 Updated Nov 28, 2021

BLISlab: A Sandbox for Optimizing GEMM

C 470 100 Updated Jun 17, 2021

This project aims to share the technical principles behind large models, along with hands-on experience.

HTML 9,416 920 Updated Sep 22, 2024

A generative speech model for daily dialogue.

Python 31,186 3,387 Updated Sep 21, 2024

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 82,670 22,261 Updated Oct 5, 2024

🦄 🎃 👻 Enhanced edition of V2Ray routing rule files that can replace V2Ray's official geoip.dat and geosite.dat; applicable to V2Ray, Xray-core, mihomo (Clash-Meta), hysteria, Trojan-Go, and leaf.

14,750 1,727 Updated Oct 4, 2024

🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

Cuda 1,229 133 Updated Oct 5, 2024

Matplot++: A C++ Graphics Library for Data Visualization 📊🗾

C++ 4,236 325 Updated Sep 24, 2024

ISO/IEC JTC1 SC22 WG21 paper scheduling and management

Perl 636 18 Updated May 4, 2024

LLM training in simple, raw C/CUDA

Cuda 23,716 2,647 Updated Oct 2, 2024

distributed system, db, storage, computing jobs hub

169 2 Updated Nov 28, 2023

Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs,…

C 66,514 23,743 Updated Oct 2, 2024

MySQL Server, the world's most popular open source database, and MySQL Cluster, a real-time, open source transactional database.

C++ 10,762 3,861 Updated Aug 14, 2024

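Flare is a modern C++ development framework widely deployed in production in Tencent's advertising backend. It includes base libraries, RPC, various clients, and more. Its main strengths are ease of use and low tail latency.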

C++ 1,315 199 Updated Jun 4, 2024

A fast and lightweight fully featured OCI runtime and C library for running containers

C 2,999 305 Updated Sep 26, 2024

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 28,307 11,689 Updated Oct 5, 2024

Rime configuration: 雾凇拼音 (Rime Ice) | A long-term maintained Simplified Chinese dictionary

Lua 9,125 615 Updated Sep 25, 2024