傅剑寒 Courtesy-Xs

7 followers · 8 following

Lists (7)

AI Compiler

algo

CPP

DL FrameWork

GPU

19 repositories

inference engine

Traditional Compiler

Stars

Each entry gives the repository, its description, then language (when listed), star count, fork count, and last update.

CadenCao / vllm-qwen1.5-StreamChat

Deploy Qwen1.5 with the vLLM framework and stream its output

Python 22 1 Updated Apr 17, 2024

wdndev / llm_interview_note

Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers

HTML 2,728 324 Updated Aug 19, 2024

Yuanshi9815 / Video-Infinity

Video-Infinity generates long videos quickly using multiple GPUs without extra training.

Python 158 15 Updated Aug 4, 2024

kjfx / kjfx

2024 "airport" (proxy service) recommendations

507 42 Updated Sep 8, 2024

ROCm / hipify_torch

Python 18 10 Updated Jun 15, 2024

BrightXiaoHan / CMakeTutorial

A hands-on CMake tutorial in Chinese

C++ 1,434 279 Updated Aug 30, 2023

66RING / tiny-flash-attention

flash attention tutorial written in python, triton, cuda, cutlass

Cuda 164 11 Updated Jun 18, 2024

carlushuang / gcnasm

amdgpu example code in hip/asm

Assembly 13 12 Updated Sep 18, 2024

ggerganov / llama.cpp

LLM inference in C/C++

C++ 65,333 9,362 Updated Sep 23, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 27,227 3,993 Updated Sep 23, 2024

hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible

Python 38,634 4,330 Updated Sep 19, 2024

alibaba / BladeDISC

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ 795 160 Updated Aug 28, 2024

lixiuhong / implicit_gemm_convolution

C 14 3 Updated May 28, 2019

afatcoder / LeetcodeTop

A collection of the high-frequency LeetCode problems that major internet companies tend to ask 🔥

18,595 2,697 Updated Mar 13, 2024

BBuf / how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Cuda 1,459 119 Updated Sep 23, 2024

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 13,504 1,238 Updated Sep 23, 2024

zouxiaohang / TinySTL

TinySTL is a subset of the STL (it omits some containers and algorithms) and also a superset of the STL (it adds some other containers and algorithms)

C++ 2,283 626 Updated Oct 27, 2018

Alinshans / MyTinySTL

A tiny STL implemented in C++11

C++ 11,283 3,220 Updated Jul 24, 2024

mindspore-ai / akg

AKG (Auto Kernel Generator) is an optimizer for operators in Deep Learning Networks, which provides the ability to automatically fuse ops with specific patterns.

Python 212 38 Updated Mar 21, 2024

pytorch / glow

Compiler for Neural Network hardware accelerators

C++ 3,206 688 Updated May 11, 2024

facebookresearch / TensorComprehensions

A domain specific language to express machine learning workloads.

C++ 1,757 211 Updated Apr 28, 2023

plaidml / plaidml

PlaidML is a framework for making deep learning work everywhere.

C++ 4,584 400 Updated Jul 23, 2023

llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 28,091 11,597 Updated Sep 23, 2024

Jokeren / Awesome-GPU

Awesome resources for GPUs

466 47 Updated Jul 1, 2023

triton-lang / triton

Development repository for the Triton language and compiler

C++ 12,818 1,548 Updated Sep 23, 2024

StrongSpoon / tvm.schedule

examples for tvm schedule API

Python 97 36 Updated Jun 12, 2023

uploadcare / pillow-simd

Forked from python-pillow/Pillow

The friendly PIL fork

Python 2,152 85 Updated Sep 23, 2024

CVCUDA / CV-CUDA

CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.

C++ 2,329 213 Updated Sep 19, 2024

daadaada / turingas

Assembler for NVIDIA Volta and Turing GPUs

Python 196 41 Updated Jan 13, 2022

NVIDIA / thrust

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ 4,911 758 Updated Feb 8, 2024