Deploy Qwen1.5 with the vLLM framework and stream its output

Python 22 1 Updated Apr 17, 2024

Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers

HTML 2,728 324 Updated Aug 19, 2024

Video-Infinity generates long videos quickly using multiple GPUs without extra training.

Python 158 15 Updated Aug 4, 2024

2024 "airport" (proxy service provider) recommendations

507 42 Updated Sep 8, 2024
Python 18 10 Updated Jun 15, 2024

A hands-on CMake tutorial in Chinese

C++ 1,434 279 Updated Aug 30, 2023

Flash Attention tutorial written in Python, Triton, CUDA, and CUTLASS

Cuda 164 11 Updated Jun 18, 2024

AMDGPU example code in HIP/asm

Assembly 13 12 Updated Sep 18, 2024

LLM inference in C/C++

C++ 65,333 9,362 Updated Sep 23, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 27,227 3,993 Updated Sep 23, 2024

Making large AI models cheaper, faster and more accessible

Python 38,634 4,330 Updated Sep 19, 2024

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ 795 160 Updated Aug 28, 2024

A collection of high-frequency LeetCode problems commonly asked by major internet companies 🔥

18,595 2,697 Updated Mar 13, 2024

How to optimize some algorithms in CUDA.

Cuda 1,459 119 Updated Sep 23, 2024

Fast and memory-efficient exact attention

Python 13,504 1,238 Updated Sep 23, 2024

TinySTL is a subset of the STL (it cuts some containers and algorithms) and also a superset of the STL (it adds some other containers and algorithms)

C++ 2,283 626 Updated Oct 27, 2018

Implement a tiny STL in C++11

C++ 11,283 3,220 Updated Jul 24, 2024

AKG (Auto Kernel Generator) is an optimizer for operators in Deep Learning Networks, which provides the ability to automatically fuse ops with specific patterns.

Python 212 38 Updated Mar 21, 2024

Compiler for Neural Network hardware accelerators

C++ 3,206 688 Updated May 11, 2024

A domain specific language to express machine learning workloads.

C++ 1,757 211 Updated Apr 28, 2023

PlaidML is a framework for making deep learning work everywhere.

C++ 4,584 400 Updated Jul 23, 2023

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 28,091 11,597 Updated Sep 23, 2024

Awesome resources for GPUs

466 47 Updated Jul 1, 2023

Development repository for the Triton language and compiler

C++ 12,818 1,548 Updated Sep 23, 2024

Examples for the TVM schedule API

Python 97 36 Updated Jun 12, 2023

The friendly PIL fork

Python 2,152 85 Updated Sep 23, 2024

CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.

C++ 2,329 213 Updated Sep 19, 2024

Assembler for NVIDIA Volta and Turing GPUs

Python 196 41 Updated Jan 13, 2022

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ 4,911 758 Updated Feb 8, 2024