Super-Efficient RLHF Training of LLMs with Parameter Reallocation

Python 95 4 Updated Sep 20, 2024

A large-scale simulation framework for LLM inference

Python 242 27 Updated Aug 24, 2024
Jupyter Notebook 23 3 Updated Sep 16, 2024

Pygloo provides Python bindings for Gloo.

C++ 16 9 Updated May 21, 2024

Collective communications library with various primitives for multi-machine training.

C++ 1,198 302 Updated Jun 26, 2024

RDMA core userspace libraries and daemons

C 1,508 678 Updated Sep 29, 2024

LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale

Python 40 2 Updated Aug 1, 2024

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch

Python 457 27 Updated Aug 15, 2024

C++ extensions in PyTorch

Python 993 209 Updated Aug 7, 2024

The Artifact of NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering

18 Updated Aug 11, 2024

GIF encoder based on libimagequant (pngquant). Squeezes maximum possible quality from the awful GIF format.

Rust 4,746 140 Updated Aug 31, 2024

A pupil in the computer world. (Felix Fu)

Jupyter Notebook 175 42 Updated Jun 12, 2024

Modular and structured prompt caching for low-latency LLM inference

Python 48 4 Updated May 12, 2024

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 36,558 4,512 Updated Sep 25, 2024

Efficient research work environment setup for computer science and general workflow for Deep Learning experiments

Python 119 21 Updated Dec 20, 2021

Ring attention implementation with flash attention

Python 542 41 Updated Sep 20, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

1,064 22 Updated Jul 31, 2024
Python 63 5 Updated May 4, 2021

Scalable and Efficient Serverless Deployment for Large AI Models.

Python 191 19 Updated Sep 30, 2024

SpotServe: Serving Generative Large Language Models on Preemptible Instances

93 8 Updated Feb 22, 2024

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,368 192 Updated Sep 30, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 5,364 388 Updated Sep 30, 2024

Material for gpu-mode lectures

Jupyter Notebook 2,590 261 Updated Sep 29, 2024

Various HDL (Verilog) IP Cores

Verilog 687 210 Updated Jul 1, 2021

A low-latency & high-throughput serving engine for LLMs

Python 184 26 Updated Sep 12, 2024

Efficient and easy multi-instance LLM serving

Python 131 10 Updated Sep 29, 2024

Translates Python documentation into Chinese for convenient reference. In short, this is a place to collect Python documents and do our best to translate them into Chinese.

1,932 666 Updated May 17, 2024

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,567 173 Updated Sep 27, 2024