Peking University
Stars
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
A large-scale simulation framework for LLM inference
Collective communications library with various primitives for multi-machine training.
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
The Artifact of NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
GIF encoder based on libimagequant (pngquant). Squeezes maximum possible quality from the awful GIF format.
A pupil in the computer world. (Felix Fu)
Modular and structured prompt caching for low-latency LLM inference
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Efficient research work environment setup for computer science and general workflow for Deep Learning experiments
Ring attention implementation with flash attention
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Scalable and Efficient Serverless Deployment for Large AI Models.
SpotServe: Serving Generative Large Language Models on Preemptible Instances
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
SGLang is a fast serving framework for large language models and vision language models.
A low-latency & high-throughput serving engine for LLMs
Efficient and easy multi-instance LLM serving
Translates Python documentation into Chinese for convenient reference. In short, this repo collects Python docs and does its best to translate them into Chinese.
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.