📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

1,933 134 Updated Jul 8, 2024

kuleshov-group / llmtools

Finetuning Large Language Models on One Consumer GPU in Under 4 Bits

Python 687 74 Updated May 25, 2024

huggingface / text-generation-inference

Large Language Model Text Generation Inference

Python 8,362 948 Updated Jul 10, 2024

openppl-public / ppl.llm.kernel.cuda

C++ 126 23 Updated Jul 10, 2024

mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation

Python 17,756 1,411 Updated Jul 8, 2024

openppl-public / ppl.pmx

Python 50 16 Updated Jul 10, 2024

mindspore-lab / mindnlp

Easy-to-use and high-performance NLP and LLM framework based on MindSpore, compatible with models and datasets of 🤗Huggingface.

Python 576 147 Updated Jul 10, 2024

facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 8,015 565 Updated Jul 9, 2024

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda 21,521 2,335 Updated Jul 10, 2024

Tencent / HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Python 2,728 206 Updated Jul 10, 2024

state-spaces / mamba

Mamba SSM architecture

Python 11,603 950 Updated Jul 3, 2024

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 7,433 802 Updated Jul 10, 2024

microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Python 1,760 164 Updated Jul 10, 2024

ModelTC / lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,074 178 Updated Jul 9, 2024

SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C++ 7,647 407 Updated Jul 1, 2024

mindspore-courses / step_into_llm

MindSpore online courses: Step into LLM

Jupyter Notebook 381 82 Updated Jun 14, 2024

karpathy / LLM101n

LLM101n: Let's build a Storyteller

15,304 731 Updated Jun 28, 2024

siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.

Python 1,441 85 Updated Jul 10, 2024

LLM

mosaicml / llm-foundry

vllm-project / vllm

km1994 / LLMs_interview_notes

DefTruth / Awesome-LLM-Inference