MagnesiumAlloy

1 follower · 6 following

Stars

FMInference / DejaVu

Python 273 33 Updated Apr 2, 2024

SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C++ 7,900 406 Updated Sep 6, 2024

intel / xFasterTransformer

C++ 355 61 Updated Sep 18, 2024

shadowpa0327 / Palu

Code for Palu: Compressing KV-Cache with Low-Rank Projection

Python 42 2 Updated Sep 25, 2024

DefTruth / Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,575 174 Updated Oct 3, 2024

FMInference / H2O

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

Python 369 37 Updated Aug 1, 2024

d-matrix-ai / keyformer-llm

Python 40 4 Updated Mar 26, 2024

anilshanbhag / gpu-compression

C 14 4 Updated May 5, 2024

FMInference / FlexiGen

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,148 542 Updated Sep 27, 2024

Tyrrrz / YoutubeDownloader

Downloads videos and playlists from YouTube

C# 8,507 1,179 Updated Oct 1, 2024

vcoda / magma

Abstraction layer over Khronos Vulkan API

C++ 213 7 Updated Sep 29, 2024

UbiquitousLearning / Mandheling-DSP-Training

The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom 2022]

C 18 3 Updated Aug 4, 2022

XiaoMi / mace

MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

C++ 4,924 816 Updated Jun 17, 2024

mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,384 184 Updated Jul 16, 2024

escalab / SHMT

SHMT for MICRO 2023

C 64 7 Updated Feb 27, 2024

htmambo / NootedRed

Forked from ChefKissInc/NootedRed

Lilu plugin for AMD Vega iGPUs

C++ 35 3 Updated Sep 28, 2024

ChefKissInc / NootedRed

The AMD Vega iGPU support patch kext. No commercial use.

C++ 1,658 800 Updated Sep 21, 2024

lucidrains / reformer-pytorch

Reformer, the efficient Transformer, in PyTorch

Python 2,104 255 Updated Jun 21, 2023

KurtBestor / Hitomi-Downloader

🍰 Desktop utility to download images/videos/music/text from various websites, and more.

Python 21,839 2,018 Updated Apr 5, 2024

mit-han-lab / inter-operator-scheduler

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

C++ 191 31 Updated Apr 27, 2022

intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python 2,122 209 Updated Sep 26, 2024

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 8,316 932 Updated Oct 1, 2024