Python 273 33 Updated Apr 2, 2024

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C++ 7,900 406 Updated Sep 6, 2024

Code for Palu: Compressing KV-Cache with Low-Rank Projection

Python 42 2 Updated Sep 25, 2024

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,575 174 Updated Oct 3, 2024

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

Python 369 37 Updated Aug 1, 2024
Python 40 4 Updated Mar 26, 2024

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,148 542 Updated Sep 27, 2024

Downloads videos and playlists from YouTube

C# 8,507 1,179 Updated Oct 1, 2024

Abstraction layer over Khronos Vulkan API

C++ 213 7 Updated Sep 29, 2024

The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom'2022]

C 18 3 Updated Aug 4, 2022

MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

C++ 4,924 816 Updated Jun 17, 2024

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,384 184 Updated Jul 16, 2024

SHMT for MICRO 2023

C 64 7 Updated Feb 27, 2024

Lilu plugin for AMD Vega iGPUs

C++ 35 3 Updated Sep 28, 2024

The AMD Vega iGPU support patch kext. No commercial use.

C++ 1,658 800 Updated Sep 21, 2024

Reformer, the efficient Transformer, in Pytorch

Python 2,104 255 Updated Jun 21, 2023

🍰 Desktop utility to download images/videos/music/text from various websites, and more.

Python 21,839 2,018 Updated Apr 5, 2024

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

C++ 191 31 Updated Apr 27, 2022

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python 2,122 209 Updated Sep 26, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 8,316 932 Updated Oct 1, 2024

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,333 295 Updated Jul 14, 2024

The official gpt4free repository | a varied collection of powerful language models

Python 60,186 13,229 Updated Sep 29, 2024

A collection of live-stream source resources 📺 💯 IPTV, M3U. Wash your hands often, wear a mask, and may everyone stay safe from every illness

26,202 3,191 Updated Dec 24, 2023

A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)

9,337 714 Updated May 31, 2024

Stable Diffusion and Flux in pure C/C++

C++ 3,315 279 Updated Sep 2, 2024

ChatGLM-6B: An Open Bilingual Dialogue Language Model

Python 40,465 5,193 Updated Jun 27, 2024

Bash script for installing V2Ray in operating systems such as Debian / CentOS / Fedora / openSUSE that support systemd

Shell 6,152 1,437 Updated Feb 9, 2024
C++ 74 12 Updated May 28, 2023