Stars
QQQ is an innovative and hardware-optimized W4A8 quantization solution.
🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast…
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
FlashInfer: Kernel Library for LLM Serving
Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function independently without continuous internet access.
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
SGLang is yet another fast serving framework for large language models and vision language models.
Universal LLM Deployment Engine with ML Compilation
AniZpZ / vllm
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
An easy-to-use package for implementing SmoothQuant for LLMs
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Phase 2 of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
A high-throughput and memory-efficient inference and serving engine for LLMs
This repo includes ChatGPT prompt curation to use ChatGPT better.
SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.
Theoretical solutions for LeetCode problems.
Seven-day Go projects built from scratch (web framework Gee, distributed cache GeeCache, object-relational mapping (ORM) framework GeeORM, RPC framework GeeRPC, etc.)
High-performance coding with Go (high-performance Go programming, Go pitfalls, gotchas, traps)
A C++ header-only HTTP/HTTPS server and client library
Real-time object detection with YOLOv5 and TensorRT
A systematic Android OpenGL ES 3.0 tutorial, from beginner to mastery
Simple Functional Programming of C++ from Scratch
A cheatsheet of modern C++ language and library features.