gesanqiu
  • Beijing, China

QQQ is an innovative and hardware-optimized W4A8 quantization solution.

Python 32 2 Updated Jul 24, 2024

🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast…

C++ 399 52 Updated Jul 24, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Python 355 12 Updated Jul 19, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 863 81 Updated Jul 27, 2024

Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function independently without continuous internet access.

57 5 Updated Mar 23, 2024

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 7,827 1,425 Updated Jul 27, 2024

SGLang is yet another fast serving framework for large language models and vision language models.

Python 3,227 201 Updated Jul 27, 2024

Universal LLM Deployment Engine with ML Compilation

Python 17,928 1,424 Updated Jul 26, 2024

Structured Text Generation

Python 7,353 377 Updated Jul 27, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 8 4 Updated Jun 18, 2024

An easy-to-use package for implementing SmoothQuant for LLMs

Python 68 4 Updated May 18, 2024

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Python 629 49 Updated Jul 24, 2024

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 3,537 317 Updated Jul 27, 2024

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,124 182 Updated Jul 25, 2024

Chinese LLaMA-2 & Alpaca-2 LLMs, Phase 2 project + 64K long-context models

Python 7,019 571 Updated Apr 30, 2024

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 35,912 4,412 Updated Jul 25, 2024

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

Python 9,111 711 Updated Jul 16, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 23,743 3,409 Updated Jul 27, 2024

This repo includes ChatGPT prompt curation to use ChatGPT better.

HTML 107,557 14,726 Updated Jul 18, 2024

SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.

C++ 24,935 5,298 Updated Jul 27, 2024

Modern C++ ORM library

C++ 214 23 Updated Jul 26, 2024

Theoretical solutions for LeetCode problems.

C++ 376 43 Updated Feb 25, 2024

7 days of Go programs from scratch (web framework Gee, distributed cache GeeCache, object-relational mapping framework GeeORM, RPC framework GeeRPC, etc.). A 7-day series on building these in Go yourself, from scratch.

Go 15,099 2,407 Updated Jul 19, 2024

High-performance coding with Go (Go high-performance programming, Go pitfalls, gotchas, traps)

Go 3,745 417 Updated Nov 3, 2022

A C++ header-only HTTP/HTTPS server and client library

C++ 12,410 2,214 Updated Jul 2, 2024

Real-time object detection with YOLOv5 and TensorRT

C++ 105 22 Updated Feb 1, 2022

Android OpenGL ES 3.0: a systematic tutorial from beginner to expert

C++ 2,897 809 Updated Jun 8, 2024

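Simple Functional Programming of C++ from Scratch (the title repeated in Chinese, "simple functional C++ from scratch", and in Japanese, "easy-to-use functional programming starting from zero")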

C++ 103 11 Updated Dec 30, 2019

A cheatsheet of modern C++ language and library features.

19,278 2,045 Updated Oct 25, 2023