Stars
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
[A commonly used C++ DAG framework] A general-purpose, cross-platform, flow-graph-based parallel computing framework with no third-party dependencies, listed in awesome-cpp. Stars, forks, and discussion welcome.
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
Stable Diffusion and Flux in pure C/C++
Run Stable Diffusion inference on an Android phone's CPU
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Official inference repo for FLUX.1 models
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
A universal Stable-Diffusion toolbox
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
Burn is a new comprehensive dynamic Deep Learning Framework built using Rust with extreme flexibility, compute efficiency and portability as its primary goals.
How to optimize common algorithms in CUDA.
Ongoing research training transformer models at scale
VideoSys: An easy and efficient system for video generation
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).
LightSeq: A High Performance Library for Sequence Processing and Generation
Project for the Model Deployment course at ShenLan College
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
A framework to evaluate your Stable Diffusion model