Stars
High-Resolution Image Synthesis with Latent Diffusion Models
✨✨Latest Advances on Multimodal Large Language Models
[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
RFQuant: Retraining-free Model Quantization via One-Shot Weight-Coupling Learning, CVPR (2024)
[ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models"
A node-based image processing GUI aimed at making chaining image processing tasks easy and customizable. Born as an AI upscaling application, chaiNNer has grown into an extremely flexible and powerful image processing application.
PyTorch code for our paper "2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution"
Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing"
SUPIR aims to develop practical algorithms for photo-realistic image restoration in the wild. Our new online demo is also released at suppixel.ai.
Accessible large language models via k-bit quantization for PyTorch.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Advanced Quantization Algorithm for LLMs. This is official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
This repository contains integer operators on GPUs for PyTorch.
dabnn is an accelerated binary neural network inference framework for mobile platforms
Source and experimental code for Correlation Aware Prune (NeurIPS 2023)
This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.
[ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
nndeploy is an end-to-end model deployment framework. Built on multi-backend inference and DAG-based model deployment, it aims to provide users with a cross-platform, easy-to-use, high-performance model deployment experience.
CVPR2024 - Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary
[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
[CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
ICLR2024: LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection.
Analyze large language model (LLM) inference across computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.