High-Resolution Image Synthesis with Latent Diffusion Models

Jupyter Notebook 11,217 1,461 Updated Feb 29, 2024

✨✨Latest Advances on Multimodal Large Language Models

10,952 723 Updated Jul 30, 2024
Python 3 Updated Jul 22, 2024

[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.

Python 519 26 Updated Jul 14, 2024

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Cuda 229 15 Updated Jul 2, 2024

RFQuant: Retraining-free Model Quantization via One-Shot Weight-Coupling Learning, CVPR (2024)

Python 5 1 Updated Jun 17, 2024

[ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models"

Jupyter Notebook 40 3 Updated Jun 4, 2024

IJCV2023 Instance Segmentation in the Dark

Python 71 7 Updated Mar 13, 2024

A node-based image processing GUI aimed at making chaining image processing tasks easy and customizable. Born as an AI upscaling application, chaiNNer has grown into an extremely flexible and power…

Python 4,336 273 Updated Jul 18, 2024

PyTorch code for our paper "2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution"

13 Updated Jun 10, 2024

Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing"

Python 53 2 Updated May 22, 2024

SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.

Python 4,037 356 Updated Jul 30, 2024

Accessible large language models via k-bit quantization for PyTorch.

Python 5,831 593 Updated Jul 31, 2024

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python 2,105 251 Updated Jul 31, 2024

Advanced Quantization Algorithm for LLMs. This is official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"

Python 135 19 Updated Jul 31, 2024

This repository contains integer operators on GPUs for PyTorch.

Python 161 48 Updated Sep 29, 2023

dabnn is an accelerated binary neural networks inference framework for mobile platform

C++ 765 100 Updated Nov 12, 2019
Python 88 4 Updated Jun 12, 2024

Repository for Correlation Aware Prune (NeurIPS23) source and experimental code

Python 4 1 Updated Nov 29, 2023

This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.

Python 67 7 Updated Apr 12, 2024

[ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

Python 15 Updated Jun 4, 2024

Simplify your onnx model

C++ 3,710 378 Updated Jul 8, 2024

📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.

2,106 143 Updated Jul 31, 2024

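nndeploy is an end-to-end model deployment framework. Built on multi-platform inference and deployment based on directed acyclic graphs, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.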

C++ 554 86 Updated Jul 30, 2024

CVPR2024 - Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

Python 103 6 Updated Jun 29, 2024

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

Cuda 1,161 135 Updated Jul 31, 2024

[CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".

Jupyter Notebook 48 3 Updated Jul 25, 2024

An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6,FP5).

Cuda 162 13 Updated May 28, 2024

ICLR2024: LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection.

58 Updated Jul 9, 2024

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 240 27 Updated Jul 30, 2024