A-suozhang
🐟
Touch Fish EveryDay

Highlights

  • Pro

Organizations

@thu-nics

Starred repositories

623 results for source starred repositories

End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).

Python 213 7 Updated Sep 25, 2024

Fast Hadamard transform in CUDA, with a PyTorch interface

C 94 14 Updated May 24, 2024

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models

Python 599 16 Updated Sep 18, 2024

Code for QuaRot, an end-to-end 4-bit inference scheme for large language models.

Python 259 20 Updated Jul 22, 2024

Evaluating dynamics capability of T2V generation models with DEVIL protocols.

Python 324 43 Updated Sep 30, 2024

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python 608 22 Updated Oct 1, 2024

The homepage of OneBit model quantization framework.

Python 143 3 Updated Jun 27, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,215 114 Updated Oct 3, 2024

System 2 Reasoning Link Collection

635 53 Updated Oct 4, 2024

Efficient Triton Kernels for LLM Training

Python 3,119 159 Updated Oct 4, 2024

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters

Python 552 48 Updated Sep 28, 2024

TerDiT: Ternary Diffusion Models with Transformers

Python 58 2 Updated Jun 17, 2024

Triton implementation of FlashAttention2 that adds Custom Masks.

Python 64 5 Updated Aug 14, 2024

Official inference repo for FLUX.1 models

Python 14,421 1,037 Updated Oct 3, 2024

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Python 240 27 Updated Sep 27, 2024

torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for diffusion models.

C++ 17 Updated Mar 29, 2024

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,176 486 Updated Oct 4, 2024

Development repository for the Triton language and compiler

C++ 12,921 1,568 Updated Oct 4, 2024

CUDA Matrix Multiplication Optimization

Cuda 125 9 Updated Jul 19, 2024

row-major matmul optimization

C++ 586 78 Updated Sep 9, 2023

Implementation of rectified flow and some of its follow-up research and improvements in PyTorch

Python 161 2 Updated Aug 21, 2024

C++ extensions in PyTorch

Python 995 209 Updated Aug 7, 2024

🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

Cuda 1,226 133 Updated Oct 4, 2024

Scaling Diffusion Transformers with Mixture of Experts

Python 187 8 Updated Sep 9, 2024

[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Python 189 6 Updated Aug 21, 2024

Evaluating text-to-image/video/3D models with VQAScore

Python 186 17 Updated Sep 9, 2024

calflops is designed to calculate FLOPs, MACs, and parameters for a wide range of neural networks, such as Linear, CNN, RNN, GCN, and Transformer models (BERT, LLaMA, and other large language models).

Python 508 16 Updated Jun 27, 2024

A self-learning tutorial for CUDA high-performance programming.

JavaScript 148 24 Updated Oct 4, 2024