Stars
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
[A commonly used C++ DAG framework] A general-purpose, cross-platform, flow-graph-based parallel computing framework with no third-party dependencies, listed in awesome-cpp. Stars, forks, and discussion welcome.
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
Stable Diffusion and Flux in pure C/C++
Run Stable Diffusion inference on an Android phone's CPU
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Official inference repo for FLUX.1 models
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
A universal Stable-Diffusion toolbox
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
Burn is a new comprehensive dynamic Deep Learning Framework built using Rust with extreme flexibility, compute efficiency and portability as its primary goals.
How to optimize common algorithms in CUDA.
Ongoing research training transformer models at scale
VideoSys: An easy and efficient system for video generation
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).
LightSeq: A High Performance Library for Sequence Processing and Generation
Project for the Model Deployment course at ShenLan College
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
A framework to evaluate your Stable Diffusion model