LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,141 128 Updated Sep 24, 2024

sangyun884 / fast-ode

Official PyTorch implementation for the paper Minimizing Trajectory Curvature of ODE-based Generative Models, ICML 2023

Python 76 6 Updated May 22, 2024

sangyun884 / rfpp

The codebase of our paper "Improving the Training of Rectified Flows"

Python 67 3 Updated Jul 11, 2024

SKholkin / LightSB-Matching

Light and Optimal Schrödinger Bridge Matching (ICML 2024) official PyTorch implementation

Python 33 4 Updated Aug 8, 2024

yuyang-shi / dsbm-pytorch

PyTorch Implementation of Diffusion Schrodinger Bridge Matching

Python 115 5 Updated May 28, 2023

feizc / FluxMusic

Text-to-Music Generation with Rectified Flow Transformers

Python 1,533 116 Updated Sep 6, 2024

ai-forever / Real-ESRGAN

PyTorch implementation of Real-ESRGAN model

Python 478 122 Updated Apr 15, 2024

boomb0om / watermark-detection

Model for watermark classification implemented with PyTorch

Jupyter Notebook 86 22 Updated Sep 19, 2024

georgosgeorgos / trajectory-alignment-diffusion

Code for "Aligning Optimization Trajectories with Diffusion Models for Constrained Design Generation" @ NeurIPS 2023

Python 8 5 Updated Oct 12, 2023

sail-sg / MDT

Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023)

Python 513 38 Updated Apr 23, 2024

Jimmy-7664 / STD-MAE

[IJCAI-24] Spatial-Temporal-Decoupled Masked Pre-training for Spatiotemporal Forecasting

Python 98 7 Updated Sep 30, 2024

omerbt / TokenFlow

Official PyTorch implementation of "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" (ICLR 2024)

Python 1,558 135 Updated Jan 23, 2024

feizc / DiT-MoE

Scaling Diffusion Transformers with Mixture of Experts

Python 187 8 Updated Sep 9, 2024

CodeGoat24 / Face-diffuser

[CVPR2024] Official implementation of High-fidelity Person-centric Subject-to-Image Synthesis.

Python 37 1 Updated Aug 23, 2024

lucidrains / transfusion-pytorch

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI

Python 610 22 Updated Oct 1, 2024

yhli123 / Immiscible-Diffusion

Official GitHub repo for the NeurIPS 2024 paper "Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment"

Python 28 1 Updated Oct 3, 2024

fundwotsai2001 / AP-adapter

Audio Prompt Adapter: Unleashing music editing abilities for text-to-music with lightweight finetuning [ISMIR 2024]

Python 27 Updated Sep 21, 2024

glory20h / VoiceLDM

VoiceLDM: Text-to-Speech with Environmental Context

Python 159 8 Updated Aug 9, 2024

CompVis / fm-boosting

FMBoost: Boosting Latent Diffusion with Flow Matching (ECCV 2024 Oral)

Python 152 1 Updated Oct 2, 2024

pix2pixzero / pix2pix-zero

Zero-shot Image-to-Image Translation [SIGGRAPH 2023]

Python 1,057 79 Updated Sep 5, 2024

Lev Novitskiy leffff


Starred repositories

Machine learning