Stars
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Tools for handling speech data in machine learning projects.
Open source real-time translation app for Android that runs locally
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Speech, Language, Audio, Music Processing with Large Language Model
Reference implementation of Megalodon 7B model
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Open-Sora: Democratizing Efficient Video Production for All
Code for paper "The effect of batch size on contrastive self-supervised speech representation learning"
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
🦄 Unitxt: a python library for getting data fired up and set for training and evaluation
This is the official code release for Bayesian Flow Networks.
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
AI powered speech denoising and enhancement
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
A Data Streaming Library for Efficient Neural Network Training
Wunjo CE: Face Swap, Lip Sync, Control Remove Objects & Text & Background, Restyling, Audio Separator, Clone Voice, Video Generation. Open Source, Local & Free.
SALMONN: Speech Audio Language Music Open Neural Network