- Human Language Technology Lab, NUS
- Singapore
- https://kunzhou9646.github.io/
- @KunZhou65685140
Stars
An Open-Sourced LLM-empowered Foundation TTS System
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
[Official Implementation] Acoustic Autoregressive Modeling 🔥
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
PyTorch Implementation of AudioLCM (ACM-MM'24): efficient and high-quality text-to-audio generation with a latent consistency model.
A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.
Open-Sora: Democratizing Efficient Video Production for All
*BeaqleJS* provides a framework for creating browser-based listening tests and is built purely on open web standards like HTML5 and JavaScript.
A generative speech model for daily dialogue.
Speech, Language, Audio, Music Processing with Large Language Model
shivammehta25 / Lumina-T2X
Forked from Alpha-VLLM/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
Word alignments generated by the Montreal Forced Aligner for the Librispeech dataset
Lumina-T2X is a unified framework for Text to Any Modality Generation
Official repo for WavCraft, an AI agent for audio creation and editing
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
This repository contains metadata for the WavCaps dataset and code for downstream tasks.
An Open Source text-to-speech system built by inverting Whisper.
A unified dataset of multilingual emotional human utterances
Unofficial implementation of NVIDIA P-Flow TTS paper
A family of diffusion models for text-to-audio generation.
🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Inference and training library for high-quality TTS models.