KunZhou9646
🙃 I am here!


Next-Token Prediction is All You Need

Python 805 23 Updated Sep 30, 2024

Primarily records knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers

HTML 2,929 339 Updated Aug 19, 2024

An Open-Sourced LLM-empowered Foundation TTS System

Python 191 8 Updated Sep 25, 2024

[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"

Python 301 21 Updated Sep 3, 2024

[Official Implementation] Acoustic Autoregressive Modeling 🔥

Python 54 5 Updated Aug 24, 2024

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in PyTorch

Python 250 23 Updated Sep 11, 2024

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 5,237 538 Updated Sep 29, 2024

Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.

Python 131 8 Updated Aug 25, 2024

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

111 2 Updated Jun 13, 2024

PyTorch implementation of SoundCTM

Python 68 6 Updated Oct 1, 2024

PyTorch Implementation of AudioLCM (ACM-MM'24): efficient and high-quality text-to-audio generation with a latent consistency model.

Python 1,117 177 Updated Jul 17, 2024

A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.

Python 119 14 Updated May 29, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 21,763 2,105 Updated Aug 9, 2024

*BeaqleJS* provides a framework to create browser-based listening tests and is purely based on open web standards like HTML5 and JavaScript.

JavaScript 86 49 Updated Mar 9, 2019

A generative speech model for daily dialogue.

Python 31,186 3,387 Updated Sep 21, 2024

Speech, Language, Audio, Music Processing with Large Language Model

Python 513 43 Updated Oct 2, 2024

Lumina-T2X is a unified framework for Text to Any Modality Generation

Python 1 Updated May 17, 2024

Word alignments generated by the Montreal Forced Aligner for the Librispeech dataset

Python 149 23 Updated Mar 25, 2019

Lumina-T2X is a unified framework for Text to Any Modality Generation

Python 2,038 86 Updated Aug 6, 2024

Official repo for WavCraft, an AI agent for audio creation and editing

Python 649 96 Updated Sep 13, 2024

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

Python 740 108 Updated May 22, 2024

This repository contains metadata of the WavCaps dataset and code for downstream tasks.

Python 197 11 Updated Jul 25, 2024

An Open Source text-to-speech system built by inverting Whisper.

Jupyter Notebook 3,818 207 Updated Jun 18, 2024

A unified dataset of multilingual emotional human utterances

Jupyter Notebook 22 2 Updated Jan 4, 2022

Unofficial implementation of NVIDIA P-Flow TTS paper

Python 214 30 Updated Jul 1, 2024

A family of diffusion models for text-to-audio generation.

Python 993 79 Updated Jul 3, 2024

🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps

Python 137 16 Updated Apr 23, 2024
Python 282 37 Updated Sep 3, 2024

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,146 106 Updated Jul 11, 2024

Inference and training library for high-quality TTS models.

Python 4,300 432 Updated Sep 23, 2024