Stars
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
BLSP-Emo: Towards Empathetic Large Speech-Language Models
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
✨✨Latest Advances on Multimodal Large Language Models
A generative speech model for daily dialogue.
Training code for FAcodec presented in NaturalSpeech3
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Inference and training library for high-quality TTS models.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
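The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it may introduce.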
Modeling, training, eval, and inference code for OLMo
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
1 minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Implementation of Voicebox, new SOTA text-to-speech network from MetaAI, in PyTorch
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
[NeurIPS 2023] UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models
Foundational Models for State-of-the-Art Speech and Text Translation
[AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS
Vector (and Scalar) Quantization, in PyTorch
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
An Open Source text-to-speech system built by inverting Whisper.
Realtime Voice Changer
A family of diffusion models for text-to-audio generation.