
Brand new TTS solution

Python 5,028 399 Updated Jul 9, 2024

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…

380 31 Updated Jul 10, 2024

BLSP-Emo: Towards Empathetic Large Speech-Language Models

Python 26 2 Updated Jun 7, 2024

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

94 1 Updated Jun 13, 2024

Python 640 29 Updated Jul 9, 2024

✨✨Latest Advances on Multimodal Large Language Models

10,549 701 Updated Jul 4, 2024

A generative speech model for daily dialogue.

Python 27,440 2,988 Updated Jul 10, 2024

Training code for FAcodec presented in NaturalSpeech3

Python 117 12 Updated Jul 7, 2024

Python 224 23 Updated Jul 5, 2024

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 7,231 710 Updated Jun 24, 2024

Inference and training library for high-quality TTS models.

Python 2,870 294 Updated Jul 9, 2024

Python 60 2 Updated May 3, 2024

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Python 20,200 2,022 Updated Jun 19, 2024

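The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using text-input large models, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it can introduce.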

Python 493 52 Updated Sep 11, 2023

Modeling, training, eval, and inference code for OLMo

Python 4,203 393 Updated Jul 10, 2024

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 14,984 1,430 Updated Jul 10, 2024

The Open Source Code of UniAudio

Python 479 31 Updated May 3, 2024

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 28,960 3,350 Updated Jul 10, 2024

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Jupyter Notebook 34,041 3,553 Updated Jun 11, 2024

Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch

Python 544 45 Updated Feb 16, 2024

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Python 4,206 215 Updated Jun 14, 2024

[NeurIPS 2023] UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models

Jupyter Notebook 283 12 Updated Sep 22, 2023

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 10,537 1,018 Updated Jun 26, 2024

[AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS

Python 55 8 Updated Feb 23, 2024

Vector (and Scalar) Quantization, in Pytorch

Python 2,164 180 Updated Jul 10, 2024

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 369 32 Updated Jun 9, 2024

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Python 3,345 242 Updated Jul 9, 2024

An Open Source text-to-speech system built by inverting Whisper.

Jupyter Notebook 3,569 188 Updated Jun 18, 2024

Realtime Voice Changer

Python 15,336 1,647 Updated Jul 10, 2024

A family of diffusion models for text-to-audio generation.

Python 953 75 Updated Jul 3, 2024