Python 5,986 446 Updated Oct 4, 2024

This is the official implementation of A Controllable Emotion Voice Conversion Framework with Pre-trained Speech Representations

Python 20 2 Updated Sep 23, 2024

LlamaVoice is a llama-based large voice generation model, providing inference and training ability.

Python 214 11 Updated Aug 26, 2024

Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)

Python 44 11 Updated Oct 1, 2024

Multilingual Voice Understanding Model

Python 2,823 267 Updated Sep 25, 2024

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 5,237 538 Updated Sep 29, 2024
Python 15 3 Updated May 2, 2024

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement

Python 108 5 Updated Sep 27, 2024

[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Python 126 5 Updated Jun 17, 2024

Speech, Language, Audio, Music Processing with Large Language Model

Python 513 43 Updated Oct 2, 2024

A generative speech model for daily dialogue.

Python 31,186 3,387 Updated Sep 21, 2024

Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Updates Every 12 Hours)

Python 232 19 Updated Oct 5, 2024

lina-speech: linear-attention-based text-to-speech

Jupyter Notebook 116 9 Updated Jun 3, 2024

🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)

Python 29 2 Updated Sep 21, 2024

Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

558 31 Updated Aug 3, 2024

🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…

Python 54,476 5,623 Updated Aug 24, 2024

Modeling, training, eval, and inference code for OLMo

Python 4,482 449 Updated Oct 4, 2024

[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

Python 102 3 Updated Apr 19, 2024

An Open Source text-to-speech system built by inverting Whisper.

Jupyter Notebook 3,818 207 Updated Jun 18, 2024

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 589 42 Updated Sep 9, 2024
Python 131 11 Updated Jul 9, 2024

GPT-4V with Emotion

Python 83 6 Updated Dec 8, 2023

SALMONN: Speech Audio Language Music Open Neural Network

Python 996 78 Updated Sep 24, 2024

[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"

Python 301 21 Updated Sep 3, 2024

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"

Python 318 25 Updated Feb 21, 2024

The Open Source Code of UniAudio

Python 509 31 Updated Jul 22, 2024

A repo of open resources & information to help people succeed in a CS PhD & a career in AI / NLP

848 72 Updated Sep 22, 2024

FAIR Sequence Modeling Toolkit 2

Python 682 78 Updated Oct 4, 2024