Shanghai Jiao Tong University
- ziyang.tech
Stars
This is the official implementation of A Controllable Emotion Voice Conversion Framework with Pre-trained Speech Representations
LlamaVoice is a llama-based large voice generation model with support for inference and training.
Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)
Multilingual Voice Understanding Model
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
Speech, Language, Audio, Music Processing with Large Language Model
A generative speech model for daily dialogue.
Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
lina-speech: linear-attention-based text-to-speech
🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Modeling, training, eval, and inference code for OLMo
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
An open-source text-to-speech system built by inverting Whisper.
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
SALMONN: Speech Audio Language Music Open Neural Network
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
A repo of open resources and information to help people succeed in a CS PhD and a career in AI / NLP