To keep pace with LLMs and VLMs, this project is dedicated to research on large speech models. It covers the following work: we build an up-to-date pipeline for fine-tuning and running inference on large models, together with benchmarks for evaluating downstream tasks. Simple tasks include speech classification, speech recognition, and voiceprint (speaker) recognition. Complex tasks include speech generation, semantic understanding, speech continuation, and multimodal tasks; inference speed is also evaluated. Why emphasize inference speed? We believe that real-time speech understanding is the goal of speech research: GPT-4o has shown us a GPT that can listen and speak at the same time. To advance speech research, we hope that interested friends will join us in studying large speech models so we can make progress together!
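The README does not say how inference speed is measured. One common way to quantify "real-time" is the real-time factor (RTF): processing time divided by the duration of the input audio. An RTF below 1.0 means the model runs faster than real time. The sketch below is only an illustration and is not this project's benchmark. It assumes a hypothetical `transcribe(waveform, sample_rate)` callable that stands in for whatever model the pipeline loads. Neither `transcribe` nor `real_time_factor` comes from this repository.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    transcribe: Callable[[Sequence[float], int], str],
    waveform: Sequence[float],
    sample_rate: int,
) -> float:
    """Return RTF = wall-clock processing time / audio duration.

    `transcribe` is a hypothetical model callable (an assumption, not this
    project's API). RTF < 1.0 means faster than real time.
    """
    audio_seconds = len(waveform) / sample_rate
    if audio_seconds <= 0:
        raise ValueError("waveform must contain at least one sample")
    start = time.perf_counter()
    transcribe(waveform, sample_rate)  # output is ignored; only timing matters
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


def dummy_transcribe(waveform: Sequence[float], sample_rate: int) -> str:
    # Hypothetical stand-in for a speech model: pretends to take 0.1 s.
    time.sleep(0.1)
    return "hello"


if __name__ == "__main__":
    one_second = [0.0] * 16000  # 1 s of silence at an assumed 16 kHz
    rtf = real_time_factor(dummy_transcribe, one_second, 16000)
    print(f"RTF = {rtf:.3f}")
```

For streaming, "listen and speak at the same time" use cases, RTF alone is not enough. Latency to the first output (for example, first token or first audio chunk) usually matters as much, so a fuller benchmark would likely time that as well.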
wntg/ALM
About
Audio large model study