qiuqiangkong

578 followers · 22 following

Stars

bytedance / paws_room_acoustics_simulator

Python 2 1 Updated Sep 18, 2024

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,132 126 Updated Sep 24, 2024

kyutai-labs / moshi

Python 5,971 446 Updated Oct 4, 2024

JusperLee / Apollo

Music repair method to convert lossy MP3 compressed music to lossless music.

Python 96 8 Updated Sep 23, 2024

HarlandZZC / music_tagging_accelerate

Training a music tagging model with the accelerate framework on multi-node, multi-GPU setups

Python 7 Updated Sep 25, 2024

feizc / FluxMusic

Text-to-Music Generation with Rectified Flow Transformers

Python 1,531 116 Updated Sep 6, 2024

lucidrains / transfusion-pytorch

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python 608 22 Updated Oct 1, 2024

gpt-omni / mini-omni

Open-source multimodal large language model that can hear and talk while thinking. It features real-time end-to-end speech input and streaming audio output for conversation.

Python 2,742 253 Updated Sep 25, 2024

LTH14 / mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 827 41 Updated Sep 27, 2024

ZFTurbo / Music-Source-Separation-Training

Repository for training models for music source separation.

Python 405 54 Updated Sep 27, 2024

meta-llama / llama-models

Utilities intended for use with Llama models.

Python 4,298 764 Updated Oct 3, 2024

XinhaoMei / WavCaps

This repository contains metadata for the WavCaps dataset and code for downstream tasks.

Python 197 11 Updated Jul 25, 2024

haoheliu / SemantiCodec-inference

Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.

Python 130 8 Updated Aug 25, 2024

zszheng147 / Spatial-AST

🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)

Python 29 2 Updated Sep 21, 2024

buoyancy99 / diffusion-forcing

code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python 520 20 Updated Sep 26, 2024

modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 6,182 659 Updated Sep 30, 2024

mini-sora / minisora

MiniSora: a community that aims to explore the implementation path and future development direction of Sora.

Python 1,179 149 Updated Sep 25, 2024

soundata / soundata

Python library for downloading, loading & working with sound datasets

Python 319 22 Updated Jun 30, 2024

gemelo-ai / vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 771 88 Updated Aug 7, 2024

Camb-ai / MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI

Jupyter Notebook 2,476 201 Updated Aug 1, 2024

facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Python 20,698 2,108 Updated Jul 18, 2024

AMAAI-Lab / MidiCaps

A large-scale dataset of caption-annotated MIDI files.

Python 45 1 Updated Jul 23, 2024

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 31,178 3,392 Updated Sep 21, 2024

tts-tutorial / interspeech2022

160 5 Updated Sep 19, 2022

neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality

Jupyter Notebook 12,997 1,793 Updated Aug 19, 2024

PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Python 11,291 1,008 Updated Oct 4, 2024

jasonlaska / spherecluster

Clustering routines for the unit sphere

Python 332 78 Updated Mar 20, 2024

X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Python 513 43 Updated Oct 2, 2024

mct10 / RepCodec

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Python 147 10 Updated Jul 12, 2024

Alpha-VLLM / Lumina-T2X

Lumina-T2X is a unified framework for Text to Any Modality Generation

Python 2,038 86 Updated Aug 6, 2024
