468 results for source starred repositories

[ECCV 2024] Official PyTorch implementation of RoPE-ViT "Rotary Position Embedding for Vision Transformer"

Python 166 2 Updated Aug 14, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,132 126 Updated Sep 24, 2024
Python 5,976 446 Updated Oct 4, 2024

Music repair method to convert lossy MP3 compressed music to lossless music.

Python 96 8 Updated Sep 23, 2024

Training music tagging model with accelerate framework on multi-node multi-gpu

Python 7 Updated Sep 25, 2024

Text-to-Music Generation with Rectified Flow Transformers

Python 1,531 116 Updated Sep 6, 2024

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI

Python 608 22 Updated Oct 1, 2024

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 2,742 253 Updated Sep 25, 2024

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 827 41 Updated Sep 27, 2024

Repository for training models for music source separation.

Python 405 54 Updated Sep 27, 2024

Utilities intended for use with Llama models.

Python 4,299 764 Updated Oct 3, 2024

This repository contains metadata of the WavCaps dataset and code for downstream tasks.

Python 197 11 Updated Jul 25, 2024

Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.

Python 130 8 Updated Aug 25, 2024

🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)

Python 29 2 Updated Sep 21, 2024

code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python 520 20 Updated Sep 26, 2024

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 6,182 659 Updated Sep 30, 2024

MiniSora: A community that aims to explore the implementation path and future development direction of Sora.

Python 1,179 149 Updated Sep 25, 2024

Python library for downloading, loading & working with sound datasets

Python 319 22 Updated Jun 30, 2024

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 771 88 Updated Aug 7, 2024

MARS5 speech model (TTS) from CAMB.AI

Jupyter Notebook 2,477 201 Updated Aug 1, 2024

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Python 20,698 2,108 Updated Jul 18, 2024

A large-scale dataset of caption-annotated MIDI files.

Python 45 1 Updated Jul 23, 2024

A generative speech model for daily dialogue.

Python 31,179 3,386 Updated Sep 21, 2024

A multi-voice TTS system trained with an emphasis on quality

Jupyter Notebook 12,998 1,793 Updated Aug 19, 2024

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.

Python 11,291 1,008 Updated Oct 4, 2024

Clustering routines for the unit sphere

Python 332 78 Updated Mar 20, 2024

Speech, Language, Audio, Music Processing with Large Language Model

Python 513 43 Updated Oct 2, 2024

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Python 147 10 Updated Jul 12, 2024