Python 5,986 446 Updated Oct 4, 2024

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 89 3 Updated Oct 1, 2024

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 2,745 254 Updated Sep 25, 2024

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,124 67 Updated Aug 13, 2024
Python 8 Updated Sep 25, 2024

Tools for handling speech data in machine learning projects.

Python 936 214 Updated Oct 4, 2024

LLM101n: Let's build a Storyteller

29,147 1,598 Updated Aug 1, 2024

Open source real-time translation app for Android that runs locally

C++ 6,593 498 Updated Sep 27, 2024

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 372 21 Updated Sep 11, 2024

Speech, Language, Audio, Music Processing with Large Language Model

Python 513 43 Updated Oct 2, 2024

Reference implementation of Megalodon 7B model

Cuda 502 52 Updated Apr 18, 2024

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

Python 347 39 Updated Sep 13, 2024

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 7,520 740 Updated Jun 24, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 21,763 2,105 Updated Aug 9, 2024

FAIR Sequence Modeling Toolkit 2

Python 682 78 Updated Oct 4, 2024

Code for paper "The effect of batch size on contrastive self-supervised speech representation learning"

Python 8 1 Updated Aug 29, 2024

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

Python 600 28 Updated Mar 12, 2024

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 6,078 539 Updated May 31, 2024

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 9,079 838 Updated Jul 1, 2024

Efficient framework-agnostic data loading

C++ 359 38 Updated Sep 7, 2024

🦄 Unitxt: a python library for getting data fired up and set for training and evaluation

Python 153 40 Updated Oct 4, 2024

This is the official code release for Bayesian Flow Networks.

Python 244 27 Updated Jul 18, 2024

Brand new TTS solution

Python 12,848 961 Updated Oct 3, 2024

JuiceFS is a distributed POSIX file system built on top of Redis and S3.

Go 10,647 930 Updated Sep 30, 2024

AI powered speech denoising and enhancement

Python 1,324 135 Updated Jun 21, 2024

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,420 105 Updated Jul 5, 2024

vits2 backbone with multilingual-bert

Python 7,857 1,112 Updated Oct 1, 2024

A Data Streaming Library for Efficient Neural Network Training

Python 1,087 137 Updated Oct 2, 2024

Wunjo CE: Face Swap, Lip Sync, Control Remove Objects & Text & Background, Restyling, Audio Separator, Clone Voice, Video Generation. Open Source, Local & Free.

Python 828 95 Updated Sep 19, 2024

SALMONN: Speech Audio Language Music Open Neural Network

Python 996 78 Updated Sep 24, 2024