Stars
Implementation of MagViT2 Tokenizer in PyTorch
Credit card fraud detection through logistic regression, k-means, and deep learning.
WIP PyTorch code for stably training single-step, mode-dropping, deterministic autoencoders
This is the code for the SpeechTokenizer presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on …
Text-to-Image Latent Diffusion using a Transformer core
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An ultra-simple, user-friendly …
Scenic: A Jax Library for Computer Vision Research and Beyond
Instruct-tune LLaMA on consumer hardware
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
Joint speech-language model that responds directly to audio!
AudioLDM: Generate speech, sound effects, music and beyond, with text.
A toolbox that provides hackable building blocks for generic 1D/2D/3D UNets, in PyTorch.
A repository for generating and training short audio samples with unconditional waveform diffusion on accessible consumer hardware (<2GB VRAM GPU)
Unofficial PyTorch Implementation of UnivNet Vocoder (https://arxiv.org/abs/2106.07889)
This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation".
State-of-the-art deep-learning-based audio codec supporting both mono 24 kHz and stereo 48 kHz audio.
The official repo of the Qwen2-Audio chat and pretrained large audio-language models proposed by Alibaba Cloud.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Speech, Language, Audio, and Music Processing with Large Language Models
Virtual whiteboard for sketching hand-drawn like diagrams
Open-weights LLM from Google DeepMind.
A fast and memory-efficient library for sparse transformers with varying token counts (e.g., 3D point clouds).
Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"
Collection of AWESOME vision-language models for vision tasks