Stars
Implementation of MagViT2 Tokenizer in PyTorch
Credit card fraud detection through logistic regression, k-means, and deep learning.
WIP PyTorch code for stably training single-step, mode-dropping, deterministic autoencoders
This is the code for the SpeechTokenizer presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on …
Text-to-Image Latent Diffusion using a Transformer core
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An ultra-simple, user-friendly …
Scenic: A Jax Library for Computer Vision Research and Beyond
Instruct-tune LLaMA on consumer hardware
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
Joint speech-language model that responds directly to audio!
AudioLDM: Generate speech, sound effects, music and beyond, with text.
A toolbox that provides hackable building blocks for generic 1D/2D/3D UNets, in PyTorch.
A repository for generating and training short audio samples with unconditional waveform diffusion on accessible consumer hardware (<2GB VRAM GPU)
Unofficial PyTorch Implementation of UnivNet Vocoder (https://arxiv.org/abs/2106.07889)
This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation".
State-of-the-art deep-learning-based audio codec supporting both mono 24 kHz and stereo 48 kHz audio.
The official repo of the Qwen2-Audio chat and pretrained large audio-language models proposed by Alibaba Cloud.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Speech, Language, Audio, and Music Processing with Large Language Models
Virtual whiteboard for sketching hand-drawn like diagrams
Open-weights LLM from Google DeepMind.
A fast and memory-efficient library for sparse transformers with varying token counts (e.g., 3D point clouds).
Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"
Collection of AWESOME vision-language models for vision tasks