- Hong Kong
Stars
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
VAE modified from Descript Audio Codec, replacing the RVQ with a VAE
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly …
A family of diffusion models for text-to-audio generation.
PyTorch implementation of AudioLCM: efficient and high-quality text-to-audio generation with a latent consistency model.
A library for efficient similarity search and clustering of dense vectors.
Multi-level network clustering based on the Map Equation
SimPO: Simple Preference Optimization with a Reference-Free Reward
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
A generative speech model for daily dialogue.
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
PyTorch implementation of normalizing flow models
PyTorch Implementation of DSB for Score Based Generative Modeling. Experiments managed using Hydra.
AI powered speech denoising and enhancement
BlinkDL / nanoRWKV
Forked from karpathy/nanoGPT
RWKV in nanoGPT style
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io
Awesome-LLM: a curated list of Large Language Models
Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E