University of Chinese Academy of Sciences
Stars
Raycast extension for Ollama
Dataset/code for AudioMarkBench: Benchmarking Robustness of Audio Watermarking
📺 Discover the latest machine learning / AI courses on YouTube.
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Localized watermarking for AI-generated speech audio, with SOTA robustness and a very fast detector
Provably Secure Steganography in Practice Based on “Distribution Copies”
This repository contains the implementation for the paper "AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA", accepted by ICML 2024.
Official PyTorch implementation of BigVGAN (ICLR 2023)
Official repository for the paper "Topological Neural Discrete Representation Learning à la Kohonen" (ICML 2023 Workshop on Sampling and Optimization in Discrete Space)
Repository for the code associated with the paper "On the Identifiability of Quantized Factors" by Vitória Barin-Pacela, Kartik Ahuja, Simon Lacoste-Julien, Pascal Vincent, Conference on Causal Lea…
According to the paper "Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data"
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in PyTorch
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
Understand Human Behavior to Align True Needs
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Official code for "Style Aligned Image Generation via Shared Attention"
This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
This repository contains the SpeechBrain Benchmarks
[NeurIPS 2023] Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
A generative speech model for daily dialogue.
Official code for the CVPR 2024 Paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
Nightly release of ControlNet 1.1