
Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 10,804 1,054 Updated Aug 15, 2024

FMA: A Dataset For Music Analysis

Jupyter Notebook 2,212 432 Updated Jan 5, 2023

Generative models for conditional audio generation

Python 2,557 240 Updated Jul 15, 2024

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python 11,974 816 Updated Oct 3, 2024

Implementation of MusicLM, a text to music model published by Google Research, with a few modifications.

Python 514 58 Updated Jun 3, 2023

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,144 106 Updated Jul 11, 2024

Bark Voice Cloning and Voice Cloning for Chinese Speech

Jupyter Notebook 2,741 396 Updated Aug 8, 2024

Windows cloud music lyrics fetcher [NetEase Cloud Music, QQ Music]

C# 2,023 107 Updated Aug 25, 2024

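MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.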

3,417 233 Updated Oct 3, 2024

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook 35,526 4,175 Updated Aug 19, 2024

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

7,437 901 Updated Aug 21, 2024

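AI chat features implemented in Unity. The library currently includes API-call implementations for large language models such as ChatGPT and ChatGLM, along with speech services from Microsoft Azure and Baidu AI. The speech services are all implemented through web APIs and support Windows, WebGL, Android, and other platforms.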

443 63 Updated Sep 29, 2024

Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).

Python 33,467 3,840 Updated Oct 2, 2024

PhotoMaker [CVPR 2024]

Jupyter Notebook 9,396 750 Updated Aug 15, 2024

Awesome-LLM: a curated list of Large Language Models

18,005 1,453 Updated Oct 2, 2024

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Jupyter Notebook 37,716 3,966 Updated Jul 28, 2024

Instant voice cloning by MIT and MyShell.

Python 28,966 2,824 Updated Aug 21, 2024

Let us control diffusion models!

Python 29,938 2,703 Updated Feb 25, 2024

Python 252 37 Updated May 22, 2024

Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

558 31 Updated Aug 3, 2024

This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

Python 101 11 Updated Apr 23, 2024

A family of diffusion models for text-to-audio generation.

Python 991 79 Updated Jul 3, 2024

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Python 3,226 320 Updated Sep 29, 2024

Text-to-Audio/Music Generation

Python 2,250 177 Updated Sep 29, 2024

ImageBind: One Embedding Space to Bind Them All

Python 8,250 758 Updated Jul 31, 2024

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,332 85 Updated Sep 23, 2024

Jupyter Notebook 43 8 Updated Aug 16, 2023

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Python 4,486 333 Updated Jul 10, 2024

Official Implementation of "Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models"

Python 368 26 Updated Jul 4, 2023

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Python 7,582 756 Updated Feb 11, 2024