  • IBricks
  • Seongnam, South Korea
  • LinkedIn in/yw0nam


Aligning LMMs with Factually Augmented RLHF

Python 309 20 Updated Nov 1, 2023

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 2,724 252 Updated Sep 25, 2024

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on the CPM foundation models

Python 1,075 93 Updated Jun 13, 2024
Python 2,527 188 Updated Sep 26, 2024

ImageBind One Embedding Space to Bind Them All

Python 8,249 758 Updated Jul 31, 2024

[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

Python 36 1 Updated Sep 4, 2024

This is a Phi-3 book for getting started with Phi-3. Phi-3 is a family of open AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) avai…

Jupyter Notebook 2,309 232 Updated Sep 26, 2024

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Python 424 194 Updated Jul 4, 2024

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

Python 1,531 123 Updated Jun 17, 2024

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,122 66 Updated Aug 13, 2024

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 11,699 1,239 Updated Aug 21, 2024

Python packaging and dependency management made easy

Python 31,231 2,256 Updated Oct 2, 2024

Powerful Free DeepL API, No Token Required

Go 6,309 510 Updated Sep 19, 2024

Talk to any LLM with hands-free voice interaction, voice interruption, a Live2D talking face, and long-term memory, running locally across platforms

Python 516 62 Updated Sep 22, 2024

Open-Waifu open-sourced finetunable customizable simpable AI waifu inspired by neuro-sama

Python 403 31 Updated Aug 13, 2024

Extracting character conversations in Genshin Project

Python 52 10 Updated Sep 2, 2024

A project that extracts Honkai: Star Rail text corpus

Python 19 1 Updated Jul 12, 2024

Genshin Datasets For SVC/SVS/TTS

587 37 Updated Sep 7, 2024

Honkai Impact 3 High Quality Dataset

1 Updated Jun 10, 2024

Brand new TTS solution

Python 12,782 958 Updated Sep 30, 2024

Every front-end GUI client for ChatGPT

2,418 174 Updated Sep 29, 2024

A multi-domain reasoning benchmark for Korean language models

Python 155 28 Updated Sep 15, 2024

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 33,447 3,839 Updated Oct 2, 2024

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)

Python 31,815 3,905 Updated Oct 1, 2024

Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles.

Python 707 83 Updated Sep 9, 2024

Korean corpus repository

Python 691 80 Updated Oct 3, 2022

Preprocess Audio for training

Python 232 45 Updated Sep 17, 2024
Jupyter Notebook 52 11 Updated Mar 21, 2024

Inference and training library for high-quality TTS models.

Python 4,289 430 Updated Sep 23, 2024

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 7,516 739 Updated Jun 24, 2024