Yutong-Zhou-cv

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

2,100 187 Updated Aug 20, 2024
Jupyter Notebook 65 4 Updated Jul 15, 2024

Official code for "DiffX: Guide Your Layout to Cross-Modal Generative Modeling"

Python 13 2 Updated Sep 21, 2024

Detail-Oriented CLIP for Fine-Grained Tasks

Python 30 Updated Sep 27, 2024

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

Python 2,251 171 Updated Sep 23, 2024

Integrated Image-based Deep Learning and Language Models for Primary Diabetes Care

Python 38 7 Updated Jun 7, 2024
Jupyter Notebook 93 13 Updated Sep 17, 2024
Python 6 Updated Sep 6, 2024

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Python 153 9 Updated Sep 29, 2024

LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.

Python 1,448 87 Updated Nov 7, 2023

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 4,750 389 Updated Sep 29, 2024

Bring portraits to life!

Python 12,055 1,267 Updated Sep 6, 2024

ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation

Python 408 39 Updated Aug 30, 2024

An expert benchmark aiming to comprehensively evaluate the aesthetic perception capacities of MLLMs.

Python 205 7 Updated Aug 15, 2024

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Python 2,507 306 Updated Sep 23, 2024

Official Implementation of LADS (Latent Augmentation using Domain descriptionS)

Python 49 7 Updated Apr 18, 2023

Augmenting with Language-guided Image Augmentation (ALIA)

Python 62 9 Updated Oct 30, 2023

Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Python 2,413 133 Updated Sep 24, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 4,880 373 Updated Aug 7, 2024

Automated Design of Agentic Systems

Python 893 128 Updated Aug 20, 2024

Official Implementation of 'Inserting Anybody in Diffusion Models via Celeb Basis'

Jupyter Notebook 253 7 Updated Oct 11, 2023

Contextual Object Detection with Multimodal Large Language Models

183 5 Updated May 30, 2023

[ECCV 2024 Workshop🎈] The first agriculture benchmark to evaluate MM-LLMs.

5 Updated Aug 27, 2024

✨✨Latest Advances on Multimodal Large Language Models

11,982 769 Updated Sep 25, 2024

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Python 1,396 115 Updated Oct 1, 2024

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

Python 124 4 Updated Jul 1, 2024

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

Jupyter Notebook 7,669 1,040 Updated Sep 10, 2024

A list for Text-to-Video, Image-to-Video works

173 8 Updated Aug 19, 2024

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey [Miyai+, arXiv2024]

55 2 Updated Aug 1, 2024