Stars
Tools for merging pretrained large language models.
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
MINT-1T: A one trillion token multimodal interleaved dataset.
Ongoing research training transformer models at scale
Stable Diffusion web UI
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
An image prompt adapter that enables a pretrained text-to-image diffusion model to generate images from an image prompt.
A state-of-the-art open visual language model | multimodal pretrained model
The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
A series of large language models developed by Baichuan Intelligent Technology
DataComp: In search of the next generation of multimodal datasets
《代码随想录》LeetCode problem-solving guide: a recommended order for 200 classic problems, with 600k+ words of detailed illustrated explanations, video breakdowns of tricky points, and 50+ mind maps, supporting C++, Java, Python, Go, JavaScript, and more — no more aimless algorithm study! 🔥🔥 Check it out, you'll wish you had found it sooner! 🚀
A 13B large language model developed by Baichuan Intelligent Technology
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on the CPM foundation models
✨✨Latest Advances on Multimodal Large Language Models
Research Trends in LLM-guided Multimodal Learning.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
Painter & SegGPT Series: Vision Foundation Models from BAAI
Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in PyTorch
LAVIS - A One-stop Library for Language-Vision Intelligence
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.