iamlockelightning

Next-Token Prediction is All You Need

Python · 799 stars · 22 forks · Updated Sep 30, 2024

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Python · 359 stars · 19 forks · Updated Sep 19, 2024

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Jupyter Notebook · 2,251 stars · 147 forks · Updated Aug 23, 2024

mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation

Python · 76 stars · 2 forks · Updated Jan 29, 2024

PyTorch code for training Vision Transformers with the self-supervised learning method DINO

Python · 6,250 stars · 905 forks · Updated Jul 3, 2024

iBOT 🤖: Image BERT Pre-Training with Online Tokenizer (ICLR 2022)

Jupyter Notebook · 672 stars · 77 forks · Updated Apr 14, 2022

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI

Python · 608 stars · 22 forks · Updated Oct 1, 2024

Official implementation of the Law of Vision Representation in MLLMs

Python · 121 stars · 7 forks · Updated Sep 8, 2024

⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool. A lightweight AI algorithm for making ID photos.

Python · 10,568 stars · 1,034 forks · Updated Sep 28, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python · 2,453 stars · 138 forks · Updated Oct 4, 2024

CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus

179 stars · 13 forks · Updated Aug 26, 2022

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Python · 134 stars · 9 forks · Updated Sep 16, 2024

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python · 882 stars · 39 forks · Updated Sep 30, 2024

SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)

Python · 74 stars · 7 forks · Updated Oct 10, 2023

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python · 12,137 stars · 850 forks · Updated Sep 13, 2024

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python · 7,847 stars · 731 forks · Updated Oct 3, 2024

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python · 827 stars · 41 forks · Updated Sep 27, 2024

Fast and memory-efficient exact attention

Python · 13,629 stars · 1,249 forks · Updated Oct 4, 2024

Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations

Python · 49 stars · 1 fork · Updated Jul 15, 2024

Stack Solver is an app for the optimisation of palletizing and shipping items.

C# · 232 stars · 2 forks · Updated Jul 16, 2024

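🏞️ PicX is an image hosting tool built on the GitHub API that provides image upload and hosting, image link generation, and a toolbox of common image utilities.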

TypeScript · 4,548 stars · 469 forks · Updated Aug 13, 2024

Animated sprite editor & pixel art tool (Windows, macOS, Linux)

C++ · 28,914 stars · 5,830 forks · Updated Oct 3, 2024

Code for Fast Training of Diffusion Models with Masked Transformers

Python · 356 stars · 14 forks · Updated May 15, 2024

Bring portraits to life!

Python · 12,095 stars · 1,272 forks · Updated Sep 6, 2024

Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.

Python · 23 stars · 1 fork · Updated Sep 30, 2024

Anole: An Open, Autoregressive and Native Multimodal Model for Interleaved Image-Text Generation

Python · 652 stars · 36 forks · Updated Aug 5, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python · 1,704 stars · 112 forks · Updated Sep 19, 2024

4M: Massively Multimodal Masked Modeling

Python · 1,568 stars · 90 forks · Updated Jul 17, 2024

A PyTorch implementation of MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

Python · 507 stars · 26 forks · Updated Mar 10, 2023