Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
LPIPS (Learned Perceptual Image Patch Similarity) metric. Install with `pip install lpips`.
LAVIS - A One-stop Library for Language-Vision Intelligence
[EMNLP 2024 🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
FFCV: Fast Forward Computer Vision (and other ML workloads!)
Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting
Benchmark for Multi-domain Evaluation of Semantic Segmentation
Create your own long-term database of Immoweb photos and metadata based on your criteria
Official PyTorch implementation of the "ImageNet-21K Pretraining for the Masses" (NeurIPS 2021) paper
This is the official code release for our work, Denoising Vision Transformers.
(NeurIPS 2023) CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
Official implementation of the "CLIP-DINOiser: Teaching CLIP a few DINO tricks" paper.
An open-source framework for training large multimodal models.
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Most popular metrics used to evaluate object detection algorithms.
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
[NeurIPS 2021] You Only Look at One Sequence
A beautiful, simple, clean, and responsive Jekyll theme for academics
(TPAMI 2024) A Survey on Open Vocabulary Learning
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Codebase for "Decoding language spatial relations to 2D spatial arrangements" (Findings of EMNLP 2020).
Codebase for "Revisiting spatio-temporal layouts for compositional action recognition" (Oral at BMVC 2021).