Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models l…
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
CSGO: Content-Style Composition in Text-to-Image Generation 🔥
SAM with text prompt
Official code for "RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control"
Evaluating text-to-image/video/3D models with VQAScore
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
Understand Human Behavior to Align True Needs
A modular graph-based Retrieval-Augmented Generation (RAG) system
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
OneFormer: One Transformer to Rule Universal Image Segmentation, arxiv 2022 / CVPR 2023
[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"
Official inference repo for FLUX.1 models
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
A curated publication list on open-vocabulary semantic segmentation and related areas (e.g. zero-shot semantic segmentation).
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
[ECCV 2024] Official code for the paper "Open-Vocabulary SAM"
Combining Segment Anything (SAM) with Grounded DINO for zero-shot object detection and CLIPSeg for zero-shot segmentation
21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.