SJTU - Shanghai, China
Stars
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-contex…
This is a collection of our NAS and Vision Transformer work.
A high-throughput and memory-efficient inference and serving engine for LLMs
a state-of-the-art-level open visual language model | multimodal pretrained model
LLMs interview notes and answers: this repository mainly collects interview questions and reference answers for large language model (LLM) algorithm engineers.
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Competition knowledge, code, and ideas for data mining, computer vision, natural language processing, and recommender systems.
fast-stable-diffusion + DreamBooth
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
This is the PyTorch implementation of our paper "RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model"
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editin…
[CVPR'23] Universal Instance Perception as Object Discovery and Retrieval
Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning" (ICLR 2022)
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Adapting Meta AI's Segment Anything to Downstream Tasks with Adapters and Prompts
SeqTR: A Simple yet Universal Network for Visual Grounding
Related papers about Weakly-supervised Audio-Visual Video Parsing (AVVP) & Audio-Visual Event Localization (AVE)
Related papers about Referring Image Segmentation (RIS)
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119
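Several of the repositories above center on parameter-efficient tuning (loralib, SAM adapters, Visual Prompt Tuning). As a minimal sketch of the idea behind "LoRA: Low-Rank Adaptation of Large Language Models", the snippet below keeps a pretrained weight matrix frozen and learns only a low-rank update. The variable names (`W`, `A`, `B`, `r`, `alpha`) follow the paper's notation; this is an illustration with NumPy, not the loralib API.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, alpha = 8, 8, 2, 4                 # weight shape (d, k), rank r << min(d, k)

W = rng.standard_normal((d, k))             # frozen pretrained weight (not trained)
A = rng.standard_normal((r, k)) * 0.01      # trainable down-projection
B = np.zeros((d, r))                        # trainable up-projection, zero-initialized

def adapted_forward(x):
    # y = x W^T + x (B A)^T * (alpha / r): the low-rank term B @ A is the
    # only part that receives gradient updates during fine-tuning.
    return x @ W.T + (x @ (B @ A).T) * (alpha / r)

x = rng.standard_normal((1, k))
# Because B starts at zero, the adapted model initially matches the frozen one,
# and only r * (d + k) parameters are trainable instead of d * k.
assert np.allclose(adapted_forward(x), x @ W.T)
```

Zero-initializing `B` is the design choice that makes fine-tuning start exactly from the pretrained model; the update `B @ A` then grows away from zero as training proceeds.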