Stars
AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
MixTeX: multimodal LaTeX, ZhEn, and table OCR. It performs efficient CPU-based inference locally and offline on Windows.
[Embodied-AI-Survey-2024] Paper list and projects for Embodied AI
API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
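Free ChatGPT API key: a free ChatGPT API with GPT-4 API support (also free), offered as a free forwarding API for ChatGPT that works from within China over a direct connection, with no proxy required. It can be used with software and plugins such as ChatBox, greatly reducing API usage costs, and allows unrestricted chat from within China.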
[CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
Ideas and thoughts about the fascinating Vision-and-Language Navigation
Official code for Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions (CVPR 2024)
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)
Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.
[CAPTCHA recognition: training] This project uses CNN/ResNet/DenseNet + GRU/LSTM + CTC/CrossEntropy to recognize CAPTCHAs. It covers model training only.
Disaggregated serving system for Large Language Models (LLMs).
[ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding
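A practical interactive interface for large language models such as GPT and GLM, specially optimized for reading, polishing, and writing papers. Modular design; supports custom shortcut buttons and function plugins, project analysis and self-explanation for Python, C++, and other codebases, PDF/LaTeX paper translation and summarization, parallel queries to multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…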
MambaOut: Do We Really Need Mamba for Vision?
Authors' code for "Variational Causal Inference Network for Explanatory Visual Question Answering" and "Integrating Neural-Symbolic Reasoning with Variational Causal Inference Network for Explanat…
Code for WACV 2021 Paper "Meta Module Network for Compositional Visual Reasoning"
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models"
Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024)
Official implementation of Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation (CVPR'22 Oral).