  • Tongji University
  • Shanghai, China

Python 80 14 Updated Apr 15, 2022

AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation

Python 65 9 Updated Sep 4, 2024

Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

Python 216 11 Updated Sep 15, 2024

MixTeX: multimodal LaTeX, ZhEn, and table OCR. It performs efficient CPU-based inference locally and offline on Windows.

Python 685 34 Updated Oct 1, 2024

[Embodied-AI-Survey-2024] Paper list and projects for Embodied AI

568 38 Updated Sep 26, 2024

API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series

Python 735 22 Updated Aug 9, 2024

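Free ChatGPT API Key: a free ChatGPT API with support for the GPT-4 API (free). A free forwarding API for ChatGPT that works within mainland China, with a direct connection and no proxy required. It can be used with software and plugins such as ChatBox, greatly reducing API usage costs, and allows unrestricted chat from within China.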

Python 21,889 1,644 Updated Sep 26, 2024

[CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

Jupyter Notebook 38 1 Updated Aug 3, 2024

A theme for obsidian.md

CSS 1,390 46 Updated Oct 3, 2024

A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites

2,830 231 Updated Sep 9, 2024

Ideas and thoughts about the fascinating Vision-and-Language Navigation

143 12 Updated Jun 28, 2023

Official code for Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions (CVPR 2024)

Python 17 Updated Jun 21, 2024

GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)

Jupyter Notebook 52 5 Updated Jan 2, 2024

带带弟弟: general-purpose CAPTCHA recognition OCR, PyPI version

Python 9,745 1,740 Updated Jul 25, 2024

Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.

Python 1,896 251 Updated Jan 3, 2024

[CAPTCHA Recognition - Training] This project uses CNN/ResNet/DenseNet + GRU/LSTM + CTC/CrossEntropy to implement CAPTCHA recognition. It covers model training only.

Python 3,005 818 Updated Oct 24, 2022

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 296 32 Updated Aug 19, 2024

[ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding

Python 13 Updated Aug 30, 2024

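Provides a practical interactive interface for GPT, GLM, and other large language models, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins, project analysis and self-explanation for Python, C++, and other codebases, PDF/LaTeX paper translation and summarization, parallel queries to multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…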

Python 64,522 7,974 Updated Oct 5, 2024

MambaOut: Do We Really Need Mamba for Vision?

Python 1,979 34 Updated Jun 6, 2024

Authors' code for "Variational Causal Inference Network for Explanatory Visual Question Answering" and "Integrating Neural-Symbolic Reasoning with Variational Causal Inference Network for Explanat…

Python 8 2 Updated Jun 19, 2024

Code for WACV 2021 Paper "Meta Module Network for Compositional Visual Reasoning"

Python 43 6 Updated May 13, 2021

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

3,471 143 Updated Sep 25, 2024

[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.

Python 28 2 Updated Aug 1, 2024

🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.

13,249 1,372 Updated Feb 13, 2023

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Python 12,140 998 Updated Jul 5, 2024

Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models"

Python 16 1 Updated Apr 22, 2024

Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024)

Python 26 1 Updated Jul 13, 2024

Data augmentation for NLP

Jupyter Notebook 4,418 462 Updated Jun 24, 2024

Official implementation of Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation (CVPR'22 Oral).

Python 107 8 Updated Jun 27, 2023