South China University of Technology
Guangzhou, China
Stars
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
A lightweight Python library for simulating Chinese handwriting
An open source implementation of CLIP.
ChatGLM-6B: An Open Bilingual Dialogue Language Model
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
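Company name corpus and organization name corpus: company short names, abbreviations, brand terms, and enterprise names. Can be used for Chinese word segmentation and organization-name entity recognition.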
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
DocBank: A Benchmark Dataset for Document Layout Analysis
[CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"
Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024)
PyTorch implementation for "Decoupled attention network for text recognition".
A summary of existing representative LLM text datasets.
[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
i. A practical application of Transformer (ViT) to 2-D physiological signal (EEG) classification tasks. It could also be tried with EMG, EOG, ECG, etc. ii. Including the attention of spatial dimension…
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
SCUT-EnsExam is a real-world handwritten text erasure dataset for examination paper scenarios, which consists of 545 examination paper images. The dataset is randomly divided into training set and …
Official PyTorch implementation of the CVPR 2022 paper: "Look Closer to Supervise Better: One-Shot Font Generation via Component-Based Discriminator"
(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
Real-CE: A Benchmark for Chinese-English Scene Text Image Super-resolution (ICCV2023)