The datasets for the paper 'Semantic Aligned Cross-Modal Visual Grounding Network with Transformer'
We manually annotated a text description for each image in two fine-grained object detection datasets ("Military Aircraft Dataset" and "FGVC Aircraft Dataset"). The annotated datasets can be downloaded from Baidu Netdisk (百度网盘; extraction code: v3c0).