
Snowstormfly/Cross-modal-retrieval-MLAGT


Zero-shot sketch-based remote sensing image retrieval based on multi-level and attention-guided tokenization

This repository accompanies the paper “Zero-Shot Sketch-Based Remote-Sensing Image Retrieval Based on Multi-Level and Attention-Guided Tokenization” and contains the official PyTorch implementation of the multi-level and attention-guided tokenization (MLAGT) network.

Requirements

Python 3.7
PyTorch 1.11.0
torchvision 0.12.0
einops 0.6.1

Dataset

The RSketch_Ext dataset can be downloaded from Baidu Web Disk (access password: xpmv). You are free to split it into training and test sets as you wish.

RSketch_Ext

[Figure: examples from the RSketch_Ext dataset]
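Because the task is zero-shot, a train/test split should keep the test categories unseen during training, i.e. split by category rather than by individual sample. The following is a minimal sketch of such a split, assuming (hypothetically) that RSketch_Ext is organized with one sub-directory per category; `list_categories` and `split_categories` are illustrative helpers, not part of this repository, and the 0.25 test ratio is arbitrary.

```python
import os
import random
from typing import List, Sequence, Tuple


def list_categories(root: str) -> List[str]:
    """Hypothetical layout: one sub-directory per category under `root`."""
    return sorted(d for d in os.listdir(root)
                  if os.path.isdir(os.path.join(root, d)))


def split_categories(categories: Sequence[str], test_ratio: float = 0.25,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split category names into disjoint seen (train) and unseen (test) sets."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be in (0, 1)")
    cats = sorted(set(categories))  # deterministic order before shuffling
    if len(cats) < 2:
        raise ValueError("need at least two categories to split")
    rng = random.Random(seed)       # local RNG, leaves the global seed untouched
    rng.shuffle(cats)
    # Keep at least one category on each side of the split.
    n_test = min(len(cats) - 1, max(1, round(len(cats) * test_ratio)))
    unseen = sorted(cats[:n_test])
    seen = sorted(cats[n_test:])
    return seen, unseen
```

With a fixed `seed` the split is reproducible, so the same unseen categories can be reused when comparing runs.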

Train and Test

Pretrained ViT backbone

The ViT model pre-trained on ImageNet-1K is available on Baidu Web Disk (access password: t6p1). Place sam_ViT-B_16.pth in ./model and, if necessary, change the path on line 195 of ./model/self_attention.py to an absolute path.
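Instead of hard-coding an absolute path, one option is to resolve the relative path against the module's own directory, so loading works regardless of where the script is launched from. A minimal sketch, assuming the checkpoint is a regular file readable by `torch.load`; `resolve_checkpoint` and `load_vit_weights` are hypothetical helpers, and the exact code on line 195 of ./model/self_attention.py is not reproduced here.

```python
import os


def resolve_checkpoint(path: str, base_dir: str) -> str:
    """Return an absolute checkpoint path.

    Relative paths are resolved against `base_dir` (e.g. the directory of the
    calling module) instead of the current working directory.
    """
    expanded = os.path.expanduser(path)
    if not os.path.isabs(expanded):
        expanded = os.path.join(base_dir, expanded)
    return os.path.normpath(expanded)


def load_vit_weights(path: str, base_dir: str):
    """Load the checkpoint onto the CPU (hypothetical helper; needs PyTorch)."""
    ckpt = resolve_checkpoint(path, base_dir)
    if not os.path.isfile(ckpt):
        raise FileNotFoundError(f"ViT checkpoint not found: {ckpt}")
    import torch  # imported lazily so path resolution works without PyTorch
    return torch.load(ckpt, map_location="cpu")
```

Inside ./model/self_attention.py this could be called as `load_vit_weights("sam_ViT-B_16.pth", os.path.dirname(os.path.abspath(__file__)))`.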

Arguments

# dataset
  train_path              # path to load training data.
  test_path               # path to load test data.
# model
  d_model                 # feature dimension.
  d_ff                    # feed-forward layer dimension.
  head                    # number of cross-attention encoder heads.
  number                  # number of cross-attention encoder layers.
  pretrained              # whether to use the pretrained ViT model.
# train
  save                    # path to save the model.
  batch                   # batch size.
  epoch                   # number of training epochs.
  datasetLen              # amount of training data used in a single batch.
  learning_rate           # learning rate.
  weight_decay            # weight decay.
# test
  load                    # path to load the model.
  test_sk                 # number of test sketches fed in a single batch.
  test_im                 # number of test remote sensing images fed in a single batch.
  num_workers             # number of DataLoader workers.
  database_path           # path to load the pre-inferred remote sensing image database.
  amount                  # number of retrieved remote sensing images to visualize.
  result_path             # path to save accuracy evaluation results.
# other
  choose_cuda             # CUDA device to use.
  seed                    # random seed.
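The options above could be declared with Python's `argparse` roughly as follows. This is a sketch only: the `--name` flag spelling, the value types, and treating `pretrained` as an on/off switch are assumptions, and every default is left as `None` because the repository's actual defaults are not listed here.

```python
import argparse

# (name, type, help) for the value-taking options listed above (types assumed).
_OPTIONS = [
    # dataset
    ("train_path", str, "path to load training data"),
    ("test_path", str, "path to load test data"),
    # model
    ("d_model", int, "feature dimension"),
    ("d_ff", int, "feed-forward layer dimension"),
    ("head", int, "number of cross-attention encoder heads"),
    ("number", int, "number of cross-attention encoder layers"),
    # train
    ("save", str, "path to save the model"),
    ("batch", int, "batch size"),
    ("epoch", int, "number of training epochs"),
    ("datasetLen", int, "amount of training data used in a single batch"),
    ("learning_rate", float, "learning rate"),
    ("weight_decay", float, "weight decay"),
    # test
    ("load", str, "path to load the model"),
    ("test_sk", int, "number of test sketches fed in a single batch"),
    ("test_im", int, "number of test images fed in a single batch"),
    ("num_workers", int, "number of DataLoader workers"),
    ("database_path", str, "path to the pre-inferred image database"),
    ("amount", int, "number of retrieved images to visualize"),
    ("result_path", str, "path to save accuracy evaluation results"),
    # other
    ("choose_cuda", str, "CUDA device to use"),
    ("seed", int, "random seed"),
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the argument list above."""
    parser = argparse.ArgumentParser(description="MLAGT options (sketch)")
    for name, value_type, help_text in _OPTIONS:
        parser.add_argument("--" + name, type=value_type, default=None,
                            help=help_text)
    # Assumed to be a boolean switch: present means "use the pretrained ViT".
    parser.add_argument("--pretrained", action="store_true",
                        help="use the pretrained ViT backbone")
    return parser
```

For example, `build_parser().parse_args(["--batch", "32", "--pretrained"])` yields `batch == 32` and `pretrained == True`; the value 32 is arbitrary.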

Conclusion

Thank you and sorry for the bugs!

Author

* Bo Yang
* Chen Wang
* Xiaoshuang Ma
* Beiping Song
* Zhuang Liu
* Fangde Sun

Citation

If you find this work useful, please cite:
@Article{Cross-modal-retrieval-MLAGT,
  title={Zero-shot sketch-based remote sensing image retrieval based on multi-level and attention-guided tokenization},
  author={Bo Yang and Chen Wang and Xiaoshuang Ma and Beiping Song and Zhuang Liu and Fangde Sun},
  year={2024},
  journal={Remote Sensing},
  volume={16},
  number={10},
  pages={1653},
  doi={10.3390/rs16101653}
}

About

Multi-Level and Attention-Guided Tokenization
