Dynamic Weighted Combiner for Mixed-Modal Image Retrieval - Accepted at AAAI2024

The paper can be accessed at: https://arxiv.org/pdf/2312.06179.pdf

If you find this code useful in your research, please cite:

```
@article{huang2023dynamic,
  title={Dynamic Weighted Combiner for Mixed-Modal Image Retrieval},
  author={Huang, Fuxiang and Zhang, Lei and Fu, Xiaowei and Song, Suqi},
  journal={arXiv preprint arXiv:2312.06179},
  year={2023}
}

@inproceedings{huang2023dynamic,
  title={Dynamic Weighted Combiner for Mixed-Modal Image Retrieval},
  author={Huang, Fuxiang and Zhang, Lei and Fu, Xiaowei and Song, Suqi},
  booktitle={Association for the Advancement of Artificial Intelligence (AAAI)},
  year={2024}
}
```

Abstract

Mixed-Modal Image Retrieval (MMIR), as a flexible search paradigm, has attracted wide attention. However, previous approaches achieve limited performance because two critical factors are seriously overlooked. 1) The image and text modalities contribute differently, yet they are incorrectly treated equally. 2) Web datasets from diverse real-world scenarios contain inherent labeling noise in the text that describes users' intentions, which gives rise to overfitting. We propose a Dynamic Weighted Combiner (DWC) to tackle these challenges, with three merits. First, we propose an Editable Modality De-equalizer (EMD) that accounts for the contribution disparity between modalities; it contains two modality feature editors and an adaptive weighted combiner. Second, to alleviate labeling noise and data bias, we propose a dynamic soft-similarity label generator (SSG) to implicitly improve noisy supervision. Finally, to bridge modality gaps and facilitate similarity learning, we propose a CLIP-based mutual enhancement module alternately trained by a mixed-modality contrastive loss. Extensive experiments verify that our proposed model significantly outperforms state-of-the-art methods on real-world datasets.
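To make the two ideas of unequal modality weighting and softened labels concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the scalar relevance scores, and the `alpha` and `temperature` values are all assumptions chosen for illustration, and the real EMD and SSG modules in the paper are learned networks.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_combine(image_feat, text_feat, image_score, text_score):
    """Fuse two same-length feature vectors with softmax-normalised weights.

    image_score and text_score are hypothetical scalar relevance scores; in a
    trained model they would be predicted from the query rather than passed in.
    """
    if len(image_feat) != len(text_feat):
        raise ValueError("feature vectors must have the same length")
    w_img, w_txt = softmax([image_score, text_score])
    fused = [w_img * i + w_txt * t for i, t in zip(image_feat, text_feat)]
    return fused, (w_img, w_txt)


def soft_labels(similarities, target_index, alpha=0.8, temperature=0.1):
    """Blend a one-hot label with a softmax over query-target similarities.

    alpha and temperature are illustrative hyperparameters, not values
    taken from the paper.
    """
    soft = softmax([s / temperature for s in similarities])
    return [
        alpha * (1.0 if k == target_index else 0.0) + (1.0 - alpha) * p
        for k, p in enumerate(soft)
    ]


if __name__ == "__main__":
    fused, weights = weighted_combine([1.0, 0.0], [0.0, 1.0], 2.0, 0.0)
    print("weights:", weights)
    print("fused:", fused)
    print("soft labels:", soft_labels([0.9, 0.1, 0.2], target_index=0))
```

With equal scores the two modalities receive equal weight; raising one score shifts the fused vector toward that modality. The soft labels still sum to one, so they can stand in for a one-hot target in a cross-entropy style loss.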

Requirements and Installation

Python 3.7

$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/openai/CLIP.git
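After installation, a quick check that the packages are importable can save a failed training run. The script below is a small sketch that is not part of the repository; the helper name `missing_modules` is hypothetical, and the list of import names assumes the CLIP repository installs a module called `clip`.

```python
import importlib.util
import sys

# Import names of the packages installed above.
# Assumption: the openai/CLIP repository installs a module named "clip".
REQUIRED_MODULES = ["torch", "torchvision", "ftfy", "regex", "tqdm", "clip"]


def missing_modules(names):
    """Return the module names that cannot be located on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only locates the modules without importing them, so it runs quickly and does not initialise CUDA.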

Running the experiments

Download the datasets

CSS3D dataset

Download the dataset from this external website.

Fashion200k dataset

Download the dataset via this link
To ensure fair comparison, we employ the same test queries as TIRG. They can be downloaded from here.

FashionIQ dataset

Download Fashion-IQ dataset images from here.
Download Fashion-IQ dataset annotations from here.
To ensure fair comparison, we employ the same splits as VAL. They can be downloaded from here.

Shoes dataset

Download Shoes dataset images from here.
Download Shoes dataset annotations from here.

Running the Code

For training and testing new models, pass the appropriate arguments.

For instance, to train the DWC model on the Fashion200k dataset, run the following command:

python main.py --dataset=fashion200k --dataset_path=../data/fashion200k/
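The command passes two flags, `--dataset` and `--dataset_path`. As a rough sketch of how such flags can be parsed and sanity-checked before training starts, consider the snippet below. It is hypothetical: the parser in `main.py` is not shown here and may define more arguments or different defaults, and `build_parser` and `check_dataset_path` are illustrative names.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser for the two flags used in the command above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Hypothetical DWC launcher arguments")
    parser.add_argument("--dataset", required=True,
                        help="dataset name, e.g. fashion200k")
    parser.add_argument("--dataset_path", required=True, type=Path,
                        help="root directory of the downloaded dataset")
    return parser


def check_dataset_path(args):
    """Return a warning string if the dataset directory is missing, else None."""
    if not args.dataset_path.is_dir():
        return f"dataset directory not found: {args.dataset_path}"
    return None


if __name__ == "__main__":
    parsed = build_parser().parse_args(
        ["--dataset=fashion200k", "--dataset_path=../data/fashion200k/"]
    )
    print("dataset:", parsed.dataset)
    print("path:", parsed.dataset_path)
    warning = check_dataset_path(parsed)
    if warning:
        print("warning:", warning)
```

Checking the directory up front gives a clear message when the dataset was downloaded to a different location than the one passed on the command line.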
