Hu, Zechao (2022) Deep learning with query sensitive attention mechanisms for content-based image retrieval. PhD thesis, University of York.
Abstract
Content-based Image Retrieval (CBIR) is the task of searching an extensive image database for the images most similar to the query content. Most existing feature extraction methods and attention mechanisms for CBIR are query non-sensitive: they ignore the specifics of the query pattern, which may lead them to focus on regions irrelevant to the query content. In this thesis, we explore query sensitive attention mechanisms for the CBIR task, which involve query feature information in the feature extraction procedure of the candidate image.
Firstly, we propose the Conditional Attention Network (CANet). CANet takes the query image and a candidate image as input and produces a co-attention map of the candidate image under the condition of the query content. The generated co-attention map could correctly highlight the target object and improve image retrieval performance when embedded into a convolutional neural network (CNN) based feature extraction pipeline.
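The idea of an attention map conditioned on the query can be illustrated with a minimal sketch. This is not CANet itself, whose architecture the abstract does not describe; it assumes the query is summarised by a single descriptor vector and the candidate by a small grid of local descriptors, and it scores each location by cosine similarity followed by a softmax. The names `co_attention_map`, `attended_descriptor` and the `temperature` parameter are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def co_attention_map(query_vec, candidate_grid, temperature=0.1):
    """Softmax over candidate locations of their similarity to the query."""
    scores = [[cosine(query_vec, cell) / temperature for cell in row]
              for row in candidate_grid]
    peak = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def attended_descriptor(candidate_grid, attention):
    """Attention-weighted sum of the candidate's local descriptors."""
    pooled = [0.0] * len(candidate_grid[0][0])
    for row, att_row in zip(candidate_grid, attention):
        for cell, weight in zip(row, att_row):
            for i, value in enumerate(cell):
                pooled[i] += weight * value
    return pooled

# Toy data (assumed): a 3-dimensional query descriptor and a 2x2 candidate grid.
query = [1.0, 0.0, 0.0]
candidate = [
    [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.1, 0.9, 0.2]],
]
attention = co_attention_map(query, candidate)
descriptor = attended_descriptor(candidate, attention)
```

In this toy case the top-left location, whose descriptor is closest to the query, receives the largest weight, so the pooled descriptor is dominated by the region that matches the query content.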
Secondly, another more efficient co-attention method is proposed based on local feature selection and clustering over candidate local features. Using local feature selection and clustering dramatically reduces the computation costs caused by the query sensitivity but still leads to accurate co-attention maps even under challenging situations. The proposed clustering-based co-attention method leads to new state-of-the-art performance on several benchmark datasets.
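To see why selection and clustering can cut the cost, consider a rough sketch in which a candidate's local features are first pruned and then summarised by a few centroids, so that the query only needs to be compared against those centroids. The abstract does not state the selection criterion or the clustering algorithm used in the thesis; selecting by L2 norm and clustering with plain k-means are assumptions made here for illustration, and `select_top` and `kmeans` are hypothetical helper names.

```python
import math

def select_top(features, keep):
    """Keep the `keep` local features with the largest L2 norm (assumed criterion)."""
    return sorted(features,
                  key=lambda f: math.sqrt(sum(x * x for x in f)),
                  reverse=True)[:keep]

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iterations=10):
    """Plain k-means, initialised with the first k points so the result is deterministic."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            groups[nearest].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

# Toy data (assumed): six 2-dimensional local features, two of them near zero.
local_features = [
    [2.0, 0.1], [0.1, 2.1], [1.9, 0.0], [0.0, 1.9], [0.05, 0.02], [0.01, 0.03],
]
selected = select_top(local_features, keep=4)
centroids = kmeans(selected, k=2)
```

Here six local features are reduced to two centroids, so any query-dependent computation runs over two vectors per candidate instead of six.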
Lastly, we explore using clustered expressive local features to perform many-to-many local feature matching for CBIR. We show that the proposed local feature matching method implicitly generates co-attention-like local matching maps. In addition, a trainable binary encoding layer is applied for network fine-tuning, enabling the model to generate compact binary codes with slight performance degradation and greatly reducing computation costs.
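The saving from compact binary codes comes from replacing floating-point distance computations with bit operations. The sketch below illustrates only that retrieval side: it uses fixed sign thresholding rather than the trainable binary encoding layer described in the thesis, and the descriptors, image names, and the helpers `binarise` and `hamming` are hypothetical.

```python
def binarise(descriptor):
    """Pack the sign pattern of a real-valued descriptor into an integer code."""
    code = 0
    for value in descriptor:
        code = (code << 1) | (1 if value > 0 else 0)
    return code

def hamming(a, b):
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")

# Toy data (assumed): 4-dimensional descriptors for a query and three database images.
query = [0.7, -0.2, 0.4, -0.9]
database = {
    "img_a": [0.5, -0.1, 0.3, -0.4],
    "img_b": [-0.6, 0.8, 0.2, -0.3],
    "img_c": [-0.5, 0.9, -0.7, 0.1],
}

query_code = binarise(query)
# Rank database images by Hamming distance to the query code, closest first.
ranking = sorted(database, key=lambda name: hamming(query_code, binarise(database[name])))
```

Each comparison is a single XOR followed by a bit count, which is why binary codes scale well to large databases at the price of some loss in precision.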
In summary, we demonstrate that the query information could play an important role in feature extraction for the CBIR task. With a simple design, co-attention could be practical and effective even for large-scale image retrieval tasks.
Metadata
| Supervisors: | Bors, Adrian |
|---|---|
| Keywords: | deep learning; content-based image retrieval; attention |
| Awarding institution: | University of York |
| Academic Units: | The University of York > Computer Science (York) |
| Identification Number/EthosID: | uk.bl.ethos.875104 |
| Depositing User: | Mr Zechao Hu |
| Date Deposited: | 09 Mar 2023 14:07 |
| Last Modified: | 21 Apr 2023 09:53 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:32394 |
Download
Examined Thesis (PDF)
Filename: Hu_204058100_Thesis_Deposit.pdf
Description: Thesis
Licence:
This work is licensed under a Creative Commons Attribution NonCommercial NoDerivatives 4.0 International License