Kaul, Chaitanya (2019) Models of Visual Attention in Deep Residual CNNs. PhD thesis, University of York.
Abstract
Feature reuse from earlier layers in neural network hierarchies has been shown to improve the quality of features at a later stage, a concept known as residual learning. In this thesis, we develop residual learning methodologies infused with attention mechanisms and study their effect on different tasks. To this end, we propose three architectures across medical image segmentation and 3D point cloud analysis. In FocusNet, we propose an attention-based dual-branch encoder-decoder structure that learns an extremely efficient attention mechanism and achieves state-of-the-art results on the ISIC 2017 skin cancer segmentation dataset. We propose a novel loss enhancement that improves the convergence of FocusNet, performing better than state-of-the-art loss functions such as Tversky and focal loss. Evaluation of the architecture reveals two drawbacks, which we fix in FocusNetAlpha. A network built from our novel residual group attention block forms the backbone of this architecture, learning distinct features with sparse correlations, which is the key reason for its effectiveness. At the time of writing this thesis, FocusNetAlpha outperforms all state-of-the-art convolutional autoencoders while using the fewest parameters and FLOPs among them, based on our experiments on the ISIC 2018, DRIVE retinal vessel segmentation and cell nuclei segmentation datasets. We then turn to 3D point cloud processing, where we propose SAWNet, which combines global and local point embeddings infused with attention to create a spatially aware embedding that outperforms either embedding on its own. We propose a novel method to learn a global feature aggregation for point clouds via a fully differentiable block that needs few trainable parameters and gives clear performance gains. SAWNet beats state-of-the-art results on the ModelNet40 and ShapeNet part segmentation datasets.
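The abstract names Tversky and focal loss as the baselines against which the proposed loss enhancement is compared. As a point of reference only, the sketch below gives the standard textbook forms of those two baseline losses for binary segmentation, assuming flat lists of foreground probabilities and 0/1 labels. It does not reproduce the thesis's own loss enhancement, which this record does not specify; the function names, parameter defaults and `eps` smoothing term are illustrative assumptions.

```python
import math


def tversky_loss(probs, targets, alpha=0.5, beta=0.5, eps=1e-7):
    """Return 1 - Tversky index; alpha weights false positives, beta false negatives.

    With alpha = beta = 0.5 this reduces to the soft Dice loss.
    """
    tp = sum(p * t for p, t in zip(probs, targets))        # soft true positives
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))  # soft false positives
    fn = sum((1 - p) * t for p, t in zip(probs, targets))  # soft false negatives
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Return the mean binary focal loss; gamma = 0 gives binary cross-entropy."""
    total = 0.0
    for p, t in zip(probs, targets):
        # p_t is the probability assigned to the correct class.
        p_t = p if t == 1 else 1.0 - p
        p_t = min(max(p_t, eps), 1.0 - eps)  # clamp to avoid log(0)
        # (1 - p_t)^gamma down-weights pixels that are already well classified.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    probs = [0.8, 0.2, 0.6, 0.1]   # hypothetical predicted foreground probabilities
    targets = [1, 0, 1, 0]         # hypothetical ground-truth labels
    print("Tversky loss:", tversky_loss(probs, targets, alpha=0.3, beta=0.7))
    print("Focal loss:  ", focal_loss(probs, targets, gamma=2.0))
```

Setting `beta` above `alpha` penalises missed foreground pixels more heavily than false alarms, which is the usual motivation for Tversky loss on class-imbalanced segmentation data such as small lesions.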
Metadata
Supervisors: | Pears, Nick and Manandhar, Suresh |
---|---|
Awarding institution: | University of York |
Academic Units: | The University of York > Computer Science (York) |
Identification Number/EthosID: | uk.bl.ethos.808729 |
Depositing User: | Mr Chaitanya Kaul |
Date Deposited: | 27 Jun 2020 00:00 |
Last Modified: | 21 Jul 2020 09:53 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:26993 |
Download
Examined Thesis (PDF)
Filename: Kaul_201056743_CorrectedThesis_Clean.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License