Jaramillo-Avila, Uziel ORCID: https://orcid.org/0000-0003-4594-2862 (2021) Bio-inspired foveal and peripheral visual sensing for saliency-based decision making in robotics. PhD thesis, University of Sheffield.
Abstract
Computer vision is an area of research that has grown at immense speed in the last few decades, tackling problems towards scene understanding from very diverse fronts, such as image classification, object detection, localization, mapping and tracking. It has also been long understood that there are very valuable lessons to learn from biology and to be applied to this research field, where the human visual system is very likely the most studied brain mechanism.
The eye foveation system is a very good example of such lessons, since both machines and animals often face a similar dilemma; to prioritize visual areas of interest to faster process information, given limited computing power and from a field of view that is too wide to be simultaneously attended. While extensive models of artificial foveation have been presented, the re-emerging area of machine learning with deep neural networks has opened the question into how these two approaches can contribute to each other. Novel deep learning models often rely on the availability of substantial computing power, but areas of application face strict constraints, a good example are unmanned aerial vehicles, which in order to be autonomous should lift and power all their computing equipment.
In this work it is studied how applying a foveation principle to down-scale images can be used to reduce the number of operations required for object detection, and compare its effect to normally down-sampled images, given the prevalent number of operations by Convolutional Neural Network (CNN) layers. Foveation requires prior knowledge of regions of interest to center the fovea, this point in question is addressed by a merging of bottom-up saliency and top-down feedback of objects that the CNN has been trained to detect. Albeit saliency models have also been studied extensively in the last couple of decades, most often comparing their performance to human observer datasets, the question remains open into how they fit in wider information processing paradigms and into functional representations of the human brain. It is proposed here an information flow scheme that encompasses these principles.
Finally, to give to the model the capacity to operate coherently in the time domain, it adapts a representation of a well-established theory of the decision-making process that takes place in the basal ganglia region of the brain. The behaviour of this representation is then tested against human observer's data in an omnidirectional field of view, where the importance of selecting the most contextually relevant region of interest in each time-step is highlighted.
Metadata
Supervisors: | Sean, Anderson |
---|---|
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Automatic Control and Systems Engineering (Sheffield) |
Identification Number/EthosID: | uk.bl.ethos.831209 |
Depositing User: | Uziel Jaramillo-Avila |
Date Deposited: | 23 May 2021 00:27 |
Last Modified: | 01 Jul 2021 09:53 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:28877 |
Download
Final eThesis - complete (pdf)
Filename: Thesis_Uziel_final.pdf
Licence:
This work is licensed under a Creative Commons Attribution NonCommercial NoDerivatives 4.0 International License
Export
Statistics
You do not need to contact us to get a copy of this thesis. Please use the 'Download' link(s) above to get a copy.
You can contact us about this thesis. If you need to make a general enquiry, please see the Contact us page.