Yang, Zhile (2025) Bio-inspired reinforcement learning: Algorithm development and its application to visual search. PhD thesis, University of Leeds.
Abstract
The field of reinforcement learning has advanced significantly in recent years. However, many challenges remain, including adaptability to environmental changes, robustness to noise, energy efficiency, and safety. A promising direction is to incorporate findings from neuroscience in an attempt to replicate the strong cognitive abilities of humans and animals. In turn, such work can also contribute to our understanding of brain function.
In this work, I propose a new spiking neural network model and derive a reinforcement learning algorithm for it. The algorithm is based on reward-modulated spike-timing-dependent plasticity (R-STDP), which makes it more biologically plausible. Experiments on standard reinforcement learning tasks show that it can solve challenging tasks and that it is inherently more robust than standard methods to a variety of perturbations.
I also apply my method to the more challenging real-world task of modeling visual search scanpaths. Additionally, I design a new map-based inverse reinforcement learning method that can better extract the motivations underlying scanpaths. Experiments show that the spiking neural network is effective in solving the scanpath modeling task. To gain an in-depth understanding of the cognitive mechanisms behind visual search, I further apply the reinforcement learning method to analyze the scanpath properties of social and non-social visual search behaviors. The results offer new insights into the patterns of eye movements.
Taken together, the results presented in this thesis provide novel insights not only into developing new reinforcement learning algorithms but also into understanding the behaviors and mechanisms of human visual search.
Metadata
| Supervisors: | Wang, Yongxing and Head, David |
|---|---|
| Keywords: | spiking neural networks, reinforcement learning, reward-modulated spike-timing-dependent plasticity (R-STDP), winner-take-all circuit, variational policy gradient, visual search |
| Awarding institution: | University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds) |
| Date Deposited: | 13 Jan 2026 17:01 |
| Last Modified: | 13 Jan 2026 17:01 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:37748 |
Download
Final eThesis - complete (pdf)
Filename: Yang_Z_ComputerScience_PhD_2025.pdf
Licence: This work is licensed under a Creative Commons Attribution 4.0 International License.