Riley, Joshua (2023) Safe Multi-Agent Reinforcement Learning with Quantitatively Verified Constraints. PhD thesis, University of York.
Abstract
Multi-agent reinforcement learning is a machine learning technique that involves multiple agents attempting to solve sequential decision-making problems. This learning is driven by objectives and failures modelled as positive numerical rewards and negative numerical punishments, respectively. These multi-agent systems explore shared environments in order to find the highest cumulative reward for the sequential decision-making problem. Multi-agent reinforcement learning within autonomous systems has become a prominent research area with many examples of success and potential applications. However, the safety-critical nature of many of these potential applications is currently underexplored and under-supported. Reinforcement learning, being a stochastic process, is unpredictable, meaning there are no assurances that these systems will not harm themselves, other expensive equipment, or humans. This thesis introduces Assured Multi-Agent Reinforcement Learning (AMARL) to mitigate these issues. This approach constrains the actions of learning systems during and after a learning process. Unlike previous multi-agent reinforcement learning methods, AMARL synthesises constraints through the formal verification of abstracted multi-agent Markov decision processes that model the environment’s functional and safety aspects. Learned policies guided by these constraints are guaranteed to satisfy strict functional and safety requirements and are Pareto-optimal with respect to a set of optimisation objectives. Two AMARL extensions are also introduced in the thesis. Firstly, the thesis presents a Partial Policy Reuse method that allows the use of previously learned knowledge to reduce AMARL learning time significantly when initial models are inaccurate.
Secondly, an Adaptive Constraints method is introduced to enable agents to adapt to environmental changes by constraining their learning through a procedure that follows the styling of monitoring, analysis, planning, and execution during runtime. AMARL and its extensions are evaluated within three case studies from different navigation-based domains and shown to produce policies that meet strict safety and functional requirements.
Metadata
Supervisors: | Calinescu, Radu and Paterson, Colin |
---|---|
Keywords: | Multi-Agent Reinforcement Learning; Quantitative Verification; Deep Reinforcement Learning; Safe AI |
Awarding institution: | University of York |
Academic Units: | The University of York > Computer Science (York) |
Identification Number/EthosID: | uk.bl.ethos.883549 |
Depositing User: | Mr Joshua Paul Riley |
Date Deposited: | 08 Jun 2023 08:20 |
Last Modified: | 21 Jul 2023 09:53 |
Download
Examined Thesis (PDF)
Filename: Riley_PhD_Thesis.pdf
Description: PhD Thesis
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License