Riley, Joshua (2023) Safe Multi-Agent Reinforcement Learning with Quantitatively Verified Constraints. PhD thesis, University of York.
Abstract
Multi-agent reinforcement learning is a machine learning technique in which multiple agents attempt to solve sequential decision-making problems. Learning is driven by objectives and failures, modelled as positive numerical rewards and negative numerical punishments, respectively. These multi-agent systems explore shared environments in order to find the highest cumulative reward for the sequential decision-making problem. Multi-agent reinforcement learning within autonomous systems has become a prominent research area, with many examples of success and potential applications. However, the safety-critical nature of many of these potential applications is currently underexplored and under-supported. Because reinforcement learning is a stochastic process, its behaviour is unpredictable: there are no assurances that these systems will not harm themselves, other expensive equipment, or humans. This thesis introduces Assured Multi-Agent Reinforcement Learning (AMARL) to mitigate these issues. The approach constrains the actions of learning systems during and after the learning process. Unlike previous multi-agent reinforcement learning methods, AMARL synthesises constraints through the formal verification of abstracted multi-agent Markov decision processes that model the functional and safety aspects of the environment. Learned policies guided by these constraints are guaranteed to satisfy strict functional and safety requirements and are Pareto-optimal with respect to a set of optimisation objectives. Two extensions of AMARL are also introduced in the thesis. First, the thesis presents a Partial Policy Reuse method that allows previously learned knowledge to be reused, significantly reducing AMARL learning time when initial models are inaccurate. Second, an Adaptive Constraints method is introduced that enables agents to adapt to environmental changes by constraining their learning through a runtime procedure structured around monitoring, analysis, planning, and execution. AMARL and its extensions are evaluated in three case studies from different navigation-based domains and are shown to produce policies that meet strict safety and functional requirements.
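To make the central idea of the abstract concrete, the sketch below shows what constraining a learner's actions with pre-computed safe-action sets can look like in the simplest tabular setting. It is a minimal, single-agent illustration under stated assumptions, not the thesis's algorithm: the names (ALLOWED, step, constrained_q_learning) and the toy corridor environment are hypothetical, and the hard-coded ALLOWED set merely stands in for constraints that AMARL would synthesise by quantitatively verifying an abstract multi-agent Markov decision process against safety requirements.

```python
import random
from collections import defaultdict

# Toy single-agent corridor: states 0..5, where state 5 is the goal and
# state 0 is a hazard that a safety requirement forbids the agent to enter.
N_STATES, HAZARD, GOAL = 6, 0, 5

def step(state, action):
    """Move left (action 0) or right (action 1); return (next_state, reward, done)."""
    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    if next_state == GOAL:
        return next_state, 10.0, True      # positive reward for the objective
    if next_state == HAZARD:
        return next_state, -10.0, True     # punishment for the safety failure
    return next_state, -1.0, False         # small step cost

# Hypothetical constraint set. In AMARL these constraints would come from
# formal verification of an abstracted model; here they are hard-coded:
# in state 1 only "move right" is permitted, so the hazard state can never
# be entered during or after learning.
ALLOWED = {s: ([1] if s == HAZARD + 1 else [0, 1]) for s in range(N_STATES)}

def constrained_q_learning(episodes=500, alpha=0.5, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning in which both exploration and exploitation are
    restricted to the allowed (constraint-satisfying) actions."""
    Q = defaultdict(float)                      # Q[(state, action)]
    for _ in range(episodes):
        state, done = 2, False                  # start mid-corridor
        while not done:
            safe = ALLOWED[state]
            if random.random() < epsilon:       # constrained exploration
                action = random.choice(safe)
            else:                               # constrained exploitation
                action = max(safe, key=lambda a: Q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = max(Q[(next_state, a)] for a in ALLOWED[next_state])
            target = reward if done else reward + gamma * best_next
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q

if __name__ == "__main__":
    Q = constrained_q_learning()
    policy = {s: max(ALLOWED[s], key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
    print(policy)   # never selects an action that leads into the hazard state
```

The design choice illustrated here is that safety is enforced by restricting the action set the learner may choose from, rather than by shaping rewards, which is why the resulting policy satisfies the constraint both during exploration and after convergence.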
Metadata
Supervisors: Calinescu, Radu and Paterson, Colin
Keywords: Multi-Agent Reinforcement Learning; Quantitative Verification; Deep Reinforcement Learning; Safe AI
Awarding institution: University of York
Academic Units: The University of York > Computer Science (York)
Identification Number/EthosID: uk.bl.ethos.883549
Depositing User: Mr Joshua Paul Riley
Date Deposited: 08 Jun 2023 08:20
Last Modified: 21 Jul 2023 09:53
Open Archives Initiative ID (OAI ID): oai:etheses.whiterose.ac.uk:32941
Download
Examined Thesis (PDF)
Filename: Riley_PhD_Thesis.pdf
Description: PhD Thesis
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.