Safe Reinforcement Learning Using Formally Verified Abstract Policies

Abstract

Reinforcement learning (RL) is an artificial intelligence technique for finding optimal solutions for sequential decision-making problems modelled as Markov decision processes (MDPs). Objectives are represented as numerical rewards in the model where positive values represent achievements and negative values represent failures. An autonomous agent explores the model to locate rewards with the goal to learn behaviour which will cumulate the largest reward possible. Despite RL successes in applications ranging from robotics and planning systems to sensing, it has so far had little appeal in mission- and safety-critical systems where unpredictable agent actions could lead to mission failure, risks to humans, itself or other systems, or violations of legal requirements. This is due to the difficulty of encoding non-trivial requirements of agent behaviour through rewards alone. This thesis introduces assured reinforcement learning (ARL), a safe RL approach that restricts agent actions, during and after learning. This restriction is based on formally verified policies synthesised for a high-level, abstract MDP that models the safety-relevant aspects of the RL problem. The resulting actions form overall solutions whose properties satisfy strict safety and optimality requirements. Next, ARL with knowledge revision is introduced, allowing ARL to still be used if the initial knowledge for generating action constraints proves to be incorrect. Additionally, two case studies are introduced to test the efficacy of ARL: the first is an adaptation of the benchmark flag collection navigation task and the second is an assisted-living planning system. Finally, an architecture for runtime ARL is proposed to allow ARL to be utilised in real-time systems. ARL is empirically evaluated and is shown to successfully satisfy strict safety and optimality requirements and, furthermore, with knowledge revision and action reuse, it can be successfully applied in environments where initial information may prove incomplete or incorrect.

Metadata

Supervisors:	Calinescu, Radu and Kudenko, Daniel
Related URLs:	Assurance in Reinforcement Learning Using Quantitative Verification (Related publication) Assured Reinforcement Learning for Safety-critical Applications (Related publication) Assured Reinforcement Learning with Formally Verified Abstract Policies (Related publication) Combining Reinforcement Learning and Quantitative Verification for Agent Policy Assurance (Related publication)
Keywords:	Reinforcement Learning, Safe Policies, Safety, Assurance, Constraint Verification, Artificial Intelligence, Machine Learning, Quantitative Verification, Model Checking, Formal Methods, Abstract Markov Decision Processes
Awarding institution:	University of York
Academic Units:	The University of York > Computer Science (York)
Identification Number/EthosID:	uk.bl.ethos.766584
Depositing User:	Dr George Mason
Date Deposited:	23 Apr 2019 12:59
Last Modified:	19 Feb 2020 13:07
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:23737

You do not need to contact us to get a copy of this thesis. Please use the 'Download' link(s) above to get a copy.
You can contact us about this thesis. If you need to make a general enquiry, please see the Contact us page.

Safe Reinforcement Learning Using Formally Verified Abstract Policies

Abstract

Metadata

Download

Examined Thesis (PDF)

Export

Statistics