Curriculum learning for online reinforcement learning

Abstract

Curriculum learning in reinforcement learning is a rapidly growing research field that shapes exploration by presenting the agent with increasingly complex tasks. The idea of curriculum learning has been widely applied in both animal training and pedagogy. In reinforcement learning, most previous task sequencing methods have shaped exploration with the objective of reducing the time needed to reach a given performance level.
In this work, we start by proposing novel uses of curriculum learning, which arise from choosing different objective functions. We define a general optimization framework for task sequencing based on combinatorial optimization. The framework is composed of several performance metrics for the evaluation of a curriculum and three different task scenarios. Furthermore, we study the shape of the curriculum search space in order to identify its salient features.
We adapt popular metaheuristic search methods to the task sequencing problem in curriculum learning to find curricula that optimize any of the given performance metrics. Critical tasks, in which suboptimal exploratory actions must be minimized, can benefit from curriculum learning and its ability to shape exploration through transfer. We propose a task sequencing algorithm that maximizes the cumulative return, that is, the return obtained by the agent across all the learning episodes. By maximizing the cumulative return, the agent aims not only to achieve high rewards as quickly as possible, but also to do so while limiting suboptimal actions.
Finally, we evaluate the performance of the metaheuristic search methods on several tasks. We show that curriculum learning can be successfully used to improve the initial performance, take fewer suboptimal actions during exploration, and discover better policies. We also experimentally compare these methods to our task sequencing algorithm and show that the algorithm achieves significantly better performance on the problem of cumulative return maximization. Furthermore, we validate our algorithm on a critical task: optimizing a home controller for a micro energy grid.
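
As a purely illustrative sketch, and not the algorithm or the metaheuristics proposed in the thesis, the framing of task sequencing as combinatorial optimization can be pictured as a search over ordered sequences of source tasks, scored by a metric such as the cumulative return. Everything named below is an assumption made for the example: the helper names, the restriction to sequences of distinct tasks, and the exhaustive enumeration, which is only feasible for very small task sets. The `evaluate` argument is a hypothetical stand-in for training an agent through a curriculum and measuring the chosen metric.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

# Assumption for this sketch: a curriculum is an ordered tuple of task names.
Curriculum = Tuple[str, ...]


def cumulative_return(episode_returns: Sequence[float]) -> float:
    """Sum of the returns obtained across all learning episodes."""
    return float(sum(episode_returns))


def candidate_curricula(tasks: Sequence[str], max_length: int) -> List[Curriculum]:
    """Enumerate every ordered sequence of distinct tasks up to max_length.

    The empty curriculum (learning the final task directly) is included.
    """
    return [
        curriculum
        for length in range(max_length + 1)
        for curriculum in permutations(tasks, length)
    ]


def best_curriculum(
    tasks: Sequence[str],
    max_length: int,
    evaluate: Callable[[Curriculum], float],
) -> Curriculum:
    """Return the candidate with the highest score under `evaluate`.

    `evaluate` is hypothetical: in practice it would train an agent through
    the curriculum and report a metric such as the cumulative return.
    """
    return max(candidate_curricula(tasks, max_length), key=evaluate)
```

Because the number of candidate sequences grows combinatorially with the number of tasks, exhaustive enumeration of this kind quickly becomes impractical, which is the usual motivation for heuristic search over the space of curricula.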

Metadata

Supervisors:	Leonetti, Matteo and Cohn, Anthony
Related URLs:	An Optimization Framework for Task Sequencing in Curriculum Learning (Related publication); Curriculum Learning for Cumulative Return Maximization (Related publication)
Keywords:	Machine Learning; Reinforcement Learning; Transfer Learning; Curriculum Learning
Awarding institution:	University of Leeds
Academic Units:	The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Identification Number/EthosID:	uk.bl.ethos.826735
Depositing User:	Francesco Foglino
Date Deposited:	29 Mar 2021 10:23
Last Modified:	11 May 2021 09:53
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:28484

