Multi-faceted Performance Metrics for Reinforcement Learning

Abstract

Deep reinforcement learning (DRL) is a promising approach for endowing robotic agents with full autonomy. Despite its wide success, DRL lacks a comprehensive theory, and empirically its reward-based performance (i.e. cumulative reward) is inconsistent across benchmarks. To address this, we develop a systematic evaluation methodology that examines and compares DRL algorithms across multiple performance dimensions instead of relying solely on rewards. The framework analyses algorithms along three aspects of performance: exploration, robustness and long-term consequences.

An efficient RL agent effectively addresses the challenges of exploration, generalisation and long-term consequences. Exploration is typically evaluated with a reward-based metric such as regret. As complementary metrics that capture exploration efficiency, we develop Effort of Sequential Learning (ESL), the relative distance travelled by an algorithm in the policy space to discover an optimal policy, and Optimal Movement Ratio (OMR), the fraction of the algorithm's movements in the policy space that effectively reduce an analogue of regret.
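
As an illustration only, the two exploration metrics can be sketched in a few lines of Python. The abstract does not give formal definitions, so everything concrete here is an assumption: policies are represented as plain parameter vectors, distance is Euclidean, the final policy in the run stands in for the optimal policy, ESL is taken as path length divided by the straight-line distance from the initial to the final policy, and a movement counts as "effective" when it brings the policy closer to the final one. The function names are hypothetical and are not taken from the thesis.

```python
import math


def distance(p, q):
    """Euclidean distance between two policy parameter vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def effort_of_sequential_learning(policies):
    """Assumed ESL: total path length in policy space, relative to the
    straight-line distance between the first and the last policy."""
    path = sum(
        distance(policies[i], policies[i + 1]) for i in range(len(policies) - 1)
    )
    direct = distance(policies[0], policies[-1])
    if direct == 0:
        raise ValueError("initial and final policies coincide")
    return path / direct


def optimal_movement_ratio(policies):
    """Assumed OMR: fraction of updates that move the policy closer to the
    final policy, which stands in here for the optimal policy."""
    steps = len(policies) - 1
    if steps < 1:
        raise ValueError("need at least two policy snapshots")
    target = policies[-1]
    useful = sum(
        1
        for i in range(steps)
        if distance(policies[i + 1], target) < distance(policies[i], target)
    )
    return useful / steps


# Toy policy trajectories in a two-dimensional parameter space.
direct_run = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
detour_run = [(0.0, 0.0), (-3.0, 0.0), (0.0, 0.0), (3.0, 4.0)]

for name, run in (("direct", direct_run), ("detour", detour_run)):
    esl = effort_of_sequential_learning(run)
    omr = optimal_movement_ratio(run)
    print(f"{name}: ESL={esl:.2f}, OMR={omr:.2f}")
```

Under these assumptions, a run that wanders away from its final policy before reaching it receives a higher ESL and a lower OMR than a run that approaches it directly, which is the kind of difference a regret curve alone may not show.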

Directly capturing generalisation can be difficult, so we study the robustness of RL algorithms and task complexity to capture certain aspects of it. Robustness is required for an algorithm to generalise, while task complexity dictates the task ordering in curriculum learning, which significantly impacts generalisability. We examine long-term consequences via both cumulative rewards and demonstrations, because cumulative rewards do not always qualitatively represent the agent’s desired behaviour. Observing the agent’s behaviour while it executes the learned policy is therefore imperative for validating its performance.

Finally, we demonstrate the utility of the proposed evaluation methodology by applying it to a robotic task, where we assess and compare RL algorithms suitable for robotic manipulation. Our framework is rich and principled, and it is particularly useful where strictly reward-based performance metrics can misrepresent true task success.

Metadata

Supervisors:	Prescott, Tony and Gilra, Aditya
Related URLs:	Author's website (Author); GitHub (Author); TMLR (Publisher); AAMAS (Publisher)
Keywords:	Reinforcement Learning; Deep Reinforcement Learning; Multi-dimensional Performance Metrics; Exploration; Task Complexity; Robustness; Hyperparameters; Robotics; Robotic Manipulator; Machine Learning
Awarding institution:	University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield)
Date Deposited:	30 Mar 2026 08:25
Last Modified:	30 Mar 2026 08:25
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:38454
