Potential-Based Reward Shaping for Knowledge-Based, Multi-Agent Reinforcement Learning

Devlin, Sam Michael (2013) Potential-Based Reward Shaping for Knowledge-Based, Multi-Agent Reinforcement Learning. PhD thesis, University of York.

Text (PDF, 1462 KB)
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.


Reinforcement learning is a robust artificial intelligence solution for agents required to act in an environment, making their own decisions on how to behave. Typically an agent is deployed alone with no prior knowledge but, given sufficient time, a suitable state representation and an informative reward function, it is guaranteed to learn how to maximise its long-term reward. Incorporating domain knowledge, typically known by the system designer, can minimise the number of suboptimal behaviours tried and therefore speed up learning. Potential-based reward shaping is a method of providing this knowledge to an agent through additional rewards. Furthermore, if the agent is alone in the environment, it is guaranteed to learn the same behaviour both with and without potential-based reward shaping.

Meanwhile, there has also been growing interest in deploying not just one agent but many into the same environment. Such applications can benefit from the potential of both multi-agent systems and reinforcement learning. However, practical use is often limited by the non-stationary environment, the exponential increase in state features with every agent added, and partial observability.

This thesis documents work combining knowledge-based reinforcement learning and multi-agent reinforcement learning so that the latter can be achieved more quickly and, therefore, feasibly applied to complex problem domains. Experience gained from many empirical studies is gathered to support novel theoretical contributions proving that the pre-existing guarantees of potential-based reward shaping do not apply when used in multi-agent problem domains. Instead, multi-agent potential-based reward shaping may cause agents to learn a different behaviour, but this behaviour is guaranteed to be from the same set of behaviours that the agents could have learned without the additional rewards. Therefore, knowledge-based multi-agent reinforcement learning can both reduce the time a group of agents needs to learn a suitable behaviour and increase their final performance.
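The single-agent guarantee mentioned in the abstract can be illustrated with a small sketch. It uses the standard shaping term F(s, s') = γΦ(s') − Φ(s) from Ng, Harada and Russell (1999). The chain world, the reward of 1 at the goal, the potential Φ(s) = s and the helper names (`step`, `phi`, `shaping`, `greedy_policy`, `discounted_shaping_sum`) are hypothetical choices made for illustration, not taken from the thesis. Along any episode the discounted shaping rewards telescope to γ^T Φ(s_T) − Φ(s_0), which is −Φ(s_0) when the terminal potential is taken as 0. Every action sequence from a given start state therefore has its return shifted by the same constant, so the optimal policy is unchanged. The sketch checks this with value iteration rather than a learning agent, so the result is deterministic.

```python
"""Minimal sketch of single-agent potential-based reward shaping.

Assumptions (illustrative only, not from the thesis): a deterministic chain
with states 0..N-1, actions 'left'/'right', reward 1 on reaching the terminal
goal state N-1, and a hand-picked potential Phi(s) = s.
"""

GAMMA = 0.9
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = ("left", "right")


def step(s, a):
    """Hypothetical chain dynamics: returns (next_state, reward, terminal)."""
    s_next = max(s - 1, 0) if a == "left" else min(s + 1, GOAL)
    terminal = s_next == GOAL
    return s_next, (1.0 if terminal else 0.0), terminal


def phi(s):
    """Hypothetical potential encoding 'closer to the goal is better'."""
    return float(s)


def shaping(s, s_next, terminal, gamma=GAMMA):
    """F(s, s') = gamma * Phi(s') - Phi(s), with Phi(terminal) taken as 0."""
    phi_next = 0.0 if terminal else phi(s_next)
    return gamma * phi_next - phi(s)


def discounted_shaping_sum(trajectory, gamma=GAMMA):
    """Discounted sum of F along a state trajectory ending in the goal.

    By telescoping this equals -Phi(trajectory[0]) whatever path was taken.
    """
    total = 0.0
    last = len(trajectory) - 2
    for t in range(len(trajectory) - 1):
        f = shaping(trajectory[t], trajectory[t + 1], t == last, gamma)
        total += gamma ** t * f
    return total


def greedy_policy(shaped, sweeps=500, gamma=GAMMA):
    """Value iteration on the chain, optionally adding F to every reward."""
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    for _ in range(sweeps):
        new_q = {}
        for s in range(GOAL):
            for a in ACTIONS:
                s_next, r, terminal = step(s, a)
                if shaped:
                    r += shaping(s, s_next, terminal, gamma)
                future = 0.0 if terminal else max(q[(s_next, b)] for b in ACTIONS)
                new_q[(s, a)] = r + gamma * future
        q = new_q
    # Greedy action in each non-terminal state.
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}


if __name__ == "__main__":
    # Shaped and unshaped problems share the same optimal policy.
    print(greedy_policy(shaped=False) == greedy_policy(shaped=True))  # True
```

Running the script should print `True`: both problems yield the same greedy policy ("right" in every non-terminal state), because the shaped optimal values differ from the unshaped ones only by −Φ(s) in each state. Shaping can still change how quickly a learning agent finds that policy, which is the benefit the abstract describes; this sketch does not measure that. It also covers only the single-agent case. As the abstract notes, with several learning agents the guarantee no longer holds, and shaping may lead them to a different behaviour, though one from the same set they could have learned without it.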

Item Type: Thesis (PhD)
Academic Units: The University of York > Computer Science (York)
Identification Number/EthosID: uk.bl.ethos.589322
Depositing User: Mr Sam Michael Devlin
Date Deposited: 10 Feb 2014 10:45
Last Modified: 08 Sep 2016 13:30
URI: http://etheses.whiterose.ac.uk/id/eprint/5007
