Pastore, Alvin (2019) Individual decision making, reinforcement learning and myopic behaviour. PhD thesis, University of Sheffield.
Abstract
Individuals use their cognitive abilities to make decisions, with the ultimate goal of improving their status. Decision outcomes are used to learn which choices lead to rewarding results and which lead to punishing ones. These associations might not be easily inferable because of environmental complexity or noisy feedback. Tasks in which outcome probabilities are known are termed “decisions under risk”. Researchers have consistently shown that people are risk averse when choosing among options that offer gains, and risk seeking when choosing among options that entail losses. When the probabilities of the options are not clearly stated, the task is known as “decisions under ambiguity”. In this type of task individuals face an exploration-exploitation trade-off: to maximise their profit they need to choose the best option, but at the same time they need to discover, by trial and error, which option leads to the best outcome. This process of acquiring knowledge through interaction with the environment is called adaptive learning.
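The exploration-exploitation trade-off described above can be made concrete with a small simulation. The sketch below is purely illustrative and is not taken from the thesis: it assumes a two-option task whose payoff probabilities are unknown to the agent (decisions under ambiguity), a delta-rule value update, and a softmax choice rule whose temperature controls how much the agent explores rather than exploits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-option task: payoff probabilities are unknown to the agent,
# so option values must be learned by trial and error.
true_means = [0.4, 0.6]   # assumed reward probabilities, for illustration only
Q = np.zeros(2)           # learned value estimates for the two options
alpha, tau = 0.1, 0.2     # learning rate and softmax temperature (illustrative)

for t in range(200):
    # Softmax choice rule: higher tau -> more exploration, lower tau -> more exploitation
    p = np.exp(Q / tau) / np.exp(Q / tau).sum()
    choice = rng.choice(2, p=p)
    reward = rng.random() < true_means[choice]   # binary outcome
    # Incremental (delta-rule) update of the chosen option's value
    Q[choice] += alpha * (reward - Q[choice])
```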
Evidence from the literature suggests that unskilled investors' behaviour is consistent with naive reinforcement learning: they simply adjust their preference for an option according to its recent outcomes. Experimental data from a binary choice task and a quasi-field scenario are used to test a combination of Reinforcement Learning and Prospect Theory. Both investigations use reinforcement learning models with specific parameters that can be tuned to describe individual learning and decision-making strategies. The first part focuses on integrating the two computational models, the second on testing the combined model in a more realistic scenario. The results indicate that the combination of Reinforcement Learning and Prospect Theory could be a descriptive account of decision-making in binary decision tasks. A two-state space configuration, together with a non-saturating reward function, appears to be the best setup to capture behaviour in that task. Moreover, analysis of the model parameters makes it evident that payoff variability affects the speed of learning and the randomness of choice. The same modelling approach fails to capture behaviour in a more complex task, indicating that more complex models might be needed to provide a computational account of decisions from experience in non-trivial tasks.
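As a rough illustration of how Reinforcement Learning and Prospect Theory can be combined, the sketch below passes each monetary outcome through a prospect-theory value function before a delta-rule update, and generates choices with a softmax rule. The functional form, the parameter values (the common Tversky and Kahneman 1992 estimates), and the example payoffs are assumptions for illustration only, not the specific model fitted in the thesis.

```python
import numpy as np

def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory value function (illustrative parameters): concave for
    gains, convex and steeper for losses (loss aversion)."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta

# Hypothetical combination: subjective (prospect-theory) values of outcomes
# drive a delta-rule reinforcement-learning update; choices come from a softmax.
rng = np.random.default_rng(1)
Q = np.zeros(2)                      # learned subjective values of the two options
eta, tau = 0.1, 1.0                  # learning rate and choice randomness (temperature)
payoffs = [(-1.0, 2.0), (0.5, 0.5)]  # assumed risky vs. safe monetary outcomes

for t in range(200):
    p = np.exp(Q / tau) / np.exp(Q / tau).sum()
    choice = rng.choice(2, p=p)
    outcome = payoffs[choice][rng.integers(2)]              # 50/50 draw within the option
    Q[choice] += eta * (prospect_value(outcome) - Q[choice])  # update with subjective value
```

In such a model the learning rate governs the speed of learning and the softmax temperature governs the randomness of choice, the two quantities the abstract relates to payoff variability; with the loss-averse value function above, the agent tends to drift towards the safe option even though both options have the same expected monetary value.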
Metadata
| Supervisors: | Vasilaki, Eleni and Stafford, Tom and Marshall, James |
|---|---|
| Keywords: | decision-making, reinforcement-learning |
| Awarding institution: | University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield) |
| Identification Number/EthosID: | uk.bl.ethos.778786 |
| Depositing User: | Mr Alvin Pastore |
| Date Deposited: | 28 May 2019 09:19 |
| Last Modified: | 25 Sep 2019 20:08 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:23967 |
Download
Filename: thesis_resub_Pastore_130111317.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License