A generic data representation for predicting player behaviours

Abstract

A common use of predictive models in game analytics is to predict player behaviours so that pre-emptive measures can be taken before players make undesired decisions. Standard data pre-processing in predictive modelling includes two steps: data representation and category definition.

Data representation extracts features from the raw dataset to represent the whole dataset. Much research has been done on predicting important player behaviours with game-specific data representations. Some of these efforts have achieved competitive performance; however, because the data representations they apply are game-specific, game companies need to spend extra effort to reuse the proposed methods across more than one product. This work proposes an event-frequency-based data representation that is generally applicable to games. This method relies only on counts of in-game events rather than on prior knowledge of the game. To verify the generality and performance of this data representation, it was applied to games of three different genres to predict players' first-purchase, disengagement and churn behaviours. Experiments show that this data representation provides competitive performance across different games.
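
As a minimal sketch of the idea, the representation can be built by counting each event type per player and laying the counts out over a shared event vocabulary. The log format (a list of (player_id, event_name) pairs), the function name `event_frequency_features` and the event names are illustrative assumptions; the thesis's exact pipeline, such as any observation windows or normalisation, is not described here.

```python
from collections import Counter, defaultdict


def event_frequency_features(event_log, vocabulary=None):
    """Map each player to a vector of raw in-game event counts.

    event_log: iterable of (player_id, event_name) pairs (hypothetical format).
    vocabulary: optional fixed list of event names defining the columns;
    events outside it are ignored. If omitted, it is built from the log in
    sorted order so that the column order is deterministic.
    Players with no logged events do not appear in the result.
    """
    counts = defaultdict(Counter)
    for player_id, event_name in event_log:
        counts[player_id][event_name] += 1

    if vocabulary is None:
        vocabulary = sorted({event for c in counts.values() for event in c})

    # One row per player: the count of each vocabulary event (0 if unseen).
    features = {
        player: [c.get(event, 0) for event in vocabulary]
        for player, c in counts.items()
    }
    return vocabulary, features


# Usage with a toy log (event names are made up for illustration).
log = [
    ("p1", "login"), ("p1", "battle_start"), ("p1", "battle_start"),
    ("p2", "login"), ("p2", "shop_open"),
]
vocab, feats = event_frequency_features(log)
print(vocab)        # ['battle_start', 'login', 'shop_open']
print(feats["p1"])  # [2, 1, 0]
print(feats["p2"])  # [0, 1, 1]
```

Because the columns come only from the event names present in the log, the same code applies unchanged to any game that records events, which is the property this representation relies on for generality.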

Category definition is another essential component of classification problems. Because labelling methods that rely on specific conditions to assign players to classes often lead to imbalanced classification problems, this work applied two commonly used approaches, random undersampling and the Synthetic Minority Over-Sampling Technique (SMOTE), to rebalance the imbalanced tasks. Results suggest that undersampling provides better performance when the quantity of data is sufficient, whereas SMOTE is more likely to help when the dataset is too small to be balanced by undersampling. This work also proposes a new category-definition method that yields a class distribution closer to balanced. In addition, the parameters of this method can be used to gain insight into the health of the game. Preliminary experimental results show that this category-definition method improves the balance of the class distribution when applied to different games and yields significantly better performance than random classifiers.
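
The two rebalancing approaches can be sketched as follows for a binary labelling, using only the standard library. This is a minimal illustration, not the configuration used in the thesis: the function names, the neighbour count `k` and the fixed random seed are assumptions. Random undersampling discards players of the larger class at random until the classes are the same size; basic SMOTE instead creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly keep, for every class, as many samples as the smallest class has."""
    rng = random.Random(seed)
    n_min = min(Counter(y).values())
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for x in rng.sample(rows, n_min):  # sampling without replacement
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote(X, y, minority_label, k=5, seed=0):
    """Basic SMOTE: add synthetic minority samples until the minority class
    matches the size of the largest other class."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_major = max((c for label, c in Counter(y).items() if label != minority_label),
                  default=0)
    n_new = n_major - len(minority)
    if n_new <= 0 or len(minority) < 2:
        return list(X), list(y)  # nothing to do, or no neighbours to use
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours of each minority sample.
    neighbours = []
    for i, a in enumerate(minority):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(minority) if j != i)
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        a = minority[i]
        b = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # random position on the segment from a to b
        synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return list(X) + synthetic, list(y) + [minority_label] * n_new


# Toy data: label 0 (majority) has 8 players, label 1 (minority) has 3.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3], [3, 2], [3, 3],
     [10, 10], [10, 12], [12, 10]]
y = [0] * 8 + [1] * 3
Xu, yu = random_undersample(X, y)
print(sorted(yu))                # [0, 0, 0, 1, 1, 1]
Xs, ys = smote(X, y, minority_label=1, k=2)
print(ys.count(0), ys.count(1))  # 8 8
```

The sketch makes the trade-off reported above concrete: undersampling shrinks every class to the size of the minority class, so it only works well when that class is already reasonably large, whereas SMOTE keeps all original data and grows the minority class with interpolated samples. For production use, the imbalanced-learn library provides maintained implementations (`RandomUnderSampler` and `SMOTE`, both applied through `fit_resample`).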

Metadata

Supervisors:	Kudenko, Daniel and Devlin, Sam
Related URLs:	Predicting Player Disengagement in Online Games (Related publication); Predicting player disengagement and first purchase with event-frequency based data representation (Related publication); Predicting Disengagement in Free-to-Play Games with Highly Biased Data (Related publication)
Awarding institution:	University of York
Academic Units:	The University of York > Computer Science (York)
Identification Number/EthosID:	uk.bl.ethos.745726
Depositing User:	Mr Hanting Xie
Date Deposited:	11 Jun 2018 08:51
Last Modified:	24 Jul 2018 15:24
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:20137