HE, Z (2011) The Reality of Using Digital By-Product Data in Social Science Analysis --A Case Study of Wikipedia. PhD thesis, University of York.
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.
In response to a methodological challenge in social science research, particularly in studies of online phenomena such as Web 2.0 applications, this thesis proposes a new methodology that deploys digital by-product data. Digital by-product data is the data created by an internet operating system to back up content, including browsing history, files downloaded, photos uploaded and so on. With the emergence of Information and Communication Technologies (ICTs), our daily life is becoming digitalized and can be described by digital by-product data. This thesis seeks to demonstrate that digital by-product data offers social scientists an important opportunity to overcome bottlenecks in existing research methodology, such as the deficiency of data, the limitations of analysis and the possible risks of bias. The proposed methodology is based on a discussion and analysis of the current data environment of social science research, the online environment and existing research methodology found within the digital science field. The experimental aspect of the thesis uses digital by-product data to explore online phenomena and to evaluate the utility of applying such a methodology more generally. After considering the availability of the data resources, the diversity of the data types, the usability of the data and the research value of the subject, we chose Wikipedia as our case study. The thesis uses the digital by-product data generated by Wikipedia to analyse its collaborative mode, in which millions of participants work together to provide an online encyclopaedia. The research is constructed so that three related issues are addressed step by step: whether there is a collaborative model in Wikipedia and, if so, what it is and how it works.
In the process of answering these questions, we describe the existing dynamics of mass collaboration; build a model of the collaboration; explain the approaches and ratio of contribution by the various participants; and then analyse the administrative system and its policy for dealing with editing conflicts. Finally, the results of this work are presented in different ways, including mathematical equations, metrics and visualization. The thesis demonstrates that using digital by-product data provides a series of benefits that help resolve the contemporary methodological challenge in the field and extends the capabilities of social scientists to investigate online phenomena. The thesis also provides practical lessons to help investigators avoid the mistakes and problems encountered by the author. By studying an actual social phenomenon, this research aims to evaluate the feasibility of a new methodology that makes use of a neglected data resource to improve the engagement of social science with the world of the web. Such an evaluation can help scholars interested in using digital by-product data in their studies and can also provide some innovative ideas for social scientists in a new information age.
|Item Type:|Thesis (PhD)|
|Academic Units:|The University of York > Sociology (York)|
|Depositing User:|Miss Z HE|
|Date Deposited:|01 May 2012 14:52|
|Last Modified:|08 Aug 2013 08:48|