ZHAO, ZHIXUE ORCID: https://orcid.org/0000-0002-3060-269X (2022) Using Pre-trained Language Models for Toxic Comment Classification. PhD thesis, University of Sheffield.
Abstract
Toxic comment classification is a core natural language processing (NLP) task for combating toxic comments online. It follows the supervised learning paradigm, which requires labelled data for training, and a large amount of high-quality training data empirically benefits model performance. Transferring a pre-trained language model (PLM) to a downstream model allows the downstream model to benefit from additional data without requiring new labelled data. Despite the growing body of research on PLMs in NLP tasks, there remains a fundamental lack of understanding of how to apply PLMs to toxic comment classification. This work addresses this gap from three perspectives.
First, we investigate different transfer strategies for toxic comment classification tasks, highlighting the importance of transfer efficiency: achieving comparable model performance while keeping computational requirements reasonable. To this end, we explore continued pre-training in-domain, which further pre-trains a PLM on an in-domain corpus, and compare different PLMs and different settings for this continued pre-training.
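At the core of continued pre-training in-domain is the same masked-language-model (MLM) objective used in the original pre-training, applied to an in-domain corpus. The following is a minimal sketch of BERT-style MLM corruption (the 80/10/10 masking scheme); the tokens, vocabulary, and probabilities are illustrative stand-ins, not the thesis's actual setup.

```python
import random

MASK = "[MASK]"
VOCAB = ["toxic", "comment", "people", "online", "data", "model"]  # toy vocabulary

def mlm_mask(tokens, mask_prob=0.15, rng=None):
    """Apply BERT-style MLM corruption: select ~mask_prob of positions;
    of those, 80% become [MASK], 10% become a random vocabulary token,
    and 10% stay unchanged. Returns (corrupted, labels), where labels
    holds the original token at selected positions and None elsewhere."""
    rng = rng or random.Random(0)
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok          # the model must predict this token
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK
            elif r < 0.9:
                corrupted[i] = rng.choice(VOCAB)
            # else: keep the original token (the remaining 10%)
    return corrupted, labels

toks = "toxic comment classification uses labelled data".split()
corrupted, labels = mlm_mask(toks, mask_prob=0.3)
print(corrupted)
print(labels)
```

In practice this corruption is applied on the fly to batches of in-domain text, and the PLM is trained to recover the original tokens before being fine-tuned on the labelled classification data.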
Second, we investigate the limitations of PLMs for toxic comment classification. Taking the most popular PLM, BERT, as the representative model for our study, we focus on identity term bias, i.e. prediction bias towards comments containing identity terms such as "Muslim" and "Black". To investigate this bias, we conduct both quantitative and qualitative analyses and study the model's explanations. We also propose a hypothesis linking identity term bias to the subjectivity of comments.
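One common way to quantify identity term bias is a template-based probe: score otherwise-neutral sentences filled with identity terms versus a neutral term, and measure the gap. The sketch below uses a toy keyword classifier as a stand-in for a trained BERT model; the templates, terms, and `bias_gap` metric are illustrative assumptions, not the thesis's evaluation protocol.

```python
def toxicity_score(text):
    """Toy stand-in for a trained toxicity classifier: scores texts by
    the fraction of tokens in a small 'toxic' lexicon. A biased model
    would, in effect, also assign high scores to identity terms."""
    toxic_words = {"hate", "stupid"}
    toks = text.lower().split()
    return sum(t in toxic_words for t in toks) / max(len(toks), 1)

TEMPLATES = ["I am a {} person", "{} people wrote this comment"]
IDENTITY_TERMS = ["muslim", "black", "gay"]
NEUTRAL_TERM = "tall"

def bias_gap(score_fn):
    """Average score difference between identity-term and neutral-term
    fillings of the same non-toxic templates; near 0 means the model
    shows no identity term bias on these probes."""
    gaps = []
    for tpl in TEMPLATES:
        base = score_fn(tpl.format(NEUTRAL_TERM))
        for term in IDENTITY_TERMS:
            gaps.append(score_fn(tpl.format(term)) - base)
    return sum(gaps) / len(gaps)

print(bias_gap(toxicity_score))  # 0.0 for this unbiased toy classifier
```

A biased classifier would yield a positive gap: non-toxic sentences are pushed towards the toxic class merely because an identity term is present.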
Third, building on this hypothesis, we propose a novel BERT-based model to mitigate identity term bias. Unlike previous methods, which try to suppress the model's attention to identity terms, our method injects the subjectivity of a comment into the model together with a signal indicating the presence of identity terms. It shows consistent improvements across a range of toxic comment classification tasks.
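One simple way to inject such auxiliary signals is to append them to the sentence representation before the classification head. The sketch below is a heavily simplified illustration of that general idea, not the thesis's actual architecture; the embedding values and feature names are hypothetical.

```python
def augment_features(cls_embedding, subjectivity, has_identity_term):
    """Concatenate auxiliary signals onto the sentence embedding so a
    classification head can condition on them explicitly, rather than
    inferring toxicity from the identity term alone."""
    return list(cls_embedding) + [subjectivity, 1.0 if has_identity_term else 0.0]

emb = [0.1, -0.4, 0.7]  # stand-in for a BERT [CLS] sentence vector
feat = augment_features(emb, subjectivity=0.8, has_identity_term=True)
print(feat)  # [0.1, -0.4, 0.7, 0.8, 1.0]
```

The intuition is that an explicit subjectivity feature lets the classifier separate "subjective, hostile mention of an identity group" from "objective, neutral mention", which a plain BERT fine-tune tends to conflate.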
Metadata
Supervisors: Zhang, Ziqi and Hopfgartner, Frank
Keywords: machine learning; hate speech; natural language processing; deep learning; neural networks; text processing; toxic comments; abusive language; model bias; model explanation; rationales; BERT; pre-trained language models; pre-training
Awarding institution: University of Sheffield
Academic Units: The University of Sheffield > Faculty of Social Sciences (Sheffield); The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Identification Number/EthosID: uk.bl.ethos.871090
Depositing User: Dr Zhixue Zhao
Date Deposited: 09 Jan 2023 15:46
Last Modified: 01 Mar 2023 10:54
Open Archives Initiative ID (OAI ID): oai:etheses.whiterose.ac.uk:31833
Download
Final eThesis - complete (pdf)
Filename: Thesis_Cass_final.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.