Chrysostomou, George (2022) Model Interpretability for Natural Language Processing Applications. PhD thesis, University of Sheffield.
Abstract
This thesis focuses on model interpretability, an area concerned with understanding model predictions in Natural Language Processing (NLP) tasks. The increasing adoption of opaque models, such as BERT, leads to a growing need for explaining their predictions. This is typically performed by extracting a subset of the input that is indicative of the true reasoning behind the model's prediction (i.e. a faithful explanation, or rationale).
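As a concrete illustration (not taken from the thesis itself), below is a minimal sketch of rationale extraction with one common feature attribution method, gradient × input, using the Hugging Face transformers API. The checkpoint name and example sentence are arbitrary illustrative choices.

```python
# Sketch: extract a rationale via gradient x input attribution.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "textattack/bert-base-uncased-SST-2"  # illustrative checkpoint, not from the thesis
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

enc = tok("a gripping, beautifully shot film", return_tensors="pt")

# Embed the tokens ourselves so we can take gradients w.r.t. the embeddings.
embeds = model.get_input_embeddings()(enc["input_ids"]).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
pred = logits.argmax(dim=-1).item()
logits[0, pred].backward()

# Gradient x input: one importance score per input token.
scores = (embeds.grad * embeds).sum(dim=-1).squeeze(0)

# The rationale is the top-k scoring tokens; k (the rationale length)
# is itself hard to set a priori, as the abstract notes.
k = 3
tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
print([tokens[i] for i in scores.topk(k).indices])
```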
Whilst there are multiple approaches in the literature for extracting explanations (e.g. feature attribution methods), some have been criticised for their lack of faithfulness. Furthermore, explanation faithfulness also depends on the model employed: highly parametrised models have been shown to produce less faithful explanations. Previous research has also shown that there is no single best feature attribution method across models, tasks and even instances of the same dataset, whilst finding a rationale length is still an open problem. Additionally, a limitation of current evaluations of explanation faithfulness is that they are performed on a held-out dataset from the same domain (i.e. the data they are evaluated on come from the same distribution as the training data). However, it is not known how faithfulness is affected in out-of-domain settings.
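One widely used faithfulness metric of the kind such evaluations rely on is comprehensiveness: the drop in the predicted-class probability when the rationale tokens are removed from the input. A minimal sketch follows; the function name and the choice of padding as the removal strategy are illustrative assumptions, not the thesis's own implementation.

```python
# Sketch: comprehensiveness as a faithfulness metric (higher is better,
# since removing a faithful rationale should hurt the model's confidence).
import torch

def comprehensiveness(model, input_ids, attention_mask, rationale_idx, pad_id):
    """Drop in predicted-class probability after removing rationale tokens."""
    with torch.no_grad():
        full = model(input_ids=input_ids,
                     attention_mask=attention_mask).logits.softmax(-1)
        pred = full.argmax(-1).item()
        # "Remove" the rationale by replacing its tokens with padding.
        reduced_ids = input_ids.clone()
        reduced_ids[0, rationale_idx] = pad_id
        reduced = model(input_ids=reduced_ids,
                        attention_mask=attention_mask).logits.softmax(-1)
    return (full[0, pred] - reduced[0, pred]).item()
```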
The main aim of this thesis, therefore, is to improve and evaluate the faithfulness of explanations in NLP applications. First, we improve the faithfulness of explanations extracted using attention mechanisms, a popular component of neural NLP models. In a similar direction, we show improvements in the faithfulness of explanations from feature attribution approaches when using large language models. We then address the problem of specifying a priori a feature scoring method, rationale length and rationale type. Finally, we evaluate the faithfulness of explanations in out-of-domain settings, highlighting a problem with popular faithfulness evaluation metrics.
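For completeness, here is a minimal sketch of the attention-based explanations mentioned above, assuming a Hugging Face BERT classifier and treating head-averaged last-layer attention from [CLS] as token importance; this is one common way of reading off such scores, not necessarily the method used in the thesis.

```python
# Sketch: attention weights as token importance scores.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "bert-base-uncased"  # illustrative checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

enc = tok("an example sentence", return_tensors="pt")
with torch.no_grad():
    out = model(**enc, output_attentions=True)

attn = out.attentions[-1].mean(dim=1)  # last layer, averaged over heads
scores = attn[0, 0]                    # attention from [CLS] to each token
for t, s in zip(tok.convert_ids_to_tokens(enc["input_ids"][0]), scores):
    print(f"{t:>12s} {s:.3f}")
```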
Metadata
| Field | Value |
|---|---|
| Supervisors | Aletras, Nikolaos |
| Keywords | NLP, interpretability, machine learning |
| Awarding institution | University of Sheffield |
| Academic Units | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield) |
| Identification Number/EthosID | uk.bl.ethos.863426 |
| Depositing User | Mr George Chrysostomou |
| Date Deposited | 18 Oct 2022 08:53 |
| Last Modified | 01 Nov 2022 10:53 |
| Open Archives Initiative ID (OAI ID) | oai:etheses.whiterose.ac.uk:31611 |
Download
Final eThesis - complete (pdf)
Filename: GeorgeChrysostomouThesis.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.