Interpreting Document Collections with Topic Models

Aletras, Nikolaos (2014) Interpreting Document Collections with Topic Models. PhD thesis, University of Sheffield.

Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.


This thesis concerns topic models, a set of statistical methods for interpreting the contents of document collections. These models automatically learn sets of topics from words that frequently co-occur in documents. The learned topics often represent abstract thematic subjects, e.g. Sports or Politics, and each topic is associated with relevant documents. These characteristics make topic models a useful tool for organising large digital libraries. Hence, these methods have been used to develop browsing systems that let users navigate document collections and identify relevant information by presenting sets of topics that contain relevant documents. First, we look at the problem of identifying incoherent topics and show that our methods work better than previously proposed approaches. Next, we propose novel methods for efficiently identifying semantically related topics, which can be used for topic recommendation. Finally, we look at alternatives to topic keywords for representing topics. We propose approaches that provide textual or image labels to assist topic interpretation, and we compare different topic representations within a document browsing system.

Item Type: Thesis (PhD)
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield)
The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield)

The University of Sheffield > Faculty of Engineering (Sheffield)
Identification Number/EthosID: uk.bl.ethos.631458
Depositing User: Mr Nikolaos Aletras
Date Deposited: 12 Dec 2014 15:46
Last Modified: 03 Oct 2016 11:18
URI: http://etheses.whiterose.ac.uk/id/eprint/7484
