
Generic named entity extraction

Jara-Valencia, José Luis (2005) Generic named entity extraction. PhD thesis, University of York.

428396.pdf - Examined Thesis (PDF)

Download (25Mb)


This thesis proposes and evaluates different ways of performing generic named entity recognition, that is, the construction of a system that recognises names in free text and is not specific to any particular domain or task. The starting point is an implementation of a well-known baseline system based on maximum entropy models that use lexically-oriented features to recognise names in text. Although this system achieves good levels of performance, both maximum entropy models and lexically-oriented features have their limitations. Three alternative ways of extending this system to overcome these limitations are then studied:

- more linguistically-oriented features are extracted from a generic lexical source, namely WordNet®, and added to the pool of features of the maximum entropy model;
- the maximum entropy model is biased towards training samples that are similar to the piece of text being analysed;
- a bootstrapping procedure is introduced to allow maximum entropy models to collect new, valuable information from unlabelled text.

Results in this thesis indicate that the maximum entropy model is a very strong approach whose levels of performance are very hard to improve on. However, these results also suggest that these extensions of the baseline system could yield improvements, though some difficulties must be addressed and more research is needed to reach firmer conclusions. This thesis has nonetheless made important contributions: a novel approach to estimating the complexity of a named entity extraction task, a method for selecting the features to be used by the maximum entropy model from a large pool of features, and a novel procedure for bootstrapping maximum entropy models.
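
For readers unfamiliar with the techniques named in the abstract, the following is a minimal sketch of a maximum entropy token classifier with lexically-oriented features, followed by a plain self-training loop. It is not the thesis's system: the feature set, the toy sentences, the `NAME`/`O` tag set, the function names and every parameter value are assumptions made for illustration, and the bootstrapping step is ordinary self-training rather than the novel procedure the thesis proposes.

```python
import math
from collections import defaultdict

LABELS = ("NAME", "O")  # assumed tag set: part of a name, or outside one

# Invented toy data, for illustration only.
LABELLED = [
    (["Mr", "Smith", "visited", "York"], ["O", "NAME", "O", "NAME"]),
    (["She", "visited", "Leeds", "yesterday"], ["O", "O", "NAME", "O"]),
    (["The", "report", "praised", "Jones"], ["O", "O", "O", "NAME"]),
]

def lexical_features(tokens, i):
    """Hypothetical lexically-oriented features for the token at position i."""
    word = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    cap_mid = word[:1].isupper() and i > 0  # capitalised, not sentence-initial
    return ["word=" + word.lower(), "prev=" + prev, "cap_mid=" + str(cap_mid)]

def predict(weights, feats):
    """p(y | x) = exp(sum of weights of active (feature, label) pairs) / Z(x)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train(examples, epochs=50, rate=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in examples:
            probs = predict(weights, feats)
            for y in LABELS:
                step = rate * ((1.0 if y == gold else 0.0) - probs[y])
                for f in feats:
                    weights[(f, y)] += step
    return weights

def to_examples(sentences):
    """Flatten (tokens, tags) pairs into (features, tag) training examples."""
    return [(lexical_features(toks, i), tag)
            for toks, tags in sentences
            for i, tag in enumerate(tags)]

def bootstrap(labelled, unlabelled, rounds=2, threshold=0.9):
    """Plain self-training: add confidently labelled tokens, then retrain."""
    examples = to_examples(labelled)
    weights = train(examples)
    added = set()  # (sentence index, token index) pairs already adopted
    for _ in range(rounds):
        for s, toks in enumerate(unlabelled):
            for i in range(len(toks)):
                if (s, i) in added:
                    continue
                feats = lexical_features(toks, i)
                probs = predict(weights, feats)
                best = max(probs, key=probs.get)
                if probs[best] >= threshold:
                    examples.append((feats, best))
                    added.add((s, i))
        weights = train(examples)
    return weights

if __name__ == "__main__":
    model = bootstrap(LABELLED, [["He", "visited", "Sheffield"]])
    sentence = ["She", "visited", "Durham"]
    for i, token in enumerate(sentence):
        print(token, predict(model, lexical_features(sentence, i)))
```

On this toy data the unseen word "Durham" is scored as more likely a name than not, because the only evidence the model has for its two active features (a preceding "visited" and mid-sentence capitalisation) comes from name tokens. The sketch leaves out regularisation and feature selection, both of which a practical maximum entropy tagger would need.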

Item Type: Thesis (PhD)
Academic Units: The University of York > Computer Science (York)
Identification Number/EthosID: uk.bl.ethos.428396
Depositing User: EThOS Import (York)
Date Deposited: 08 Dec 2016 17:00
Last Modified: 08 Dec 2016 17:00
URI: http://etheses.whiterose.ac.uk/id/eprint/14071

