Jara-Valencia, José Luis (2005) Generic named entity extraction. PhD thesis, University of York.
Abstract
This thesis proposes and evaluates different ways of performing generic named entity
recognition, that is, the construction of a system capable of recognising names in free
text which is not specific to any particular domain or task.
The starting point is an implementation of a well-known baseline system which is based
on maximum entropy models that utilise lexically-oriented features to recognise names
in text. Although this system achieves good levels of performance, both maximum
entropy models and lexically-oriented features have their limitations. Three alternative
ways in which this system can be extended to overcome these limitations are then
studied:
- more linguistically-oriented features are extracted from a generic lexical source,
  namely WordNet®, and then added to the pool of features of the maximum entropy
  model
- the maximum entropy model is biased towards training samples that are similar to
  the piece of text being analysed
- a bootstrapping procedure is introduced to allow maximum entropy models to
  collect new, valuable information from unlabelled text
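
As a rough illustration of the baseline idea described above, the sketch below shows how lexically-oriented features of a token might feed a maximum entropy model. It is not the thesis's implementation: the feature set, the function names `lexical_features` and `maxent_probs`, the labels `NAME` and `OTHER`, and the hand-set weights are all assumptions made for this example.

```python
import math

def lexical_features(tokens, i):
    """Return binary lexical features describing tokens[i] in its context.

    The feature set is an assumption for illustration only.
    """
    word = tokens[i]
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    feats = {"word=" + word.lower(), "prev=" + prev_word, "next=" + next_word}
    if word[:1].isupper():
        feats.add("init_cap")
    if word.isupper():
        feats.add("all_caps")
    if any(ch.isdigit() for ch in word):
        feats.add("has_digit")
    return feats

def maxent_probs(weights, feats, labels):
    """Compute p(label | feats), proportional to exp of the summed weights
    of the active (feature, label) pairs."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

# Hypothetical hand-set weights; a real model would estimate them from labelled text.
weights = {
    ("init_cap", "NAME"): 1.5,
    ("prev=mr", "NAME"): 2.0,
    ("word=visited", "OTHER"): 2.0,
}
tokens = ["Mr", "Smith", "visited", "York"]
feats = lexical_features(tokens, 1)
probs = maxent_probs(weights, feats, ["NAME", "OTHER"])
```

Here the token "Smith" activates the `init_cap` and `prev=mr` features, so the `NAME` label receives most of the probability mass. The extensions listed above would, in this picture, add further features to the pool, reweight the training samples, or enlarge the training data with automatically labelled text.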
Results in this thesis indicate that the maximum entropy model is a very strong approach
that accomplishes levels of performance that are very hard to improve on. However,
these results also suggest that these extensions of the baseline system could yield improvements,
though some difficulties must be addressed and more research is needed to
reach firmer conclusions.
This thesis has nonetheless provided important contributions: a novel approach to
estimate the complexity of a named entity extraction task, a method for selecting the
features to be used by the maximum entropy model from a large pool of features and a
novel procedure to bootstrap maximum entropy models.
Metadata
| Field | Value |
|---|---|
| Awarding institution | University of York |
| Academic Units | The University of York > Computer Science (York) |
| Identification Number/EthosID | uk.bl.ethos.428396 |
| Depositing User | EThOS Import (York) |
| Date Deposited | 08 Dec 2016 17:00 |
| Last Modified | 08 Dec 2016 17:00 |
| Open Archives Initiative ID (OAI ID) | oai:etheses.whiterose.ac.uk:14071 |