Entity Type Modeling for Multi-Document Summarization: Generating Descriptive Summaries of Geo-Located Entities

Abstract

In this work we investigate the application of entity type models in extractive multi-document
summarization using automatic caption generation for images of geo-located entities (e.g.
Westminster Abbey, Loch Ness, Eiffel Tower) as an application scenario. Entity type models
contain sets of patterns aiming to capture the ways the geo-located entities are described in natural
language. They are automatically derived from texts about geo-located entities of the same
type (e.g. churches, lakes, towers). We collect texts about geo-located entities from Wikipedia
because our investigation shows that the information humans associate with entity types positively
correlates with the information contained in Wikipedia articles about the same entity
types.

We integrate entity type models into a multi-document summarizer and use them to address the
two major tasks in extractive multi-document summarization: sentence scoring and summary
composition. We experiment with three different representation methods for entity type models:
signature words, n-gram language models and dependency patterns. We first propose that
entity type models will improve sentence scoring, i.e. they will help to assign higher scores to
sentences which are more relevant to the output summary than to those which are not. Secondly,
we claim that summary composition can be improved using entity type models.
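
As a rough illustration of sentence scoring with a signature-word model, the sketch below picks words that are relatively more frequent in texts about one entity type than in a background collection, then scores a sentence by the fraction of its tokens that are signature words. The frequency-ratio test, the `min_ratio` threshold, and all function names are assumptions made for this example; the abstract does not specify how the thesis selects or weights signature words.

```python
from collections import Counter
import re


def tokenize(text):
    # Lowercase alphabetic tokens only; a deliberately simple tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def signature_words(type_texts, background_texts, min_ratio=2.0):
    """Return words over-represented in type_texts relative to background_texts.

    Uses an add-one smoothed relative-frequency ratio (an assumption for this
    sketch, not the selection criterion used in the thesis).
    """
    type_counts = Counter(w for t in type_texts for w in tokenize(t))
    bg_counts = Counter(w for t in background_texts for w in tokenize(t))
    type_total = sum(type_counts.values())
    bg_total = sum(bg_counts.values())
    vocab = len(set(type_counts) | set(bg_counts))

    selected = set()
    for word, count in type_counts.items():
        p_type = (count + 1) / (type_total + vocab)
        p_bg = (bg_counts[word] + 1) / (bg_total + vocab)
        if p_type / p_bg >= min_ratio:
            selected.add(word)
    return selected


def score_sentence(sentence, sig):
    # Fraction of tokens in the sentence that are signature words.
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in sig) / len(tokens)


# Toy data, invented for illustration only.
church_texts = [
    "The church was built in 1200.",
    "The church has a tall tower and a nave.",
]
background_texts = ["The market was open.", "The river has a bridge."]
sig = signature_words(church_texts, background_texts)
```

In a full summarizer a score like this would be one feature among several, combined with the standard summarization features the abstract mentions.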

We follow two different approaches to integrate the entity type models into our multi-document
summarizer. In the first approach we use the entity type models in combination with existing
standard summarization features to score the sentences. We also manually categorize the set
of patterns by the information types they describe and use them to reduce redundancy and to
produce better flow within the summary. The second approach aims to eliminate the need for
manual intervention and to fully automate the process of summary generation. As in the first
approach the sentences are scored using standard summarization features and entity type models.
However, unlike the first approach we fully automate the process of summary composition by
simultaneously addressing the redundancy and flow aspects of the summary.

We evaluate the summarizer with integrated entity type models relative to (1) a summarizer using
standard text related features commonly used in summarization and (2) the Wikipedia location
descriptions. The latter constitute a strong baseline for automated summaries to be evaluated
against. The automated summaries are evaluated against human reference summaries using
ROUGE and human readability evaluation, as is common practice in automatic summarization.
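
To make the ROUGE comparison concrete, here is a minimal sketch of unigram recall (ROUGE-1 recall) against a single reference summary. The abstract does not say which ROUGE variants were used, and the official toolkit adds options such as stemming, stop-word removal, and multi-reference handling that this example leaves out; the function names are hypothetical.

```python
from collections import Counter
import re


def unigrams(text):
    # Bag of lowercase alphanumeric tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def rouge1_recall(candidate, reference):
    """Clipped unigram overlap divided by the number of reference unigrams."""
    cand = unigrams(candidate)
    ref = unigrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference token is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / total


# Toy example: 4 of the 5 reference unigrams appear in the candidate.
score = rouge1_recall(
    "the abbey is a church in london",
    "westminster abbey is a church",
)
```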

Our results show that entity type models significantly improve the quality of output summaries
over that of summaries generated using standard summarization features and Wikipedia baseline
summaries. The representation of entity type models using dependency patterns is superior to
the representations using signature words and n-gram language models.

Metadata

Supervisors:	Gaizauskas, Robert
Awarding institution:	University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield)
Identification Number/EthosID:	uk.bl.ethos.595243
Depositing User:	Dr Ahmet Aker
Date Deposited:	26 Feb 2014 15:33
Last Modified:	03 Oct 2016 11:04
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:5138

Download

thesisSubmittedVersion

Filename: thesisSubmittedVersion.pdf

Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License


