Entity Type Modeling for Multi-Document Summarization: Generating Descriptive Summaries of Geo-Located Entities

Aker, Ahmet (2014) Entity Type Modeling for Multi-Document Summarization: Generating Descriptive Summaries of Geo-Located Entities. PhD thesis, University of Sheffield.

thesisSubmittedVersion.pdf
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.

Abstract

In this work we investigate the application of entity type models in extractive multi-document summarization, using automatic caption generation for images of geo-located entities (e.g. Westminster Abbey, Loch Ness, Eiffel Tower) as an application scenario. Entity type models contain sets of patterns that aim to capture the ways geo-located entities are described in natural language. They are automatically derived from texts about geo-located entities of the same type (e.g. churches, lakes, towers). We collect texts about geo-located entities from Wikipedia because our investigation shows that the information humans associate with entity types correlates positively with the information contained in Wikipedia articles about the same entity types. We integrate entity type models into a multi-document summarizer and use them to address the two major tasks in extractive multi-document summarization: sentence scoring and summary composition. We experiment with three different representation methods for entity type models: signature words, n-gram language models and dependency patterns.

First, we propose that entity type models will improve sentence scoring, i.e. they will help to assign higher scores to sentences that are more relevant to the output summary than to those that are not. Second, we claim that summary composition can be improved using entity type models. We follow two different approaches to integrate the entity type models into our multi-document summarizer. In the first approach we use the entity type models in combination with existing standard summarization features to score the sentences. We also manually categorize the set of patterns by the information types they describe and use these categories to reduce redundancy and to produce better flow within the summary. The second approach aims to eliminate the need for manual intervention and to fully automate the process of summary generation. As in the first approach, the sentences are scored using standard summarization features and entity type models. However, unlike the first approach, we fully automate the process of summary composition by simultaneously addressing the redundancy and flow aspects of the summary.

We evaluate the summarizer with integrated entity type models relative to (1) a summarizer using standard text-related features commonly used in summarization and (2) the Wikipedia location descriptions. The latter constitute a strong baseline for automated summaries to be evaluated against. The automated summaries are evaluated against human reference summaries using ROUGE and human readability evaluation, as is common practice in automatic summarization. Our results show that entity type models significantly improve the quality of output summaries over that of summaries generated using standard summarization features and over the Wikipedia baseline summaries. The representation of entity type models using dependency patterns is superior to the representations using signature words and n-gram language models.
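The abstract does not give implementation details, so the following is only a minimal sketch of how one of the three representations, an n-gram language model, could be used to score sentences. It assumes a bigram model with add-one smoothing trained on a few texts about entities of one type. The class name `BigramEntityTypeModel`, the `tokenize` helper, the smoothing choice and the toy sentences are all illustrative assumptions, not taken from the thesis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only (digits are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


class BigramEntityTypeModel:
    """Hypothetical entity type model: an add-one smoothed bigram language model
    trained on texts about entities of a single type (e.g. churches)."""

    def __init__(self, texts):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for text in texts:
            tokens = ["<s>"] + tokenize(text)
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # One extra slot in the vocabulary stands in for all unseen words.
        self.vocab_size = len(self.unigrams) + 1

    def score(self, sentence):
        """Average log-probability per bigram; higher means more typical of the type."""
        tokens = ["<s>"] + tokenize(sentence)
        pairs = list(zip(tokens, tokens[1:]))
        if not pairs:
            return float("-inf")
        total = 0.0
        for prev, word in pairs:
            numerator = self.bigrams[(prev, word)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total / len(pairs)


# Toy training texts about one entity type (assumed data, for illustration only).
church_texts = [
    "The church was built in 1200 and is located in the city centre.",
    "The abbey was built in 1065 and is located in London.",
]
model = BigramEntityTypeModel(church_texts)

# Candidate sentences from documents about a new entity of the same type.
candidates = [
    "Tickets can be purchased online today.",
    "The cathedral was built in 1300.",
]
ranked = sorted(candidates, key=model.score, reverse=True)
```

In this toy setting the sentence that follows the familiar "was built in" pattern ranks above the unrelated one. In a full summarizer such a score would presumably be combined with the standard summarization features mentioned in the abstract rather than used alone.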

Item Type: Thesis (PhD)
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield)
The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield)
Identification Number/EthosID: uk.bl.ethos.595243
Depositing User: Dr Ahmet Aker
Date Deposited: 26 Feb 2014 15:33
Last Modified: 03 Oct 2016 11:04
URI: http://etheses.whiterose.ac.uk/id/eprint/5138