Learning to Rank and Order Answers to Definition Questions

Abstract

The task of reordering a set of ranked results returned by an online search engine or an offline information retrieval engine is termed reranking. It is called reranking because the candidate answer snippets have already been extracted by the information retrieval system using some scoring strategy, for example one based on the occurrence of query words. We therefore assume that the results are already ranked, and the subsequent ranking is termed reranking. Ranking drastically reduces the number of documents that will be processed further. Reranking usually involves deeper linguistic analysis and the use of expert knowledge resources to gain an even better understanding. The first task this thesis explores is the reranking of answers to definition questions. The answers are sentences returned by the Google search engine in response to the definition questions. This step is relevant to definition questions because the questions tend to be short, and the information need of the user is therefore difficult to assess. This means the final result is not a single piece of information but an ordered set of relevant sentences. In this thesis we explore two approaches to reranking that use dependency tree statistics in a probabilistic setting. One is based on calculating edit distance between trees and tree statistics from the corpus; the other uses a tree kernel function and involves using the output of trained classifiers directly.

The second task this thesis explores is sentence ordering for definition questions. The reranking part of the definition question answering pipeline is able to identify the sentences that are relevant to a given question. However, the answer to a definition question is a collection of sentences that have some coherent ordering among them. In this respect it is not far from the characteristics observed in a good summary. We believe that by moving sentences around to form a more coherent chunk we can better meet the expectations of users by improving their reading experience. We present an approach that finds an ordering for the sentences based on knowledge extracted from observing the order of sentences in Wikipedia articles. Because of the popularity and acceptability of Wikipedia, as evidenced by the fact that Wikipedia results are ranked highly by all major commercial search engines, it was chosen as the standard to learn from and to compare against. We present a framework that uses the order of sentences extracted from Wikipedia articles to construct a single large graph of connected sentences. As a mechanism for selecting a node in the graph, we define a scoring function based on the relative position of candidate sentences.
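
The idea of learning an ordering from the relative positions of sentences in reference articles can be illustrated with a minimal sketch. This is not the framework developed in the thesis: the function names `learn_precedence` and `order_candidates`, the use of short labels in place of real sentences, and the greedy selection rule are all assumptions made here for illustration only.

```python
from itertools import combinations


def learn_precedence(articles):
    """Count how often item a appears before item b in the reference articles.

    Each article is a list of hashable items (here: hypothetical sentence
    labels) given in their original order.
    """
    before = {}
    for items in articles:
        for i, j in combinations(range(len(items)), 2):
            pair = (items[i], items[j])
            before[pair] = before.get(pair, 0) + 1
    return before


def order_candidates(candidates, before):
    """Greedily order candidates using the learnt precedence counts.

    At each step the candidate that most often precedes the remaining
    candidates (net of how often it follows them) is selected next.
    """
    remaining = list(candidates)
    ordered = []
    while remaining:
        def score(c):
            # Net evidence that c comes before every other remaining item.
            return sum(
                before.get((c, o), 0) - before.get((o, c), 0)
                for o in remaining
                if o != c
            )

        best = max(remaining, key=score)
        ordered.append(best)
        remaining.remove(best)
    return ordered


# Toy reference "articles": labels stand in for sentences (assumption).
articles = [
    ["definition", "history", "usage"],
    ["definition", "usage", "criticism"],
    ["history", "usage", "criticism"],
]
before = learn_precedence(articles)
result = order_candidates(["usage", "criticism", "definition", "history"], before)
print(result)
```

In this toy setting the scoring rule depends only on pairwise relative position, so it ignores sentence content entirely; a real system would first need some way to relate candidate sentences to the sentences observed in the reference articles.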

Metadata

Supervisors:	Manandhar, Suresh
Awarding institution:	University of York
Academic Units:	The University of York > Computer Science (York)
Identification Number/EthosID:	uk.bl.ethos.589155
Depositing User:	Mr. Shailesh Pandey
Date Deposited:	10 Feb 2014 10:34
Last Modified:	08 Sep 2016 13:29
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:5040
