Corpus linguistics and language learning: bootstrapping linguistic knowledge and resources from text

Abstract

This submission for the award of the degree of PhD by published work must: “make a contribution to knowledge in a coherent and related subject area; demonstrate originality and independent critical ability; satisfy the examiners that it is of sufficient merit to qualify for the award of the degree of PhD.” It includes a selection of my work as a Lecturer (and later, Senior Lecturer) at Leeds University, from 1984 to the present. The overall theme of my research has been bootstrapping linguistic knowledge and resources from text. A persistent strand of interest has been unsupervised and semi-supervised machine learning of linguistic knowledge from textual sources; the attraction of this approach is that I could start with English, but go on to apply analogous techniques to other languages, in particular Arabic. This theme covers a broad range of research over more than 20 years at Leeds University which I have divided into 8 sub-topics: A: Constituent-Likelihood statistical modelling of English grammar; B: Machine Learning of grammatical patterns from a corpus; C: Detecting grammatical errors in English text; D: Evaluation of English grammatical annotation models; E: Machine Learning of semantic language models; F: Applications in English language teaching; G: Arabic corpus linguistics; H: Applications in Computing teaching and research. The first section builds on my early years as a lecturer at Leeds University, when my research was essentially a progression from my previous work at Lancaster University on the LOB Corpus Part-of-Speech Tagging project (which resulted in the Tagged LOB Corpus, a resource for Corpus Linguistics research still in use today); I investigated a range of ideas for extending and/or applying techniques related to Part-of-Speech tagging in Corpus Linguistics. The second section covers a range of co-authored papers representing grant-funded research projects in Corpus Linguistics; in this mode of research, I had to come up with the original ideas and guide the project, but much of the detailed implementation was down to research assistant staff. Another highly productive mode of research has been supervision of research students, leading to further jointly-authored research papers. I helped formulate the research plans, and guided and advised the students; as with research-grant projects, the detailed implementation of the research has been down to the research students. The third section includes a few of the most significant of these jointly-authored Corpus Linguistics research papers. A “standard” PhD generally includes a survey of the field to put the work in context; so as a fourth section, I include some survey papers aimed at introducing new developments in corpus linguistics to a wider audience.

Metadata

Supervisors:	Hogg, D.
Awarding institution:	University of Leeds
Academic Units:	The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Identification Number/EthosID:	uk.bl.ethos.631381
Depositing User:	Repository Administrator
Date Deposited:	05 Dec 2014 12:38
Last Modified:	25 Nov 2015 13:47
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:7504

Download

Final eThesis - complete (pdf)

Filename: CorpusLinguisticsBootstrappingPhD.pdf

