Nooney, Colleen (2016) Statistical Analysis of Coevolution in Protein Structure and in Ecology. PhD thesis, University of Leeds.
Abstract
In this thesis we explore the theory of coevolution. Yip et al. (2008) define coevolution to be the change in one biological object as a result of the change in one or more associated objects. The process of coevolution has been observed at many biological levels, from microscopic to macroscopic. We explore coevolution at the molecular level by studying protein sequences and their corresponding structures to determine how correlated areas of multiple sequence alignments and structures have coevolved. At the species level, we assess how coevolution drives ecological systems of interacting phylogenetic trees.
Determining the three-dimensional structure of proteins is of interest because the structure of a protein is constrained by its function. Proteins carry out vital functions in every cell and are arguably the most important biological molecules found in organisms. Multiple sequence alignments of protein families contain evolutionary information on these functional constraints. In the first part of this thesis, we aim to develop a method to identify correlated mutations within multiple sequence alignments. These correlated positions are used to predict residues that are in close proximity in three-dimensional space. In turn, these structural constraints can be used in ab initio protein structure prediction. Currently, the most accurate way to determine protein structure is to use experimental techniques such as Nuclear Magnetic Resonance (NMR) and X-ray Crystallography. These techniques are expensive and take time. As a result, the proteins that are chosen to have their structures determined may be subject to selection bias. Initially, we focus on a preliminary analysis of the trypsin protein family. We align trypsin structures from a variety of species using a multiple structural alignment algorithm, to determine how the structure of the family has evolved. Basic summary statistics of the aligned distance matrices reveal a set of residues where the distance between these specific residues and every other residue in the structure is highly conserved across all of the structures in the protein family. We label these residues as ‘anchor residues’ because they appear to hold the structure of the trypsin protein family in place like anchors.
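The idea of an anchor residue can be illustrated with a minimal sketch. The abstract does not state which summary statistic or cut-off was used, so the choice below (the mean, over all other residues, of the standard deviation of each aligned distance across structures) and the names `distance_variability`, `candidate_anchors` and `threshold` are assumptions made purely for illustration.

```python
from statistics import mean, pstdev


def distance_variability(distance_matrices):
    """Score each aligned residue by how much its distances vary across structures.

    distance_matrices: list of n x n nested lists, one per structure, all
    indexed by the same aligned residue positions (an assumed input format).
    Returns one score per residue; a low score means conserved distances.
    """
    n = len(distance_matrices[0])
    scores = []
    for i in range(n):
        # For each other residue j, measure the spread of d(i, j) across structures.
        spreads = [
            pstdev([m[i][j] for m in distance_matrices])
            for j in range(n)
            if j != i
        ]
        scores.append(mean(spreads))
    return scores


def candidate_anchors(distance_matrices, threshold):
    """Return indices of residues whose score is at or below the (assumed) threshold."""
    scores = distance_variability(distance_matrices)
    return [i for i, s in enumerate(scores) if s <= threshold]


# Toy example: three structures, three aligned residues.
# Distances from residue 0 are identical in every structure; d(1, 2) varies.
m1 = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
m2 = [[0, 1, 2], [1, 0, 5], [2, 5, 0]]
m3 = [[0, 1, 2], [1, 0, 4], [2, 4, 0]]
anchors = candidate_anchors([m1, m2, m3], threshold=0.1)
```

In this toy input only residue 0 keeps the same distance to every other residue in all three structures, so it is the only candidate returned.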
Following this, we develop a regularised logistic regression model to detect correlated mutations in multiple sequence alignments. We successfully apply our method to a number of small artificial test alignments. When applied to real Pfam datasets, our method has varying success at identifying coevolving columns that are in close physical proximity.
In the second part of this thesis we develop a new method to test efficiently for cospeciation in multitrophic ecological systems. Our method can be applied to bitrophic and tritrophic systems, with the potential to generalise to higher order systems and networks. We utilise methods from electrical circuit theory to reduce higher order systems into two vectors of electrically equivalent patristic distances that can be compared using Spearman’s rank correlation coefficient. Compared to existing methods, our method has equal or higher performance at both trophic levels.
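The final comparison step can be sketched as follows. This is only an illustration of Spearman's rank correlation applied to two equal-length vectors of distances; the circuit-theory reduction that produces those vectors is not described in the abstract and is not reproduced here. The function names and the example vectors are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y.

    Raises ZeroDivisionError if either vector is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical distance vectors for the two sides of the comparison.
distances_a = [1.0, 2.0, 3.0, 4.0]
distances_b = [1.0, 8.0, 27.0, 64.0]  # monotone in distances_a, but not linear
rho = spearman(distances_a, distances_b)
```

Because the statistic depends only on ranks, any monotone relationship between the two distance vectors gives a coefficient of 1, which is one reason a rank correlation suits distances measured on different scales.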
To test our method, interacting systems of phylogenetic trees were simulated by generating random trees, and separately, their interaction matrices. Simulating the systems in this way does not take into account how the systems might have evolved. We propose a more realistic simulation method that evolves the system over time. The algorithm starts with one species per lineage, and these species are assumed to have an ecological interaction. The joint evolution of these species is simulated by sampling the times at which evolutionary events occur from an exponential distribution. We explore speciation events, and the gain and loss of ecological interactions. Each of these events is controlled by a rate parameter. By experimenting with these parameters, a wide range of systems with different cospeciation properties can be simulated. We show that a wide range of systems can be produced using our method.
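A minimal event-driven sketch of this kind of simulation is given below for two lineages. It is not the algorithm from the thesis: the way the rates scale (per species, per missing pair, per existing interaction), the rule that a daughter species inherits its parent's interactions, and all names (`simulate_system`, `speciation_rate`, `gain_rate`, `loss_rate`, `t_max`) are assumptions made for illustration.

```python
import random


def simulate_system(speciation_rate, gain_rate, loss_rate, t_max, seed=None):
    """Simulate two interacting lineages, A and B, up to time t_max."""
    rng = random.Random(seed)
    # Each species is stored as (parent index, birth time); founders have no parent.
    lineages = {"A": [(None, 0.0)], "B": [(None, 0.0)]}
    interactions = {(0, 0)}  # the two founding species are assumed to interact
    t = 0.0
    while True:
        n_a, n_b = len(lineages["A"]), len(lineages["B"])
        missing = n_a * n_b - len(interactions)
        # Assumed scaling of each event rate with the current state.
        rates = {
            "speciate_A": speciation_rate * n_a,
            "speciate_B": speciation_rate * n_b,
            "gain": gain_rate * missing,
            "loss": loss_rate * len(interactions),
        }
        events = [(name, r) for name, r in rates.items() if r > 0]
        if not events:
            break  # nothing can happen any more
        total = sum(r for _, r in events)
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > t_max:
            break
        names = [name for name, _ in events]
        weights = [r for _, r in events]
        event = rng.choices(names, weights=weights)[0]
        if event.startswith("speciate"):
            side = event[-1]
            parent = rng.randrange(len(lineages[side]))
            child = len(lineages[side])
            lineages[side].append((parent, t))
            # Assumption: the daughter species inherits its parent's interactions.
            if side == "A":
                interactions |= {(child, b) for (a, b) in interactions if a == parent}
            else:
                interactions |= {(a, child) for (a, b) in interactions if b == parent}
        elif event == "gain":
            candidates = [
                (a, b)
                for a in range(n_a)
                for b in range(n_b)
                if (a, b) not in interactions
            ]
            interactions.add(rng.choice(candidates))
        else:  # loss of an existing interaction
            interactions.remove(rng.choice(sorted(interactions)))
    return lineages, interactions


lineages, interactions = simulate_system(1.0, 0.2, 0.1, t_max=3.0, seed=1)
```

The recorded parent indices and birth times are enough to rebuild each tree afterwards. Raising `gain_rate` and `loss_rate` relative to `speciation_rate` should weaken the match between the two trees, which is the sense in which the rate parameters shape the cospeciation signal in this sketch.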
Metadata
Supervisors: | Gilks, Walter R and Barber, Stuart and Gusnanto, Arief |
---|---|
Awarding institution: | University of Leeds |
Academic Units: | The University of Leeds > Faculty of Maths and Physical Sciences (Leeds) > School of Mathematics (Leeds) > Statistics (Leeds) |
Identification Number/EthosID: | uk.bl.ethos.703363 |
Depositing User: | Dr C Nooney |
Date Deposited: | 20 Feb 2017 11:37 |
Last Modified: | 25 Jul 2018 09:54 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:16337 |
Download
Final eThesis - complete (pdf)
Filename: Nooney_C_Mathematics_PhD_2016.pdf
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License