Nlebedim, Valentine (2020) Probabilistic Identification of Bacterial Essential Genes Using Transposon-Directed Insertion-Site Sequencing (TraDIS) Data. PhD thesis, University of Sheffield.
Abstract
The development of new technologies and approaches has vastly transformed genomics research, leading to an unrivaled explosion in the amount of genomic data.
The escalation in genomic data generation has in turn raised the challenges of analyzing these data. Statistical approaches and computational methods for analyzing genomic information are therefore urgently needed. In this thesis, we present new statistical methods to sort through increasingly rich and massive genome-wide TraDIS data sets and classify the essential genes of bacterial organisms. The inferences obtained from most existing approaches are tied to the mariner family of transposons, known for preferential insertion at TA sites, which makes comparison of results difficult. We first focus on functionally derived information per gene to develop gene-level methods that probabilistically identify bacterial essential genes using transposon-directed insertion sequencing data. One of the gene-level approaches uses a single functionally derived variable (insertion density), while the other employs a two-dimensional variable of insertion density and GC-content. The results of our method (INSDGC), which introduces the novelty of incorporating GC-content, established the efficacy of GC-content in aiding the classification of bacterial gene essentiality, as suggested by Seringhaus et al. (2006). This study therefore suggests applying GC-content to the classification of gene essentiality more widely, to ascertain its general applicability. We also introduce the novelty of using the relative insertion position proportion within the gene to classify bacterial gene essentiality. These methods use two-dimensional functional information combining the relative insertion position proportion within the gene and read counts. The results show that the developed methods have implications for the classification of bacterial gene essentiality.
However, the performance of two of our methods (RIPPRC1 and RIPPRC2) appears to be sensitive to high numbers of insertions arising from chimeric/background noise. Wrapping the distribution of the relative insertion position proportion within the gene to form circular data yielded promising results. Our method vMRIPPRC, which capitalises on the closeness of the von Mises distribution to the normal distribution and on its association with a shape linked to maximum entropy for wrapped/circular data, circumvented the sensitivity problem exhibited by the RIPPRC1 and RIPPRC2 methods. We also provide alternative methods (vMRIPP and RIPP) that use a single variable at insertion level. Both methods are capable of classifying gene essentiality. However, they require a moderate to high number of insertions for optimal performance and can be adversely affected by a low number of insertions. Based on the results of this study, vMRIPPRC and INSDGC are recommended for identifying bacterial essential genes using transposon-directed insertion sequencing data. It is expected that the methods developed in this study will facilitate the development of more narrowly targeted drugs directed explicitly at biochemical targets, and hence reduce the likelihood of broad antibiotic resistance developing.
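The per-gene variables named in the abstract (insertion density, GC-content, and the relative insertion position proportion wrapped onto a circle) can be illustrated with a minimal Python sketch. The definitions below are assumptions made for illustration only and are not taken from the thesis: insertion density is taken as unique insertion sites per base pair, the relative position as the offset from the gene start divided by gene length, and wrapping as multiplication by 2π. All function names are hypothetical, and the sketch does not implement the probabilistic classification itself.

```python
import math


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def insertion_density(n_unique_sites, gene_length):
    """Assumed definition: unique insertion sites per base pair of the gene."""
    return n_unique_sites / gene_length


def relative_positions(insertion_coords, gene_start, gene_end):
    """Assumed definition: offset from gene start divided by gene length.

    Coordinates are 1-based and inclusive; insertions outside the gene
    are ignored. Each returned value lies in [0, 1).
    """
    length = gene_end - gene_start + 1
    return [
        (pos - gene_start) / length
        for pos in insertion_coords
        if gene_start <= pos <= gene_end
    ]


def to_angles(proportions):
    """Wrap proportions in [0, 1) onto the circle as angles in [0, 2*pi)."""
    return [2 * math.pi * p for p in proportions]


def circular_mean(angles):
    """Mean direction of a set of angles, returned in [0, 2*pi)."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c) % (2 * math.pi)


if __name__ == "__main__":
    # Hypothetical gene spanning coordinates 101..200 with made-up insertions.
    props = relative_positions([101, 126, 151, 250], 101, 200)
    print(props)
    print(insertion_density(len(props), 100))
    print(circular_mean(to_angles(props)))
    print(gc_content("ATGCGC"))
```

Summaries such as the circular mean are the kind of quantity a von Mises model would be fitted to; how the thesis combines these variables with read counts is not stated in the abstract and is not reproduced here.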
Metadata
Supervisors: | Walters, Kevin and Chaudhuri, Roy |
---|---|
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Science (Sheffield) > School of Mathematics and Statistics (Sheffield) |
Identification Number/EthosID: | uk.bl.ethos.813870 |
Depositing User: | Valentine Nlebedim |
Date Deposited: | 19 Aug 2020 15:59 |
Last Modified: | 01 Sep 2023 09:53 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:27507 |
Download
Final Thesis
Filename: Final Thesis.pdf
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License