
Novel Algorithm Development for ‘NextGeneration’ Sequencing Data Analysis

Antanaviciute, Agne (2017) Novel Algorithm Development for ‘NextGeneration’ Sequencing Data Analysis. PhD thesis, University of Leeds.

Text (Main Text)
Thesis 2.pdf - Final eThesis - complete (pdf)
Restricted until 1 July 2020.

Other (Supplementary Data)
Appendix B - Supplementary Datasets.xlsx - Final eThesis - complete (pdf)
Restricted until 1 July 2020.

Text (Appendix A)
Appendix A - List of Bash Commands Used.docx - Final eThesis - complete (pdf)
Restricted until 1 July 2020.


In recent years, the decreasing cost of ‘next generation’ sequencing (NGS) has spawned numerous applications for interrogating whole genomes and transcriptomes in research, diagnostic and forensic settings. While the innovations in sequencing have been explosive, the development of scalable and robust bioinformatics software and algorithms for analysing the new types of data generated by these technologies has struggled to keep up. As a result, the large volumes of NGS data available in public repositories are severely underutilised, despite providing a rich resource for data mining applications. Indeed, the bottleneck in genome and transcriptome sequencing experiments has shifted from data generation to bioinformatics analysis and interpretation. This thesis focuses on the development of novel bioinformatics software to bridge the gap between data availability and interpretation. The work is split between two core topics: computational prioritisation and identification of disease gene variants, and identification of RNA N6-adenosine methylation from sequencing data. The first chapter briefly discusses the emergence and establishment of NGS technology as a core tool in biology, along with its current applications and perspectives. Chapter 2 introduces the problem of variant prioritisation in the context of Mendelian disease, where a typical sequencing experiment generates tens of thousands of potential candidates. Chapter 3 describes novel software developed for candidate gene prioritisation, which utilises data mining of tissue-specific gene expression profiles. Chapter 4 investigates an alternative approach to candidate variant prioritisation that leverages functional and phenotypic descriptions of genes and diseases from multiple biomedical domain ontologies. Chapter 5 discusses N6-adenosine methylation, a recently re-discovered post-transcriptional modification of RNA. The core of the chapter describes novel software developed for transcriptome-wide detection of this epitranscriptomic mark from sequencing data. Chapter 6 presents a case study application of the software, reporting the previously uncharacterised RNA methylome of Kaposi’s Sarcoma Herpes Virus. The chapter further discusses a putative novel N6-methyladenosine RNA-binding protein and its possible roles in the progression of viral infection.

Item Type: Thesis (PhD)
Keywords: NGS; Gene Prioritisation; Disease Genes; Gene Expression; m6A; KSHV
Academic Units: The University of Leeds > Faculty of Medicine and Health (Leeds) > Institute of Molecular Medicine (LIMM) (Leeds) > Section of Genetics (Leeds)
The University of Leeds > Faculty of Medicine and Health (Leeds)
Depositing User: Miss Agne Antanaviciute
Date Deposited: 21 Jun 2018 11:47
Last Modified: 21 Jun 2018 11:47
URI: http://etheses.whiterose.ac.uk/id/eprint/20734
