Cansdale, Annabel (2020) CLUSTard: An automated pipeline for metagenomic clustering using read abundance over time. MSc by research thesis, University of York.
Abstract
Metagenomics, the study of genetic material obtained by culture-independent shotgun sequencing of environmental samples, allows environmental communities to be investigated as a whole. However, to derive biological context from a metagenomic assembly, sequences must be grouped so that community dynamics and individual organisms can be studied. This process, known as metagenomic binning, is accomplished using some aspect of the sequences' composition. Little metagenomic binning software uses the change in a community over time to cluster metagenomes. Here, I present CLUSTard, an automated pipeline that performs metagenomic binning using sequence abundance values over time. The pipeline clusters large datasets efficiently and requires minimal user input or installation. CLUSTard enabled the resolution of a previously undefined metagenomic dataset, and its reproducible analysis vastly reduced the time and effort required. I found that the most important factor affecting CLUSTard's clustering success was the quality of the input assembly: a highly contiguous Nanopore assembly polished with Illumina reads produced the best clustering result. The results demonstrate that abundance information enables efficient and accurate clustering, and they also highlight the importance of a reproducible analysis pipeline. I anticipate that this pipeline will benefit those who want to produce metagenomic clusters from time-series data and will provide a starting point for further analysis.
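As an illustration of the general idea behind binning by abundance over time, the sketch below groups contigs whose read-abundance profiles across sampling time points rise and fall together. It is not CLUSTard's implementation: the function names `pearson` and `bin_by_abundance`, the choice of Pearson correlation as the similarity measure, the single-linkage grouping and the 0.9 threshold are all assumptions made for illustration, and the contig names and abundance values are invented toy data.

```python
from math import sqrt

# Hypothetical sketch of abundance-based binning; NOT CLUSTard's actual method.


def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    # A flat profile has no variance, so treat it as uncorrelated.
    return num / den if den else 0.0


def bin_by_abundance(profiles, threshold=0.9):
    """Group contigs whose abundance time series correlate at >= threshold.

    Single linkage: two contigs share a bin if a chain of pairwise
    correlations at or above the threshold connects them, i.e. the bins are
    the connected components of the thresholded similarity graph.
    """
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    bins = {}
    for name in names:
        bins.setdefault(find(name), []).append(name)
    # Sort for a deterministic, readable result.
    return sorted(sorted(members) for members in bins.values())


# Invented abundances for four contigs over four sampling time points.
profiles = {
    "contig_1": [10, 20, 40, 80],
    "contig_2": [5, 11, 19, 41],
    "contig_3": [90, 60, 30, 10],
    "contig_4": [45, 31, 14, 6],
}

print(bin_by_abundance(profiles))
# [['contig_1', 'contig_2'], ['contig_3', 'contig_4']]
```

Because Pearson correlation depends only on the shape of each profile over time, contigs from the same organism can land in the same bin even when their absolute coverage differs, as contig_1 and contig_2 do here.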
Metadata
| Field | Value |
|---|---|
| Supervisors | Chong, James |
| Awarding institution | University of York |
| Academic Units | The University of York > Biology (York) |
| Depositing User | Miss Annabel Cansdale |
| Date Deposited | 28 Jun 2021 08:54 |
| Last Modified | 29 Jun 2021 00:18 |
| Open Archives Initiative ID (OAI ID) | oai:etheses.whiterose.ac.uk:26371 |
Download
Examined Thesis (PDF)
Filename: Cansdale_201023540_CorrectedThesisClean.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.