He, Yichen ORCID: https://orcid.org/0000-0003-3464-7526 (2020) Deep learning applications to automate phenotypic measurements on biodiversity datasets. PhD thesis, University of Sheffield.
Abstract
The growing number of digitised biological specimens brings new possibilities for the study of a wide range of evolutionary questions at broad scales. Taking measurements on digital photos often requires annotations (e.g. placing points on focal locations), and many projects label their digitised specimen datasets (commonly thousands of images or more) manually, which can take a huge amount of time. Deep learning is the state of the art for many computer vision tasks. Deep learning models can be trained on a set of manually annotated images and can then make accurate predictions based on what they have learned. To what extent deep learning can improve the measurement process on digitised collections has yet to be thoroughly explored. Here, I applied deep learning models to three tasks on two datasets (bird specimens and Littorina shell images), and show that the predicted labels are remarkably accurate and that downstream biological analyses using these labels generate biologically meaningful results. First, I used pose estimation models (algorithms originally designed to identify human body parts) to locate keypoints on bird specimens. The results showed high accuracy, with 95% of the validation images (N=5,094) correctly predicted, and rapid generation of data (less than three days to predict keypoints on the whole dataset of >120,000 images). Colours measured at these points showed that male birds tend to be more colour-diverse than females. Second, I applied deep learning models to segment the overall plumage areas on the bird dataset. More than 95% of the plumage areas were correctly segmented, and segmenting the whole dataset also took less than three days. I found that colour diversity (calculated from the segmentations) tends to be similar among closely related birds across more than 7,500 bird species.
Finally, I built PhenoLearn, a user-friendly tool that provides functions such as manual annotation (to create training images), prediction using deep learning, and review of predictions. I illustrate the broader applicability of PhenoLearn with an example of morphological landmarking on Littorina shell images. More than 98% of the predicted landmarks were placed within the acceptable range. The methods and tools introduced here both illustrate the value of deep learning and significantly increase the accessibility of deep learning approaches for non-expert biologists, allowing the rapid accumulation of phenotypic datasets at large scales. Taken together, these results show that deep learning methods have great potential for speeding up the measuring process on digitised specimens while producing accurate annotations.
Metadata
| Supervisors: | Thomas, Gavin and Maddock, Steve |
|---|---|
| Keywords: | Deep Learning, Machine Learning, colour space, annotations, biodiversity datasets |
| Awarding institution: | University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Science (Sheffield) > Animal and Plant Sciences (Sheffield) |
| Identification Number/EthosID: | uk.bl.ethos.837139 |
| Depositing User: | Mr Yichen He |
| Date Deposited: | 18 Aug 2021 15:24 |
| Last Modified: | 01 Sep 2022 09:53 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:28780 |
Download
Final eThesis - complete (pdf)
Filename: PhD_Thesis_Yichen_He.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.