Dialpuri, Jordan
ORCID: 0000-0002-6205-2661
(2026)
Automated Model Building of Nucleic Acids and Carbohydrates Using Experimental Data and Deep Learning Models.
PhD thesis, University of York.
Abstract
Understanding the structural information of proteins, nucleic acids, and carbohydrates is fundamental to gaining mechanistic and functional insights into biological processes. X-ray crystallography and, more recently, cryogenic electron microscopy are frequently used methods to study the three-dimensional structures of biological molecules. These techniques do not record three-dimensional atomic positions directly; instead, volumetric density data is used to create an atomic model. This process of building models from density data is often time-intensive and requires substantial manual effort, which automated model-building methods aim to alleviate. While automated methods for protein modelling are mature, methods for nucleic acid and carbohydrate modelling often fall short or require manual intervention. The main challenge that automated model-building methods face is identifying probable regions of experimental density associated with a specific atomic group.
The use of convolutional neural networks to identify regions of experimental density corresponding to the phosphate, sugar and base groups of nucleotides was explored. Extensive experimentation with model architectures enabled the training of a single convolutional neural network that precisely identifies regions of experimental density associated with the nucleic acid phosphate, sugar, and base groups in both crystallographic and electron microscopy experimental density data. These predicted regions can then be used as a guide to automatically model nucleic acids into experimental density, with greater completeness than existing methods can provide. This new method was released as a software package called NucleoFind.
The model architecture designed for nucleic acids was also applied successfully to carbohydrates to identify potential glycosylation sites. Using glycosylation geometry data obtained from the Protein Data Bank, these potential sites can then be modelled with a simple method of recursive carbohydrate addition and critical assessment. The resultant method can produce complete models of N-, O-, and C-linked glycosylation, and lays the groundwork for a future automated carbohydrate model-building method.
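The idea of recursive carbohydrate addition with critical assessment can be illustrated with a minimal sketch. It is not the thesis implementation: the `Residue` class, the `ALLOWED_CHILDREN` linkage table, the `toy_fit` scores, and the 0.5 acceptance threshold are all hypothetical stand-ins for Protein Data Bank geometry data and a real density-fit assessment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Residue:
    """A node in the growing glycan tree (hypothetical helper type)."""
    name: str
    children: List["Residue"] = field(default_factory=list)


# Hypothetical linkage table: which residue types may be tried on each parent.
# A real method would take allowed linkages and geometry from PDB statistics.
ALLOWED_CHILDREN: Dict[str, List[str]] = {
    "ASN": ["NAG"],
    "NAG": ["NAG", "BMA"],
    "BMA": ["MAN"],
}

# Toy assessment scores keyed by the path from the root; unknown paths score 0.
# These stand in for a fit-to-density score and are invented for illustration.
TOY_FIT: Dict[str, float] = {
    "ASN/NAG": 0.9,
    "ASN/NAG/NAG": 0.8,
    "ASN/NAG/NAG/BMA": 0.7,
    "ASN/NAG/NAG/BMA/MAN": 0.3,
}


def toy_fit(path: List[str]) -> float:
    """Return the toy score for a candidate placed at the end of `path`."""
    return TOY_FIT.get("/".join(path), 0.0)


def grow(node: Residue, path: List[str], fit: Callable[[List[str]], float],
         threshold: float = 0.5, max_depth: int = 10) -> None:
    """Try each allowed child; keep it only if it passes assessment, then recurse."""
    if len(path) >= max_depth:  # guard against unbounded growth
        return
    for name in ALLOWED_CHILDREN.get(node.name, []):
        candidate_path = path + [name]
        if fit(candidate_path) < threshold:
            continue  # candidate rejected by the assessment step
        child = Residue(name)
        node.children.append(child)
        grow(child, candidate_path, fit, threshold, max_depth)


def flatten(node: Residue, prefix: str = "") -> List[str]:
    """List every accepted residue as a slash-separated path from the root."""
    label = prefix + node.name
    paths = [label]
    for child in node.children:
        paths.extend(flatten(child, label + "/"))
    return paths
```

Starting from `Residue("ASN")` and calling `grow(root, ["ASN"], toy_fit)` extends the chain one residue at a time and stops along any branch whose candidate scores below the threshold, so poorly supported sugars are never added.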
Metadata
| Supervisors: | Cowtan, Kathryn and Agirre, Jon |
|---|---|
| Keywords: | Automated Model Building, X-ray Crystallography, Machine Learning, Deep Learning, cryo-EM, Nucleic Acids, Carbohydrates, NucleoFind |
| Awarding institution: | University of York |
| Academic Units: | The University of York > Chemistry (York) |
| Date Deposited: | 20 Apr 2026 10:13 |
| Last Modified: | 20 Apr 2026 10:13 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:38476 |
Download
Examined Thesis (PDF)
Filename: PhD Thesis - Jordan Dialpuri.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License