Automated Model Building of Nucleic Acids and Carbohydrates Using Experimental Data and Deep Learning Models

Abstract

Understanding the structures of proteins, nucleic acids, and carbohydrates is fundamental to gaining mechanistic and functional insights into biological processes. X-ray crystallography and, more recently, cryogenic electron microscopy (cryo-EM) are frequently used methods for studying the three-dimensional structures of biological molecules. These techniques do not record three-dimensional atomic positions directly; instead, an atomic model must be built into the volumetric density data they produce. Building models from density data is often time-intensive and requires substantial manual effort, which automated model-building methods aim to alleviate. While automated methods for protein modelling are mature, methods for nucleic acid and carbohydrate modelling often fall short or require manual intervention. The main challenge that automated model-building methods face is identifying which regions of experimental density are likely to correspond to a specific atomic group.

This work explored the use of convolutional neural networks to identify regions of experimental density corresponding to the phosphate, sugar, and base groups of nucleotides. Extensive experimentation with model architectures enabled the training of a single convolutional neural network that precisely identifies these regions in both crystallographic and electron microscopy density data. The predicted regions can then guide the automated modelling of nucleic acids into experimental density, producing more complete models than existing methods. This new method was released as a software package called NucleoFind.

The model architecture designed for nucleic acids was also successfully applied to carbohydrates to identify potential glycosylation sites. Using glycosylation geometry data obtained from the Protein Data Bank, these potential sites can then be modelled with a simple method of recursive carbohydrate addition and critical assessment. The resulting method can produce complete models of N-, O-, and C-linked glycosylation, and lays the groundwork for a future automated carbohydrate model-building method.

Metadata

Supervisors:	Cowtan, Kathryn and Agirre, Jon
Related URLs:	NucleoFind: a deep-learning network for interpreting nucleic acid electron density (Related publication); Analysis and validation of overall N-glycan conformation in Privateer (Related publication); NucleoFind GitHub (Research data)
Keywords:	Automated Model Building, X-ray Crystallography, Machine Learning, Deep Learning, cryo-EM, Nucleic Acids, Carbohydrates, NucleoFind
Awarding institution:	University of York
Academic Units:	The University of York > Chemistry (York)
Date Deposited:	20 Apr 2026 10:13
Last Modified:	20 Apr 2026 10:13
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:38476
