McKay, Conor Ewan
ORCID: 0009-0001-7357-5441
(2025)
Integrating Deep Mutational Scanning in the E. coli Periplasm with Machine Learning to Explore Protein Behaviour.
PhD thesis, University of Leeds.
Abstract
Deep mutational scanning (DMS) leverages advances in DNA synthesis, sequencing, genotype–phenotype linkage, and machine learning. By combining variant libraries, high-throughput selection, and deep sequencing, DMS quantifies the effects of amino-acid substitutions in a single assay without purifying individual variants, yielding datasets well suited to training machine learning models.
This thesis develops the third generation of the tripartite beta-lactamase assay (TPBLA) as a DMS platform in the E. coli periplasm. TPBLA inserts a protein of interest between beta-lactamase domains and reports antibiotic sensitivity, linking features of the protein of interest to E. coli survival. Earlier implementations were either quantitative but low-throughput, or directed-evolution screens that could not report on less fit variants. The third-generation format scales to thousands of variants while retaining quantitative resolution, recovering both beneficial and deleterious mutations.
Three applications demonstrate the versatility of the assay. First, a single-site saturation library of amyloidogenic Aβ42 maps the thermodynamic stability of fibrils in vivo; orthogonal in vitro, in silico, and machine learning-based analyses confirm that the assay probes fibril stability and led to the development of ThermAL, a machine learning model for predicting stabilising regions in amyloids of intrinsically disordered peptides. Second, analysis of a single-site saturation library of the truncated E. coli outer membrane protein tOmpA resolves a biogenesis grammar: membrane-facing strands require hydrophobic, beta-sheet-compatible residues; lumen-facing sites contain many immutable residues, such as those involved in salt bridges and mortise–tenon pairs; and the C-terminal Ara–X–Ara of the beta-signal is highly constrained. Finally, TPBLA of a combinatorial library of anti-IL-7 scFvs was used to train DevelopabilityPRED to triage therapeutic candidates; model rankings correlate inversely with hydrophobic interaction chromatography and stand-up monolayer adsorption chromatography retention times and positively with aggregation onset temperature, highlighting its utility as an antibody aggregation predictor.
Collectively, the third-generation TPBLA converts periplasmic selections into genotype–phenotype maps, enabling accurate models and yielding insights into protein folding, aggregation, and biotherapeutic developability.
Metadata
| Supervisors: | Brockwell, David and Radford, Sheena |
|---|---|
| Awarding institution: | University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Biological Sciences (Leeds) |
| Academic unit: | School of Molecular and Cellular Biology |
| Date Deposited: | 22 Jan 2026 10:04 |
| Last Modified: | 22 Jan 2026 10:04 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:37912 |
Download
Final eThesis - complete (pdf)
Filename: Thesis (38).pdf
Description: Thesis
Licence: This work is licensed under a Creative Commons Attribution NonCommercial ShareAlike 4.0 International License