Aksoy, Nurbanu ORCID: https://orcid.org/0000-0002-4043-1521 (2024) Automatic, integrated and structured reporting for radiology image examinations. PhD thesis, University of Leeds.
Abstract
The increasing availability of diverse data sources has expanded the potential for modality-translation tasks in artificial intelligence, particularly the conversion of images into natural-language descriptions. In the clinical domain, and in chest X-ray (CXR) analysis in particular, advances in Computer-Aided Detection (CAD) and Computer-Aided Diagnosis (CADx) technologies hold significant promise for improving patient outcomes and healthcare delivery. This research focuses on improving medical image representations by leveraging clinically relevant information and tasks to establish a more robust pipeline for the automated generation of radiology reports. By aligning with clinical pathways, we aim to generate accurate, contextually relevant reports that reflect real-world medical practice. The study addresses the limitations of current single-modality approaches, which often fail to capture the complex relationships and complementary information across data modalities. The primary objectives of this research are threefold: to develop efficient multi-input pre-processing mechanisms for diverse data types; to establish robust frameworks for modality fusion that combine visual, textual, and clinical data into unified embeddings; and to enhance representation learning through joint optimisation in a multi-task setting.
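The thesis itself defines the actual architectures; as a rough illustration of what fusing visual, textual, and clinical inputs into a unified embedding can look like, here is a minimal PyTorch-style sketch. All module names, dimensions, and the projection-then-concatenation strategy are assumptions for illustration, not taken from the thesis.

```python
# Illustrative sketch only: one common way to fuse visual, textual and
# clinical inputs into a single unified embedding. Dimensions and the
# fusion strategy are assumptions, not the thesis's implementation.
import torch
import torch.nn as nn

class MultiModalFusion(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, clin_dim=32, fused_dim=512):
        super().__init__()
        # Project each modality into a shared space before fusion.
        self.img_proj = nn.Linear(img_dim, fused_dim)
        self.txt_proj = nn.Linear(txt_dim, fused_dim)
        self.clin_proj = nn.Linear(clin_dim, fused_dim)
        self.fuse = nn.Sequential(
            nn.Linear(3 * fused_dim, fused_dim),
            nn.ReLU(),
        )

    def forward(self, img_feat, txt_feat, clin_feat):
        # Concatenate the projected modality embeddings, then mix them
        # into one unified representation per study.
        z = torch.cat([
            self.img_proj(img_feat),
            self.txt_proj(txt_feat),
            self.clin_proj(clin_feat),
        ], dim=-1)
        return self.fuse(z)

# Example: pooled CNN image features, a sentence-level text embedding,
# and a small vector of structured clinical indicators for 4 studies.
fused = MultiModalFusion()(torch.randn(4, 2048),
                           torch.randn(4, 768),
                           torch.randn(4, 32))
print(fused.shape)  # torch.Size([4, 512])
```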
This thesis proposes novel multi-input, multi-stream, end-to-end networks that demonstrate significant improvements in text generation accuracy and contextual relevance. It also includes comprehensive ablation studies, systematic analyses of alternative architectures, and multi-task learning strategies that optimise feature learning and reduce hallucination in generated reports. The findings highlight the potential benefits of multi-modal and multi-task learning in medical applications and suggest broader implications for other fields requiring integrated multi-modal analysis. While the research faces challenges such as language variability, inadequate evaluation metrics, and computational demands, it presents a versatile framework with potential cross-domain applicability and a roadmap for future developments in multi-modal and cross-modal AI systems.
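The multi-task strategy the abstract describes pairs report generation with auxiliary objectives under a joint loss. A minimal sketch of that idea follows, assuming a shared fused representation, a simplified generation head (a real decoder would be autoregressive), and a CheXpert-style multi-label finding classifier; the heads and the loss weighting are hypothetical.

```python
# Illustrative sketch only: joint optimisation of a report-generation loss
# and an auxiliary finding-classification loss over a shared embedding.
# Both heads and the loss weighting are assumptions for illustration.
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    def __init__(self, fused_dim=512, vocab_size=10000, n_labels=14):
        super().__init__()
        # Task 1: token logits for report generation (simplified to a
        # single projection; a real decoder would generate autoregressively).
        self.gen_head = nn.Linear(fused_dim, vocab_size)
        # Task 2: multi-label finding classification as a grounding signal.
        self.cls_head = nn.Linear(fused_dim, n_labels)

    def forward(self, fused):
        return self.gen_head(fused), self.cls_head(fused)

def joint_loss(gen_logits, target_tokens, cls_logits, target_labels, alpha=0.5):
    # Weighted sum of the two task losses; alpha trades pure generation
    # accuracy against the auxiliary classification objective.
    gen = nn.functional.cross_entropy(gen_logits, target_tokens)
    cls = nn.functional.binary_cross_entropy_with_logits(cls_logits, target_labels)
    return gen + alpha * cls

# Example: a batch of 4 fused embeddings, next-token targets, and
# binary finding labels.
heads = MultiTaskHeads()
gen_logits, cls_logits = heads(torch.randn(4, 512))
loss = joint_loss(gen_logits, torch.randint(0, 10000, (4,)),
                  cls_logits, torch.randint(0, 2, (4, 14)).float())
```

Sharing the encoder across both objectives is what lets the auxiliary labels regularise the generated text, which is one mechanism the abstract credits for reducing hallucination.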
Metadata
Supervisors: Ravikumar, Nishant and Sharoff, Serge
Keywords: report generation, multi-modal learning, data fusion, cross-task learning, deep learning, medical imaging
Awarding institution: University of Leeds
Academic Units: The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Depositing User: Mrs Nurbanu Aksoy
Date Deposited: 24 Jan 2025 15:57
Last Modified: 24 Jan 2025 15:57
Open Archives Initiative ID (OAI ID): oai:etheses.whiterose.ac.uk:35928
Download
Final eThesis - complete (pdf)
Filename: Aksoy_N_Computing_PhD_2024.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.