AlMutair, Lena Saud ORCID: https://orcid.org/0000-0003-0416-040X
(2025)
Enriching Electronic Health Records with Semantic Features: Leveraging Multi-Modality Embeddings.
PhD thesis, University of Leeds.
Abstract
The exponential growth of Electronic Health Records (EHRs) offers rich clinical narratives for healthcare improvement, but their unstructured nature makes information retrieval challenging. While models like BERT excel at classification tasks, their potential for EHR-based retrieval, especially for combining structured and unstructured data, remains underutilized.
We introduce ClinicalNarr, a transformer-based model for clinical information retrieval. Our initial research examined four language representation settings and found that concept-only embeddings performed best (BERTScore F1: 0.699), because they emphasize key medical concepts while filtering out noise. Building on these insights, ClinicalNarr uses MIMIC-III data, combining structured ICD codes with unstructured narratives to improve retrieval. On MedNLI it achieved 90.5% accuracy, outperforming previous models. It also showed strong correlations with physician judgments (r = 0.69) and medical coder judgments (r = 0.76).
Comparative analysis shows that HNSW (Hierarchical Navigable Small World) indexing outperforms IVF (inverted file) indexing for semantic retrieval (NDCG@10: 0.49 vs. 0.43). ClinicalNarr outperformed existing models, achieving a hit rate three times higher. This advantage widened under our novel ontology-augmented evaluation methodology, which increased performance by 13%, reaching a 52% hit rate compared with 15-20% for competing models.
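The abstract reports retrieval quality as NDCG@10 and hit rate. The following is a minimal sketch of the standard definitions of these two metrics, assuming binary relevance (a retrieved note is either relevant or not). It is not the thesis's evaluation code. This page does not state the cut-off used for hit rate, how relevance was graded, or how the ontology-augmented evaluation builds the set of relevant documents. The function names and note IDs are hypothetical.

```python
import math
from typing import List, Sequence, Set


def hit_rate_at_k(ranked: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int) -> float:
    """Fraction of queries whose top-k results contain at least one relevant item."""
    if not ranked:
        return 0.0
    hits = sum(
        1
        for results, rel in zip(ranked, relevant)
        if any(doc in rel for doc in results[:k])
    )
    return hits / len(ranked)


def ndcg_at_k(results: Sequence[str], rel: Set[str], k: int) -> float:
    """NDCG@k for a single query, assuming binary relevance (gain 1 or 0)."""
    # DCG: a relevant item at 0-based rank i contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(results[:k]) if doc in rel)
    # Ideal DCG: as many relevant items as fit in the top k, ranked first.
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(rel), k)))
    return dcg / idcg if idcg > 0 else 0.0


def mean_ndcg_at_k(ranked: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int) -> float:
    """Average NDCG@k over all queries."""
    scores: List[float] = [ndcg_at_k(r, rel, k) for r, rel in zip(ranked, relevant)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical ranked note IDs for two queries, with their relevant notes.
    ranked = [["n3", "n7", "n1"], ["n2", "n9", "n4"]]
    relevant = [{"n1"}, {"n5"}]
    print(hit_rate_at_k(ranked, relevant, k=3))   # 0.5
    print(mean_ndcg_at_k(ranked, relevant, k=3))  # 0.25
```

Under these definitions, hit rate only asks whether any relevant result appears in the top k. NDCG@k also rewards placing relevant results near the top of the ranking, so the two metrics can diverge for the same system.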
This study establishes a benchmark for clinical information retrieval using MIMIC-III, significantly advancing clinical decision support capabilities by improving semantic alignment in healthcare information systems.
Metadata
Supervisors: Atwell, Eric and Ravikumar, Nishant
Keywords: Clinical NLP, Clinical Information Retrieval, Clinical Narrative, Unstructured Text, EHR, Deep Learning, Transformers, BERT, Semantic Embeddings, MIMIC-III Dataset, Ontology Integration, Multimodal Learning, Large Language Models
Awarding institution: University of Leeds
Academic Units: The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Date Deposited: 10 Oct 2025 09:39
Last Modified: 10 Oct 2025 09:39
Open Archives Initiative ID (OAI ID): oai:etheses.whiterose.ac.uk:37420
Download
Final eThesis - complete (pdf)
Embargoed until: 1 October 2030
Filename: Enriching Electronic Health Records with Semantic Features Leveraging Multi-Modality Embeddings.pdf

To request this thesis, use the 'Request a copy' link in the Download section above. The request is sent directly to someone who may authorise access.