Ohuoba, Adaeze Ngozi
ORCID: https://orcid.org/0000-0002-9412-7461
(2025)
Predicting and detecting machine translation errors in English-Igbo medical texts through machine learning.
PhD thesis, University of Leeds.
Abstract
This thesis presents a framework for evaluating the quality of English-Igbo machine translation and for predicting and detecting critical machine translation errors in health texts. It progresses from quality evaluation to supervised machine learning tasks for document- and segment-level error prediction and detection in English-Igbo machine translation. The study begins by assessing existing English-Igbo machine translation tools and selecting the best-performing tool via an adapted error categorisation and severity metric. This metric also enabled the identification of the English linguistic features, and their distribution, that cause critical errors in English-Igbo machine translation.
Building on these insights, a supervised machine learning task for document-level critical error prediction in English-Igbo machine translation (IgboMT-EP) was conducted on a moderate-sized dataset of health texts. Multiple traditional error-prediction models were trained using feature-based methods with TF-IDF vectorisation and compared with BERT to select the best-performing model for IgboMT-EP. A robust text pre-processing pipeline was implemented to enhance text representation. The models were tested on natural and upsampled data, with the chosen model performing better on natural data. For the document-level task, BERT was surpassed by SVM, a traditional model. Additionally, deep learning experiments for segment-level error prediction and detection (IgboMT-ED) were conducted using BERT, alongside SVM, the model of choice from IgboMT-EP. The experiments used two classification heads: one to distinguish the presence or absence of a critical machine translation error, and another to categorise detected critical errors into defined error categories. IgboMT-ED experiments were conducted in both monolingual (English only) and bilingual (English and Igbo) settings, evaluating segment-level error prediction in the monolingual case and critical error detection in the bilingual one. In the segment-level experiment, BERT outperformed SVM, indicating that BERT predicts better at segment level than at document level.
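As a rough illustration of the feature-based representation mentioned above, the sketch below computes TF-IDF weights in plain Python. The thesis does not specify its tokenisation, pre-processing, or weighting variant, so every detail here (whitespace tokenisation, unsmoothed IDF, the example sentences) is an assumption for illustration only; a real pipeline would use a library vectoriser with tuned pre-processing.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return a sparse TF-IDF weight dict for each document.

    `docs` is a list of pre-tokenised documents (lists of tokens).
    This is a bare-bones sketch of the kind of feature vector a
    traditional classifier such as an SVM consumes; it is not the
    thesis's actual pipeline.
    """
    n = len(docs)
    df = Counter()                          # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)                   # raw term counts
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

# Hypothetical example documents (not from the thesis dataset):
docs = [
    "the dosage instructions were mistranslated".split(),
    "the dosage was rendered correctly".split(),
]
vecs = tfidf_vectors(docs)
# Terms shared by every document (e.g. "the", "dosage") receive zero
# weight; distinguishing terms (e.g. "mistranslated") receive positive
# weight, which is what makes TF-IDF useful as classifier input.
```

Such sparse, order-insensitive vectors are one design contrast with BERT, whose contextual embeddings capture word order and context, which may help explain the differing document- versus segment-level results reported above.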
Interpretability experiments were also conducted to uncover the models’ biases and to encourage trust in the chosen models. The IgboMT-ED interpretability task highlighted the tokenisation challenge in Igbo NLP. Empirical evaluation of all tasks yielded 67 per cent accuracy for IgboMT-EP and 75 per cent accuracy for IgboMT-ED. IgboMT-ED also outperformed a GPT model in a comparative evaluation and compared favourably with human annotation.
This work has thus created a framework for English-Igbo machine translation quality evaluation and for document- and segment-level critical error prediction and detection. It may also have opened a new direction in machine translation quality improvement, not only for the English-Igbo language pair but for other equally low-resourced Niger-Congo languages.
Metadata
| Supervisors: | Sharoff, Serge and Walker, Callum |
|---|---|
| Keywords: | critical errors, machine translation errors, error-prediction, error detection, multi-word expressions, polysemy, machine learning |
| Awarding institution: | University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > School of Languages Cultures and Societies (Leeds) |
| Date Deposited: | 16 Jan 2026 11:01 |
| Last Modified: | 16 Jan 2026 11:01 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:37794 |
Download
Final eThesis - complete (pdf)
Embargoed until: 1 December 2026
Filename: Ohuoba_AN_LCS_PhD_2025.pdf