Alfear, Noof ORCID: https://orcid.org/0000-0002-6104-976X
(2024)
Sequence-to-Sequence Automatic Sentence Simplification: Development & Evaluation.
PhD thesis, University of York.
Abstract
Sentence simplification involves the transformation of complex sentences into simpler versions, addressing syntactic and lexical aspects through operations such as rephrasing, addition, deletion, splitting, or substitution. The Sequence-to-Sequence (Seq2Seq) model has demonstrated significant efficacy across various natural language processing tasks, making it a primary focus of our research.
The thesis’s first focus is the evaluation of sentence simplification models. We aim to minimize the dependence on human judges for evaluating model outputs. Our study examines the correlation between existing text simplification evaluation metrics and human judgment. It identifies evaluation metrics that align strongly with human assessments, offering guidance on metrics suitable for future research.
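A metric-to-human correlation analysis of this kind can be sketched in a few lines of Python. The scores below are hypothetical illustrations (not figures from the thesis); in practice the metric scores would come from tools such as SARI or LENS and the ratings from human annotators.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for five simplified sentences:
# automatic metric scores (0-100 scale) vs. mean human ratings (1-5 scale).
metric_scores = [38.2, 41.5, 29.7, 45.0, 33.1]
human_ratings = [3.4, 3.9, 2.8, 4.2, 3.1]

r = pearson(metric_scores, human_ratings)
print(f"Pearson r = {r:.3f}")
```

A metric whose scores correlate strongly with human ratings across many such sentences is a candidate for replacing manual evaluation.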
The second focus of this thesis is Seq2Seq models under different configurations, from the dataset to the model architecture. The thesis compares the performance of Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) units within Seq2Seq models for sentence simplification tasks, evaluating which recurrent unit type achieves superior performance in encoder-decoder architectures equipped with an attention mechanism. The impact of text embedding techniques on Seq2Seq sentence simplification models is also explored; the significance of embedding methods in determining the quality of simplification outcomes is discussed, highlighting their role in model performance. Moreover, the research investigates the potential of fine-tuning pre-trained Large Language Models (LLMs) for simplifying complex sentences. It assesses the integration of LLMs into Seq2Seq models and their adaptation to the simplification task, aiming to leverage the benefits of pre-trained language representations.
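The GRU/LSTM comparison hinges on the gating equations of each recurrent unit. As an illustration only (scalar states and toy weights, not the thesis's models), a single GRU cell step can be written out directly from its standard equations:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_cell(x, h, W, U, b):
    """One step of a GRU cell for scalar input x and scalar hidden state h.

    W, U, b are 3-tuples of weights for the update gate (z), reset gate (r),
    and candidate state (h~). All weight values used here are hypothetical.
    """
    z = sigmoid(W[0] * x + U[0] * h + b[0])               # update gate
    r = sigmoid(W[1] * x + U[1] * h + b[1])               # reset gate
    h_cand = math.tanh(W[2] * x + U[2] * (r * h) + b[2])  # candidate state
    return (1.0 - z) * h + z * h_cand                     # interpolated new state

# Run a short input sequence through the cell with toy weights.
W, U, b = (0.5, -0.3, 0.8), (0.4, 0.2, -0.6), (0.0, 0.1, 0.05)
h = 0.0
for x in [1.0, -0.5, 0.25]:
    h = gru_cell(x, h, W, U, b)
print(f"final hidden state: {h:.4f}")
```

An LSTM cell adds a separate memory cell and an output gate, giving it more parameters per unit than a GRU; which trade-off wins for simplification is exactly the empirical question the thesis addresses.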
The third focus of this thesis is to examine the efficacy of fine-tuning domain-specific LLMs on domain-specific datasets to enhance the quality of simplified sentences. It explores whether domain-specific LLMs, when tailored and fine-tuned for a particular domain, outperform general LLMs, contributing insights into domain-specific sentence simplification.
Through these investigations, the thesis contributes to advancing the understanding and
development of Seq2Seq models for automated sentence simplification, addressing various facets from evaluation metrics to model architecture and fine-tuning strategies.
Metadata
Supervisors: Kazakov, Dimitar
Keywords: text simplification, sentence simplification, medical text simplification, Seq2Seq, LLM, Transformer, evaluation, metric, SARI, LENS, Newsela, Wikipedia, SELLS
Awarding institution: University of York
Academic Units: The University of York > Computer Science (York)
Depositing User: Mrs Noof Alfear
Date Deposited: 10 Feb 2025 13:57
Last Modified: 10 Feb 2025 13:57
Open Archives Initiative ID (OAI ID): oai:etheses.whiterose.ac.uk:36272
Download
Examined Thesis (PDF)
Embargoed until: 4 February 2026
Filename: Alfear_203047472 _PhD_Thesis.pdf
