Bin-Hezam, Reem (2026) Improving Stopping Methods for Technology Assisted Review. PhD thesis, University of Sheffield.
Abstract
Technology Assisted Review (TAR) aims to reduce the effort required to review large collections of documents for relevance. Common applications include systematic reviews in the medical domain and eDiscovery in the legal sector. TAR usually involves ranking documents for review, and while most relevant documents are prioritised early in the ranking, deciding when to stop reviewing remains one of the challenges in TAR. This thesis introduces multiple TAR stopping approaches that aim to achieve a given target recall while examining as few documents as possible. The first stopping approach is based on point processes, which are statistical models used to represent random events. The approach uses rate functions to model the occurrence of relevant documents over a ranking and hence estimates the total number of relevant documents, which indicates a stopping point. Using two point processes (Inhomogeneous Poisson and Cox), the effect of multiple rate functions has been explored, including hyperbolic decline, used here for the first time in TAR. The second approach is a novel stopping method based on reinforcement learning, which trains an agent to explore an environment represented by the ranking and to maximise a reward function for a given target recall. Additionally, the approach has been generalised to be trained on multiple target recall levels simultaneously, and its reward function has been enhanced to adapt to different stopping objectives, such as ensuring that the target is reached or minimising cost. For both stopping approaches, the efficacy of integrating text classifiers’ predictions for unexamined documents has also been explored. Furthermore, an analysis of stopping effectiveness across rankings of differing quality has been introduced, emphasising the effect of ranking quality on stopping. Overall, stopping has been analysed at multiple target recall levels on different datasets, showing that the proposed approaches are effective and outperform many existing baselines.
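As a rough illustration of the point-process idea described in the abstract, the sketch below integrates a rate function over the unexamined part of a ranking to estimate how many relevant documents remain, then checks whether the target recall appears to have been reached. It is not taken from the thesis: the exponential rate function, the parameter names `a` and `b` (assumed to have been fitted already), and the helpers `expected_relevant` and `should_stop` are all assumptions made for illustration, and the rule uses only the expected count rather than any confidence bound.

```python
import math


def expected_relevant(a: float, b: float, start: float, end: float) -> float:
    """Expected number of relevant documents between ranks start and end.

    Assumes an exponential rate function rate(x) = a * exp(-b * x), with b > 0.
    Under a Poisson process the expected count is the integral of the rate.
    """
    return (a / b) * (math.exp(-b * start) - math.exp(-b * end))


def should_stop(found: int, examined: int, collection_size: int,
                a: float, b: float, target_recall: float) -> bool:
    """Return True if the estimated recall has reached the target.

    Estimated total = relevant documents already found
                      + expected relevant documents among the unexamined ranks.
    """
    remaining = expected_relevant(a, b, examined, collection_size)
    estimated_total = found + remaining
    return found >= target_recall * estimated_total


# Hypothetical values: 1000 ranked documents, rate parameters assumed fitted.
early = should_stop(found=31, examined=100, collection_size=1000,
                    a=0.5, b=0.01, target_recall=0.9)
late = should_stop(found=47, examined=300, collection_size=1000,
                   a=0.5, b=0.01, target_recall=0.9)
print(early, late)
```

With these made-up numbers, the rule keeps reviewing at rank 100, where about 18 relevant documents are still expected, and stops at rank 300, where fewer than 3 are expected. Other rate functions, such as the hyperbolic decline mentioned above, would change only the integral inside `expected_relevant`.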
Metadata
| Supervisors: | Stevenson, Mark |
|---|---|
| Awarding institution: | University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield) |
| Date Deposited: | 23 Mar 2026 11:25 |
| Last Modified: | 23 Mar 2026 11:25 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:38442 |
Download
Final eThesis - complete (pdf)
Embargoed until: 23 March 2027
Filename: PhD_Thesis.pdf