Test Suite Health: Automatically Improving the Reliability and Effectiveness of Test Suites

Abstract

Software systems are inherently prone to faults, stemming from human errors and incorrect assumptions made during the development process. Additionally, because codebases evolve, even correct behaviour can degrade and become faulty. To mitigate these issues, unit testing is widely adopted, providing developers with a systematic way to exercise the system and verify individual units. Typically, developers write tests that exercise the production code, with the goal of achieving a certain level of code coverage and fault detection capability. However, high code coverage and mutation scores alone are not enough; tests must also be reliable. Advancements in automated test generation techniques have reduced the burden on developers. Nevertheless, developers remain primarily responsible for writing reliable and effective test suites, as these tools often fall short in areas requiring human expertise. Inadequate developer-written tests, whether due to low fault detection capability, flakiness, brittleness, or lack of realism, not only increase the maintenance overhead of software systems but also undermine the reliability of the test suite. This thesis takes a holistic approach to assessing and improving the quality of unit test suites, focusing on their reliability and effectiveness. To address these challenges, I set two primary objectives for this thesis: (1) understanding the factors that limit the reliability and effectiveness of test suites, and (2) developing and empirically evaluating automated techniques to improve and repair existing test suites. Firstly, to ensure that the state-of-the-art automated test generation tool, EvoSuite, could be utilised to enhance existing developer-written tests, I conducted an empirical study to investigate the issue of flakiness in tests generated by such tools.
Following this, I evaluated how effectively EvoSuite’s search-based test generation can amplify the mutation score of existing developer-written test suites. To gain insights into software developers’ perspectives on test brittleness, I then surveyed 73 professional software developers, analysed 60 Stack Overflow threads, and empirically evaluated 4,801 open-source projects. I then used EvoSuite’s search-based test generation technique to replace tests that call implementation details directly with more reliable alternatives. These efforts contribute to understanding and improving four main indicators of test suite quality: fault detection capabilities, test flakiness, test brittleness, and test “realism”.

Metadata

Supervisors:	McMinn, Phil
Keywords:	software engineering, software testing, brittle tests, flaky tests, mutation testing
Awarding institution:	University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield)
Depositing User:	Muhammad Firhard Roslan
Date Deposited:	08 Apr 2025 08:29
Last Modified:	08 Apr 2025 08:29
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:36618

