Pukowski, Pawel (ORCID: https://orcid.org/0009-0006-1307-2160) (2026) Hardness-aware learning: taxonomy, benchmarking, and fairness applications. PhD thesis, University of Sheffield.
Abstract
Class bias—systematic recall gaps across classes—continues to limit the fairness of classification models. Most explanations attribute this bias to data imbalance, assuming that unequal sample counts are its primary cause. However, this interpretation overlooks differences in class hardness that persist even in balanced datasets. Since prior studies rarely separate these effects, it remains unclear whether reported fairness gains arise from one or the other, constraining advances in hardness-aware learning. The focus of this thesis is twofold: bringing coherence to a fragmented field and showcasing its practical value.
To systematize existing approaches, we first review model- and data-based hardness estimators, which have evolved as largely independent research lines. We introduce a unified taxonomy that organizes both families and the types of hard samples they identify, providing a conceptual foundation for hardness-aware research. Building on this, we perform the first comparative benchmark between these two families, highlighting key limitations of data-based estimation. A complementary stability analysis further shows that single-seed evaluations—common in prior work—produce unreliable hardness rankings, motivating ensemble-based estimation.
To demonstrate why hardness-aware learning matters, we conduct two case studies assessing whether hardness-based resampling can improve fairness in balanced datasets. We find that it consistently reduces class-level recall and F1 gaps and can also enhance overall performance under suitable conditions. These outcomes depend on three key factors: the accuracy of hardness estimation, the quality of oversampled data, and the degree of induced data imbalance. Together, these findings establish hardness imbalance as a distinct and measurable phenomenon and position hardness-aware learning as a principled framework for studying and mitigating it.
Metadata
| Supervisors: | Lu, Haiping |
|---|---|
| Keywords: | hardness imbalance, hardness-aware learning, resampling, data imbalance, fairness, computer vision, hardness, data pruning |
| Awarding institution: | University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield) |
| Date Deposited: | 07 Apr 2026 08:30 |
| Last Modified: | 07 Apr 2026 08:30 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:38512 |
Download
Final eThesis - complete (pdf)
Filename: PhD_Thesis.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.