Hardness-aware learning: taxonomy, benchmarking, and fairness applications

Abstract

Class bias (systematic recall gaps across classes) continues to limit the fairness of classification models. Most explanations attribute this bias to data imbalance, assuming that unequal sample counts are its primary cause. However, this interpretation overlooks differences in class hardness that persist even in balanced datasets. Because prior studies rarely separate these two effects, it remains unclear whether reported fairness gains stem from correcting data imbalance or from addressing class hardness, which constrains advances in hardness-aware learning. The focus of this thesis is twofold: bringing coherence to a fragmented field and showcasing its practical value.
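To make "recall gaps across classes" concrete, the following minimal Python sketch computes per-class recall and the gap between the best- and worst-recognised class. Measuring the gap as maximum minus minimum recall is an assumption made for illustration; the thesis may aggregate class-level gaps differently. The toy labels are invented and balanced (four samples per class), so the gap here cannot come from unequal sample counts.

```python
from collections import defaultdict

def per_class_recall(y_true, y_pred):
    """Return {class: recall} computed from parallel lists of labels."""
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}

def recall_gap(y_true, y_pred):
    """Largest minus smallest per-class recall (0.0 means no class bias)."""
    recalls = per_class_recall(y_true, y_pred)
    return max(recalls.values()) - min(recalls.values())

# Balanced toy data: 4 samples per class, yet class 1 is recognised less often.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 0, 1]
print(per_class_recall(y_true, y_pred))  # {0: 1.0, 1: 0.5}
print(recall_gap(y_true, y_pred))        # 0.5
```

The same function applied to F1 instead of recall would give the class-level F1 gap mentioned later in the abstract.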

To systematize existing approaches, we first review model- and data-based hardness estimators, which have evolved as largely independent research lines. We introduce a unified taxonomy that organizes both families and the types of hard samples they identify, providing a conceptual foundation for hardness-aware research. Building on this, we perform the first comparative benchmark between these two families, highlighting key limitations of data-based estimation. A complementary stability analysis further shows that single-seed evaluations—common in prior work—produce unreliable hardness rankings, motivating ensemble-based estimation.
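The ensemble idea can be illustrated with a hedged sketch: each training seed yields one hardness score per sample, the ensemble estimate averages those scores, and ranking stability is summarised as the mean pairwise Spearman correlation between seeds. The score values are made up, and treating 1 - p(true class) as the hardness score, plain averaging, and Spearman correlation are illustrative assumptions rather than the estimators benchmarked in the thesis. The helper names `ensemble_hardness` and `pairwise_stability` are hypothetical.

```python
from itertools import combinations
from statistics import mean

def ranks(scores):
    """Rank scores (1 = lowest), giving tied values their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

def ensemble_hardness(per_seed_scores):
    """Average each sample's hardness score over all seeds."""
    return [mean(col) for col in zip(*per_seed_scores)]

def pairwise_stability(per_seed_scores):
    """Mean Spearman correlation between the rankings of every pair of seeds."""
    return mean(spearman(a, b) for a, b in combinations(per_seed_scores, 2))

# Hypothetical hardness scores for 5 samples from 3 seeds (one row per seed).
scores = [
    [0.10, 0.80, 0.30, 0.60, 0.20],
    [0.15, 0.50, 0.70, 0.40, 0.05],
    [0.05, 0.90, 0.20, 0.30, 0.60],
]
print(ensemble_hardness(scores))
print(round(pairwise_stability(scores), 3))  # 0.433
```

Here the second and third seeds produce uncorrelated rankings (Spearman 0), so a conclusion drawn from either one alone would be fragile; averaging over seeds is one way to dampen that disagreement.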

To demonstrate why hardness-aware learning matters, we conduct two case studies assessing whether hardness-based resampling can improve fairness in balanced datasets. We find that it consistently reduces class-level recall and F1 gaps and can also enhance overall performance under suitable conditions. These outcomes depend on three key factors: the accuracy of hardness estimation, the quality of oversampled data, and the degree of induced data imbalance. Together, these findings establish hardness imbalance as a distinct and measurable phenomenon and position hardness-aware learning as a principled framework for studying and mitigating it.
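One simple way to turn hardness scores into a resampling scheme is sketched below, assuming per-sample hardness scores are already available: an oversampling budget is split across classes in proportion to their mean hardness, and each class's share is filled by randomly duplicating its samples. This proportional rule and plain duplication are illustrative assumptions, not the resampling strategy evaluated in the thesis, and `hardness_oversample` is a hypothetical name. The example also shows how resampling a balanced dataset induces data imbalance.

```python
import random
from collections import Counter, defaultdict

def class_hardness(labels, hardness):
    """Mean per-sample hardness score for each class."""
    sums, counts = defaultdict(float), Counter(labels)
    for y, h in zip(labels, hardness):
        sums[y] += h
    return {c: sums[c] / counts[c] for c in counts}

def hardness_oversample(labels, hardness, budget, seed=0):
    """Return indices of all original samples plus roughly `budget` duplicates,
    allocated to classes in proportion to their mean hardness."""
    rng = random.Random(seed)
    ch = class_hardness(labels, hardness)
    total = sum(ch.values())
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    indices = list(range(len(labels)))
    for c, h in ch.items():
        extra = round(budget * h / total)  # rounding may shift the total slightly
        indices += rng.choices(by_class[c], k=extra)  # random duplication
    return indices

# Balanced toy data (4 per class); class 1 is on average harder.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
hardness = [0.1, 0.2, 0.1, 0.2, 0.6, 0.7, 0.5, 0.6]
idx = hardness_oversample(labels, hardness, budget=8)
print(Counter(labels[i] for i in idx))  # Counter({1: 10, 0: 6})
```

Replacing duplication with augmented or synthetic samples would bring in the quality of the oversampled data as a further variable; the sketch deliberately leaves that out.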

Metadata

Supervisors:	Lu, Haiping
Keywords:	hardness imbalance, hardness-aware learning, resampling, data imbalance, fairness, computer vision, hardness, data pruning
Awarding institution:	University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield)
Date Deposited:	07 Apr 2026 08:30
Last Modified:	07 Apr 2026 08:30
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:38512

Download

Final eThesis - complete (pdf)

Filename: PhD_Thesis.pdf

Licence:
This work is licensed under a Creative Commons Attribution NonCommercial NoDerivatives 4.0 International License
