Zero-shot Image Classification

Long, Yang (2017) Zero-shot Image Classification. PhD thesis, University of Sheffield.

Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.


Image classification is one of the essential tasks of an intelligent visual system. Conventional image classification techniques rely on a large number of labelled images for supervised learning, which requires expensive human annotation. For truly intelligent systems, a more favourable approach is to teach the machine to classify using prior knowledge, as humans do. For example, a palaeontologist can recognise an extinct species purely from textual descriptions. To this end, Zero-Shot Image Classification (ZIC) is proposed, which aims to build machines that can learn to classify unseen images as humans do. The problem can be viewed at two levels. Low-level technical issues are addressed by the general Zero-shot Learning (ZSL) problem, which considers how to train a classifier on the unseen visual domain using prior knowledge. High-level issues concern how to design and organise visual knowledge representation to construct a systematic ontology that could serve as an ultimate knowledge base for machines to learn from. This thesis aims to provide a thorough study of the ZIC problem, covering models, challenges, and possible applications. In addition, each main chapter presents a novel contribution made during my study. The first addresses the problem of Visual-Semantic Ambiguity: the same semantic concepts (e.g. attributes) can refer to a huge variety of visual features, and vice versa. Conventional ZSL methods usually adopt a one-way embedding that maps such high-variance visual features into the semantic space, which may lead to degraded performance. As a solution, a dual-graph regularised embedding algorithm named Visual-Semantic Ambiguity Removal (VSAR) is proposed, which captures the intrinsic local structure of both the visual and semantic spaces. In the intermediate embedding space, the structural difference is reconciled to remove the ambiguity.
The second contribution aims to circumvent the costly visual data collection required by conventional supervised classification, using ZSL techniques. The key idea is to synthesise visual features from semantic information, just as humans can imagine the features of an unseen class from a semantic description and prior knowledge. Thereafter, new objects from unseen classes can be classified in a conventional supervised framework using the inferred visual features. To overcome the correlation problem, we propose an intermediate Orthogonal Semantic-Visual Embedding (OSVE) space to remove the correlated redundancy. The proposed method achieves promising performance on fine-grained datasets. In the third contribution, the graph constraint of VSAR is incorporated to synthesise improved visual features. The orthogonal embedding is reconsidered as an Information Diffusion problem. Through an orthogonal rotation, the synthesised visual features become more discriminative. On four benchmarks, the proposed method demonstrates the advantages of synthesised visual features and significantly outperforms state-of-the-art results. Since most ZSL approaches rely heavily on expensive attributes, the fourth contribution of this thesis explores a more feasible yet more effective Semantic Simile model to describe unseen classes. From a group of similes, e.g. an unknown animal has the same parts as a wolf and its colour looks like that of a bobcat, implicit attributes are discovered by a graph-cut algorithm. Comprehensive experimental results suggest that the simile-based implicit attributes can significantly boost performance. To minimise the cost of building ontologies for ZIC, the final chapter introduces a novel scheme under which ZIC can be achieved with only a few similes for each unseen class. No annotations of seen classes are needed. Such an approach finally makes ZIC attribute-free, which significantly improves its feasibility.
Unseen classes can be recognised in a conventional setting without an expensive attribute ontology. It can be concluded that the methods introduced in this thesis provide fundamental components of a zero-shot image classification system. The thesis also points out four core directions for future ZIC research.
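
The idea of synthesising visual features for an unseen class from its semantic description, and then classifying with those synthesised features, can be illustrated with a minimal sketch. This is not the OSVE or VSAR algorithm from the thesis: as a simple stand-in, an unseen class prototype is built as a weighted average of seen-class mean features, with weights given by cosine similarity between attribute vectors. The class names, the three attributes (striped, feline, canine), the 2-D "visual features" and all function names are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def class_means(features, labels):
    """Mean visual feature vector for each seen class."""
    grouped = {}
    for x, y in zip(features, labels):
        grouped.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in grouped.items()}


def synthesise_prototype(unseen_attr, seen_attrs, seen_means):
    """Stand-in synthesis step: similarity-weighted average of seen-class means."""
    weights = {c: max(cosine(unseen_attr, a), 0.0) for c, a in seen_attrs.items()}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("unseen class shares no attributes with any seen class")
    dim = len(next(iter(seen_means.values())))
    proto = [0.0] * dim
    for c, w in weights.items():
        for i, m in enumerate(seen_means[c]):
            proto[i] += (w / total) * m
    return proto


def classify(x, prototypes):
    """Nearest-prototype classification by squared Euclidean distance."""
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(x, p))
    return min(prototypes, key=lambda c: sq_dist(prototypes[c]))


# Hypothetical toy data: attributes are [striped, feline, canine].
seen_attrs = {"zebra": [1, 0, 0], "bobcat": [0, 1, 0], "wolf": [0, 0, 1]}
unseen_attrs = {"tiger": [1, 1, 0], "fox": [0, 0.5, 1]}

# Labelled images exist only for the seen classes.
train_x = [[4.0, 0.0], [4.2, 0.2], [0.0, 4.0], [0.2, 4.2], [0.0, 0.0], [0.2, -0.2]]
train_y = ["zebra", "zebra", "bobcat", "bobcat", "wolf", "wolf"]

seen_means = class_means(train_x, train_y)
prototypes = {
    name: synthesise_prototype(attr, seen_attrs, seen_means)
    for name, attr in unseen_attrs.items()
}

# Test images from the unseen classes, classified without any unseen training data.
test_x = [[2.0, 2.3], [0.3, 1.0]]
predictions = [classify(x, prototypes) for x in test_x]
print(predictions)
```

On this toy data the two test points are assigned to "tiger" and "fox" respectively, although no labelled image of either class was used. A learned embedding such as those proposed in the thesis would replace the fixed similarity weighting in `synthesise_prototype`.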

Item Type: Thesis (PhD)
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Electronic and Electrical Engineering (Sheffield)
Identification Number/EthosID: uk.bl.ethos.727304
Depositing User: Mr Yang Long
Date Deposited: 20 Nov 2017 09:07
Last Modified: 12 Oct 2018 09:47
URI: http://etheses.whiterose.ac.uk/id/eprint/18613
