Sridhar, Muralikrishna (2010) Unsupervised learning of event and object classes from video. PhD thesis, University of Leeds.
Available under License Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales.
We present a method for unsupervised learning of event classes from videos in which multiple activities may occur simultaneously. Unsupervised discovery of event classes avoids the need for hand-crafted event classes and thereby makes it possible, in principle, to scale up to the huge number of event classes that occur in the real world. Research into an unsupervised approach has important consequences for tasks such as video understanding and summarization, modelling usual and unusual behaviour, and video indexing for retrieval. These tasks are becoming increasingly important for scenarios such as surveillance, video search, robotic vision and sports highlights extraction, as a consequence of the growing proliferation of video. The proposed approach is underpinned by a generative probabilistic model for events and a graphical representation of the qualitative spatial relationships between objects and their temporal evolution. Given a set of tracks for the objects within a scene, a set of event classes is derived from the most likely decomposition of the ‘activity graph’ of spatio-temporal relationships between all pairs of objects into a set of labelled events involving subsets of these objects. The posterior probability of candidate solutions favours decompositions in which events of the same class have a similar relational structure, together with three other measures of well-formedness. A Markov Chain Monte Carlo (MCMC) procedure is used to efficiently search for the MAP solution. This search moves between possible decompositions of the activity graph into sets of unlabelled events, and at each move adds a close-to-optimal labelling (for that decomposition) using spectral clustering. Experiments on simulated and real data show that the discovered event classes are often semantically meaningful and correspond well with ground-truth event classes assigned by hand. Event learning is followed by the learning of functional object categories.
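The search described above — MCMC moves between decompositions of the activity graph, scored by a posterior that favours events of the same class sharing a relational structure, with an MDL-style preference for well-formedness — can be caricatured in a toy sketch. Everything below is an illustrative assumption, not the thesis's model: the `signature` list stands in for qualitative relational structure, the scoring function and the single-item move set are invented, and base-2 Metropolis acceptance replaces the thesis's actual proposal and acceptance machinery.

```python
import random

def posterior_score(groups, signature):
    # Illustrative stand-in for the posterior: reward groups whose
    # members all share one relational signature ("events of the same
    # class have a similar relational structure"), and apply a mild
    # MDL-style penalty per non-empty group against fragmentation.
    score = 0.0
    for g in groups:
        if g and len({signature[i] for i in g}) == 1:
            score += len(g)                      # coherent event
    score -= 0.1 * sum(1 for g in groups if g)   # complexity penalty
    return score

def mcmc_decompose(n_items, signature, n_groups=3, steps=2000, seed=0):
    rng = random.Random(seed)
    # Start with everything in one event; propose single-item moves.
    groups = [set(range(n_items))] + [set() for _ in range(n_groups - 1)]
    best = [set(g) for g in groups]
    cur_score = best_score = posterior_score(groups, signature)
    for _ in range(steps):
        item = rng.randrange(n_items)
        src = next(k for k, g in enumerate(groups) if item in g)
        dst = rng.randrange(n_groups)
        if dst == src:
            continue
        groups[src].remove(item); groups[dst].add(item)
        new_score = posterior_score(groups, signature)
        # Metropolis-style acceptance on the unnormalised posterior
        if new_score >= cur_score or rng.random() < 2 ** (new_score - cur_score):
            cur_score = new_score
            if cur_score > best_score:
                best_score, best = cur_score, [set(g) for g in groups]
        else:  # reject: undo the move
            groups[dst].remove(item); groups[src].add(item)
    return best, best_score

# Six tracked interactions drawn from two underlying relational signatures
sig = ["approach", "approach", "approach", "detach", "detach", "detach"]
groups, best_score = mcmc_decompose(6, sig, n_groups=3)
```

In the full method the labelling of each proposed decomposition is not sampled but supplied near-optimally by spectral clustering; here the signatures play that role directly to keep the sketch short.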
Equivalence classes of objects are discovered on the basis of their similar functional role across multiple event instantiations. Objects are represented in a multidimensional space that captures their functional role in all of the events. Unsupervised learning in this space yields functional object categories. Experiments in the domain of aircraft handling suggest that our spatio-temporal representation, together with the learning techniques, is a promising framework for learning functional object categories from video.
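The object-category step — embedding each object in a space that captures its functional role across events, then clustering — can likewise be sketched. The feature design (per-object counts of (event class, role) participation), the event and role names, and the plain k-means clusterer below are all hypothetical illustrations, not the representation or learner used in the thesis.

```python
import math
from collections import defaultdict

def role_vectors(participations, dims):
    # participations: (object, (event_class, role)) records; each object
    # becomes a count vector over the (event_class, role) dimensions —
    # a stand-in for the multidimensional functional-role space.
    index = {d: i for i, d in enumerate(dims)}
    vecs = defaultdict(lambda: [0.0] * len(dims))
    for obj, dim in participations:
        vecs[obj][index[dim]] += 1.0
    return dict(vecs)

def kmeans(vectors, k=2, iters=20):
    # Minimal deterministic k-means: seed centroids at evenly spaced
    # objects (sorted by name), then alternate assignment and update.
    objs = sorted(vectors)
    centroids = [vectors[objs[i * len(objs) // k]] for i in range(k)]
    assign = {}
    for _ in range(iters):
        for o in objs:
            assign[o] = min(range(k),
                            key=lambda c: math.dist(vectors[o], centroids[c]))
        for c in range(k):
            members = [vectors[o] for o in objs if assign[o] == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assign

# Hypothetical aircraft-handling-flavoured records: trucks act as the
# carrier in "load" events, crates as the cargo.
dims = [("load", "carrier"), ("load", "cargo")]
records = [("truck1", dims[0]), ("truck2", dims[0]),
           ("crate1", dims[1]), ("crate2", dims[1])]
assign = kmeans(role_vectors(records, dims), k=2)
```

Objects that repeatedly fill the same role land in the same cluster, which is the sense in which equivalence classes of functionally similar objects emerge from this space.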
Item Type: Thesis (PhD)
Academic Units: The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Depositing User: Ethos Import
Date Deposited: 15 Aug 2012 15:22
Last Modified: 08 Aug 2013 08:49