Sridhar, Muralikrishna (2010) Unsupervised learning of event and object classes from video. PhD thesis, University of Leeds.
Available under License Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales.
We present a method for unsupervised learning of event classes from videos in which multiple activities may occur simultaneously. Unsupervised discovery of event classes avoids the need for hand-crafted event classes and thereby makes it possible, in principle, to scale up to the huge number of event classes that occur in the real world. Research into an unsupervised approach has important consequences for tasks such as video understanding and summarization, modelling usual and unusual behaviour, and video indexing for retrieval. These tasks are becoming increasingly important for scenarios such as surveillance, video search, robotic vision and sports highlights extraction, as a consequence of the growing proliferation of video. The proposed approach is underpinned by a generative probabilistic model for events and a graphical representation of the qualitative spatial relationships between objects and their temporal evolution. Given a set of tracks for the objects within a scene, a set of event classes is derived from the most likely decomposition of the ‘activity graph’ of spatio-temporal relationships between all pairs of objects into a set of labelled events involving subsets of these objects. The posterior probability of candidate solutions favours decompositions in which events of the same class have a similar relational structure, together with three other measures of well-formedness. A Markov Chain Monte Carlo (MCMC) procedure is used to efficiently search for the MAP solution. This search moves between possible decompositions of the activity graph into sets of unlabelled events, and at each move adds a close-to-optimal labelling (for that decomposition) using spectral clustering. Experiments on simulated and real data show that the discovered event classes are often semantically meaningful and correspond well with ground-truth event classes assigned by hand. Event learning is followed by the learning of functional object categories.
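The search described above — MCMC moves between decompositions of the activity graph, scored by a posterior that favours events of the same class sharing a relational structure, with an MDL-style preference for well-formedness — can be caricatured in a toy sketch. Everything below is an illustrative assumption, not the thesis's model: the `signature` list stands in for qualitative relational structure, the scoring function and the single-item move set are invented, and base-2 Metropolis acceptance replaces the thesis's actual proposal and acceptance machinery.

```python
import random

def posterior_score(groups, signature):
    # Illustrative stand-in for the posterior: reward groups whose
    # members all share one relational signature ("events of the same
    # class have a similar relational structure"), and apply a mild
    # MDL-style penalty per non-empty group against fragmentation.
    score = 0.0
    for g in groups:
        if g and len({signature[i] for i in g}) == 1:
            score += len(g)                      # coherent event
    score -= 0.1 * sum(1 for g in groups if g)   # complexity penalty
    return score

def mcmc_decompose(n_items, signature, n_groups=3, steps=2000, seed=0):
    rng = random.Random(seed)
    # Start with everything in one event; propose single-item moves.
    groups = [set(range(n_items))] + [set() for _ in range(n_groups - 1)]
    best = [set(g) for g in groups]
    cur_score = best_score = posterior_score(groups, signature)
    for _ in range(steps):
        item = rng.randrange(n_items)
        src = next(k for k, g in enumerate(groups) if item in g)
        dst = rng.randrange(n_groups)
        if dst == src:
            continue
        groups[src].remove(item); groups[dst].add(item)
        new_score = posterior_score(groups, signature)
        # Metropolis-style acceptance on the unnormalised posterior
        if new_score >= cur_score or rng.random() < 2 ** (new_score - cur_score):
            cur_score = new_score
            if cur_score > best_score:
                best_score, best = cur_score, [set(g) for g in groups]
        else:  # reject: undo the move
            groups[dst].remove(item); groups[src].add(item)
    return best, best_score

# Six tracked interactions drawn from two underlying relational signatures
sig = ["approach", "approach", "approach", "detach", "detach", "detach"]
groups, best_score = mcmc_decompose(6, sig, n_groups=3)
```

In the full method the labelling of each proposed decomposition is not sampled but supplied near-optimally by spectral clustering; here the signatures play that role directly to keep the sketch short.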
Equivalence classes of objects are discovered on the basis of their similar functional role across multiple event instantiations. Objects are represented in a multidimensional space that captures their functional role in all of the events. Unsupervised learning in this space yields functional object categories. Experiments in the domain of aircraft handling suggest that our spatio-temporal representation, together with the learning techniques, is a promising framework for learning functional object categories from video.
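The object-category step — embedding each object in a space that captures its functional role across events, then clustering — can likewise be sketched. The feature design (per-object counts of (event class, role) participation), the event and role names, and the plain k-means clusterer below are all hypothetical illustrations, not the representation or learner used in the thesis.

```python
import math
from collections import defaultdict

def role_vectors(participations, dims):
    # participations: (object, (event_class, role)) records; each object
    # becomes a count vector over the (event_class, role) dimensions —
    # a stand-in for the multidimensional functional-role space.
    index = {d: i for i, d in enumerate(dims)}
    vecs = defaultdict(lambda: [0.0] * len(dims))
    for obj, dim in participations:
        vecs[obj][index[dim]] += 1.0
    return dict(vecs)

def kmeans(vectors, k=2, iters=20):
    # Minimal deterministic k-means: seed centroids at evenly spaced
    # objects (sorted by name), then alternate assignment and update.
    objs = sorted(vectors)
    centroids = [vectors[objs[i * len(objs) // k]] for i in range(k)]
    assign = {}
    for _ in range(iters):
        for o in objs:
            assign[o] = min(range(k),
                            key=lambda c: math.dist(vectors[o], centroids[c]))
        for c in range(k):
            members = [vectors[o] for o in objs if assign[o] == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assign

# Hypothetical aircraft-handling-flavoured records: trucks act as the
# carrier in "load" events, crates as the cargo.
dims = [("load", "carrier"), ("load", "cargo")]
records = [("truck1", dims[0]), ("truck2", dims[0]),
           ("crate1", dims[1]), ("crate2", dims[1])]
assign = kmeans(role_vectors(records, dims), k=2)
```

Objects that repeatedly fill the same role land in the same cluster, which is the sense in which equivalence classes of functionally similar objects emerge from this space.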
Item Type: Thesis (PhD)
Academic Units: The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Depositing User: Ethos Import
Date Deposited: 15 Aug 2012 15:22
Last Modified: 08 Aug 2013 08:49