Wu, Di (2014) Human Action Recognition Using Deep Probabilistic Graphical Models. PhD thesis, University of Sheffield.
Abstract
Building intelligent systems capable of extracting high-level representations from high-dimensional sensory data lies at the core of solving many A.I.-related tasks. Human action recognition is an important topic in computer vision whose input data are inherently high-dimensional. Its applications include robotics, video surveillance, human-computer interaction, user interface design, and multimedia video retrieval, amongst others.
A number of approaches have been proposed to extract representative features from high-dimensional temporal data, most commonly hard-wired geometric or bio-inspired shape-context features.
This thesis first demonstrates some ad hoc hand-crafted rules for effectively encoding motion features, and later develops a more generic approach that incorporates structured feature learning and reasoning, i.e. deep probabilistic graphical models.
The hierarchical dynamic framework first extracts high-level features and then uses the learned representation to estimate emission probabilities for inferring action sequences.
We show that better action recognition can be achieved by replacing Gaussian mixture models with Deep Neural Networks that contain many layers of features and predict probability distributions over the states of Markov models. The framework can easily be extended with an ergodic state to segment and recognise actions simultaneously.
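To illustrate the hybrid idea described above, the sketch below shows one common way a network's per-frame state posteriors can stand in for GMM emissions: the posteriors are divided by the state priors to give scaled emission likelihoods, which are then decoded over the Markov model with the Viterbi algorithm. This is a minimal illustration under assumed inputs, not the thesis implementation; the names `dnn_posteriors` and `state_priors` are hypothetical placeholders.

```python
import numpy as np

def viterbi(log_emission, log_transition, log_init):
    """Most likely state path given per-frame scaled log-likelihoods.

    log_emission:   (T, S) scaled emission log-likelihoods
    log_transition: (S, S) log transition matrix, rows = from-state
    log_init:       (S,)   log initial state distribution
    """
    T, S = log_emission.shape
    delta = log_init + log_emission[0]            # best log-score ending in each state at t = 0
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_transition  # (S, S): from-state x to-state
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emission[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                 # backtrack the best path
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

def decode(dnn_posteriors, state_priors, log_transition, log_init):
    """Hybrid DNN/HMM decoding: posteriors divided by priors give scaled likelihoods."""
    scaled = np.log(dnn_posteriors + 1e-12) - np.log(state_priors + 1e-12)
    return viterbi(scaled, log_transition, log_init)
```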
The first part of the thesis focuses on the analysis and application of hand-crafted features for human action representation and classification. We show that the "hard-coded" concept of the correlogram can capture correlations between time-domain sequences, and we further investigate multi-modal inputs, e.g. depth-sensor data and its unique traits for action recognition.
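For concreteness, one simple reading of a correlogram over time-domain sequences is a normalised cross-correlation evaluated at a range of temporal lags, showing how strongly two feature channels co-vary at different offsets. The example below is an assumption-laden sketch of that general idea, not the descriptor developed in the thesis.

```python
import numpy as np

def correlogram(x, y, max_lag):
    """Correlation of x against y shifted by lags -max_lag..max_lag (1-D sequences)."""
    x = (x - x.mean()) / (x.std() + 1e-12)        # zero-mean, unit-variance normalisation
    y = (y - y.mean()) / (y.std() + 1e-12)
    corrs = []
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            a, b = x[:lag], y[-lag:]              # y leads x by |lag| frames
        elif lag > 0:
            a, b = x[lag:], y[:-lag]              # x leads y by lag frames
        else:
            a, b = x, y
        corrs.append(float(np.mean(a * b)))       # mean product of the aligned segments
    return np.array(corrs)
```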
The second part of this thesis focuses on marrying probabilistic graphical models with Deep Neural Networks (both Deep Belief Networks and Deep 3D Convolutional Neural Networks) for structured sequence prediction. The proposed Deep Dynamic Neural Network provides a general framework for structured 2D data representation and classification, which inspires us to further investigate the application of various graphical models to time-variant video sequences.
Metadata
| Supervisors: | Shao, Ling |
| --- | --- |
| Awarding institution: | University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Electronic and Electrical Engineering (Sheffield) |
| Identification Number/EthosID: | uk.bl.ethos.617199 |
| Depositing User: | Mr Di Wu |
| Date Deposited: | 08 Aug 2014 15:07 |
| Last Modified: | 03 Oct 2016 11:17 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:6603 |
Download
Filename: thesis.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.