Khan, Muhammad Usman Ghani (2012) Natural Language Descriptions for Video Streams. PhD thesis, University of Sheffield.
Abstract
This thesis is concerned with the automatic generation of natural language descriptions
that can be used for video indexing, retrieval and summarization applications.
Such descriptions go a step beyond keyword-based tagging because they capture relations
between the keywords associated with a video, thus clarifying the context that links them. Initially,
we prepare hand annotations consisting of descriptions for video segments crafted
from a TREC Video dataset. Analysis of this data offers insights into what humans
find of interest in video content. For machine-generated descriptions, conventional image
processing techniques are applied to extract high-level features (HLFs) from
individual video frames. A natural language description is then produced from
these HLFs. Although the feature extraction processes are error-prone at various levels,
approaches are explored for combining their outputs to produce coherent descriptions.
To address scalability, the application of the framework to several different video genres
is also discussed. For complete video sequences, a scheme is presented that generates
coherent and compact descriptions of video streams by making use of spatial relations
between HLFs and temporal relations between individual frames. Measuring the
overlap between machine-generated and human-annotated descriptions shows
that the machine-generated descriptions capture context information and are consistent
with what human viewers observe. Further, a task-based evaluation shows an
improvement in a video identification task compared with keywords alone. Finally,
the application of the generated natural language descriptions to video scene classification
is discussed.
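The abstract does not say which overlap measure is used to compare machine-generated and human-annotated descriptions. As an illustration only, the sketch below assumes a simple clipped unigram overlap, reported as precision, recall and F1. The function names `tokenize` and `unigram_overlap` and the example sentences are hypothetical and are not taken from the thesis.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_overlap(machine, human):
    """Return (precision, recall, f1) for clipped unigram overlap.

    Assumption: overlap is counted per word, clipped to the number of
    times the word occurs in the other description.
    """
    machine_counts = Counter(tokenize(machine))
    human_counts = Counter(tokenize(human))
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((machine_counts & human_counts).values())
    precision = overlap / sum(machine_counts.values()) if machine_counts else 0.0
    recall = overlap / sum(human_counts.values()) if human_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Hypothetical example descriptions of one video segment.
machine_description = "a man is sitting"
human_description = "a man is sitting on a chair"
precision, recall, f1 = unigram_overlap(machine_description, human_description)
```

Here precision measures how much of the machine description is supported by the human annotation, and recall measures how much of the human annotation the machine description covers. An evaluation of this kind would typically average such scores over many segments and several annotators.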
Metadata
Supervisors: | Gotoh, Yoshihiko |
---|---|
Keywords: | Video processing, image processing, natural language generation. |
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield) |
Identification Number/EthosID: | uk.bl.ethos.557592 |
Depositing User: | Mr Muhammad Usman Ghani Khan |
Date Deposited: | 20 Sep 2012 14:44 |
Last Modified: | 27 Apr 2016 13:34 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:2789 |
Download
Filename: Khan,_M.pdf
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License