Natural language descriptions for video streams

This thesis is concerned with the automatic generation of natural language descriptions that can be used for video indexing, retrieval and summarization applications. It is a step beyond keyword-based tagging, as it captures the relations between keywords associated with a video and thus clarifies the context between them. Initially, we prepare hand annotations consisting of descriptions for video segments crafted from a TREC Video dataset. Analysis of this data gives insight into what humans find interesting in video content. For machine-generated descriptions, conventional image processing techniques are applied to extract high-level features (HLFs) from individual video frames; a natural language description is then produced from these HLFs. Although the feature extraction processes are error-prone at various levels, approaches are explored for combining their outputs into coherent descriptions. To assess scalability, the application of the framework to several different video genres is also discussed. For complete video sequences, a scheme is presented that generates coherent and compact descriptions for video streams by exploiting spatial relations between HLFs and temporal relations between individual frames. Measuring the overlap between machine-generated and human-annotated descriptions shows that the machine-generated descriptions capture context information and are consistent with what humans observe when watching the videos. Further, a task-based evaluation shows an improvement on a video identification task compared with keywords alone. Finally, the application of the generated natural language descriptions to video scene classification is discussed.
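
The record does not include the thesis's implementation, but the pipeline it outlines (per-frame HLF extraction followed by sentence generation) can be illustrated with a small sketch. The Python snippet below assumes HLFs arrive as a labelled dictionary per frame; the keys, the template and the example values are hypothetical and are not the scheme used in the thesis.

# Minimal sketch of producing a sentence from per-frame high level features
# (HLFs). The keys "actor", "action" and "scene", the template, and the
# example values are illustrative assumptions, not the thesis's own scheme.

def describe_frame(hlfs):
    """Turn one frame's HLF detections into a single English sentence."""
    actor = hlfs.get("actor", "someone")
    action = hlfs.get("action", "is present")
    scene = hlfs.get("scene")
    sentence = f"{actor} {action}"
    if scene:
        sentence += f" in the {scene}"
    return sentence[0].upper() + sentence[1:] + "."

if __name__ == "__main__":
    # Hypothetical HLFs extracted from one frame of a news clip.
    frame_hlfs = {"actor": "a woman", "action": "is reading the news",
                  "scene": "studio"}
    print(describe_frame(frame_hlfs))  # A woman is reading the news in the studio.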
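
The abstract also mentions calculating the overlap between machine-generated and human-annotated descriptions. The exact measure is not given in this record, so the sketch below assumes a simple bag-of-words precision/recall/F1 purely for illustration.

# Sketch of a word-overlap score between a machine description and a human
# annotation. The thesis record does not specify the overlap measure; a
# bag-of-words F1 over lowercased tokens is assumed here for illustration only.

import re

def tokens(text):
    """Lowercased word tokens, punctuation ignored."""
    return set(re.findall(r"[a-z']+", text.lower()))

def overlap_f1(machine, human):
    m, h = tokens(machine), tokens(human)
    if not m or not h:
        return 0.0
    common = len(m & h)
    precision, recall = common / len(m), common / len(h)
    return 0.0 if common == 0 else 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    score = overlap_f1("A woman is reading the news in the studio.",
                       "A news reader sits in a studio reading the headlines.")
    print(f"overlap F1 = {score:.2f}")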


Bibliographic Details
Main Author: Khan, Muhammad Usman Ghani
Other Authors: Gotoh, Yoshihiko
Published: University of Sheffield, 2012
Subjects: 006.35
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.557592
Also available: http://etheses.whiterose.ac.uk/2789/
Format: Electronic Thesis or Dissertation