A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments

The field of computer vision, where the goal is to allow computer systems to interpret and understand image data, has in recent years seen great advances with the emergence of deep learning. Deep learning, a technique that emulates the information processing of the human brain, has been shown to alm...

Bibliographic Details
Main Author:	Casserfelt, Karl
Format:	Others
Language:	English
Published:	Malmö universitet, Fakulteten för teknik och samhälle (TS) 2018
Subjects:	scene recognition; deep learning; computer vision; activity recognition; office activity; neural networks; 3DCNN; Engineering and Technology; Teknik och teknologier
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20429

id	ndltd-UPSALLA1-oai-DiVA.org-mau-20429
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-mau-20429; 2020-10-28T05:38:24Z; A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments; eng; Casserfelt, Karl; Malmö universitet, Fakulteten för teknik och samhälle (TS); Malmö universitet/Teknik och samhälle; 2018; scene recognition; deep learning; computer vision; activity recognition; office activity; neural networks; 3DCNN; Engineering and Technology; Teknik och teknologier; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20429; Local 26198; application/pdf; info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	scene recognition; deep learning; computer vision; activity recognition; office activity; neural networks; 3DCNN; Engineering and Technology; Teknik och teknologier
description	The field of computer vision, where the goal is to allow computer systems to interpret and understand image data, has in recent years seen great advances with the emergence of deep learning. Deep learning, a technique that emulates the information processing of the human brain, has been shown to almost solve the problem of object recognition in image data. One of the next big challenges in computer vision is to allow computers to recognize not only objects but also activities. This study explores the capabilities of deep learning for the specific problem area of activity recognition in office environments. The study used a re-labeled subset of the AMI Meeting Corpus video data set to comparatively evaluate the performance of different neural network models in the given problem area, and then evaluated the best-performing model on a novel data set of office activities captured in a research lab at Malmö University. The results showed that the best-performing model was a 3D convolutional neural network (3DCNN) with temporal information in the third dimension. However, a recurrent convolutional network (RCNN), which used a pre-trained VGG16 model to extract features and fed them into a recurrent neural network with a unidirectional Long Short-Term Memory (LSTM) layer, performed almost as well with the right configuration. An analysis of the results suggests that a 3DCNN's performance depends on the camera angle, specifically on how well movement is spatially distributed between people in the frame.
author	Casserfelt, Karl
title	A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments
title_sort	deep learning approach to video processing for scene recognition in smart office environments
publisher	Malmö universitet, Fakulteten för teknik och samhälle (TS)
publishDate	2018
url	http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20429
