A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments

The field of computer vision, where the goal is to allow computer systems to interpret and understand image data, has in recent years seen great advances with the emergence of deep learning. Deep learning, a technique that emulates the information processing of the human brain, has been shown to alm...

Bibliographic Details
Main Author: Casserfelt, Karl
Format: Others
Language: English
Published: Malmö universitet, Fakulteten för teknik och samhälle (TS) 2018
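Subjects: scene recognition; deep learning; computer vision; activity recognition; office activity; neural networks; 3DCNN; Engineering and Technology; Teknik och teknologier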
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20429
id ndltd-UPSALLA1-oai-DiVA.org-mau-20429
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-mau-20429; 2020-10-28T05:38:24Z; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20429; Local 26198; application/pdf; info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic scene recognition
deep learning
computer vision
activity recognition
office activity
neural networks
3DCNN
Engineering and Technology
Teknik och teknologier
description The field of computer vision, where the goal is to allow computer systems to interpret and understand image data, has in recent years seen great advances with the emergence of deep learning. Deep learning, a technique that emulates the information processing of the human brain, has been shown to almost solve the problem of object recognition in image data. One of the next big challenges in computer vision is to allow computers to recognize not only objects but also activities. This study explores the capabilities of deep learning for the specific problem of activity recognition in office environments. The study used a re-labeled subset of the AMI Meeting Corpus video data set to compare the performance of different neural network models on this problem, and then evaluated the best-performing model on a novel data set of office activities captured in a research lab at Malmö University. The results showed that the best-performing model was a 3D convolutional neural network (3DCNN) with temporal information in the third dimension. However, a recurrent convolutional network (RCNN), which used a pre-trained VGG16 model to extract features and fed them into a recurrent neural network with a unidirectional Long Short-Term Memory (LSTM) layer, performed almost as well with the right configuration. An analysis of the results suggests that a 3DCNN's performance depends on the camera angle, specifically on how well movement is spatially distributed between the people in frame.
author Casserfelt, Karl
title A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments
publisher Malmö universitet, Fakulteten för teknik och samhälle (TS)
publishDate 2018
url http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20429