A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments
The field of computer vision, where the goal is to allow computer systems to interpret and understand image data, has in recent years seen great advances with the emergence of deep learning. Deep learning, a technique that emulates the information processing of the human brain, has been shown to alm...
Main Author: | Casserfelt, Karl |
---|---|
Format: | Others |
Language: | English |
Published: | Malmö universitet, Fakulteten för teknik och samhälle (TS), 2018 |
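Subjects: | scene recognition; deep learning; computer vision; activity recognition; office activity; neural networks; 3DCNN; Engineering and Technology; Teknik och teknologier |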
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20429 |
id |
ndltd-UPSALLA1-oai-DiVA.org-mau-20429 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-mau-20429; 2020-10-28T05:38:24Z; A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments; eng; Casserfelt, Karl; Malmö universitet, Fakulteten för teknik och samhälle (TS); Malmö universitet/Teknik och samhälle; 2018; scene recognition; deep learning; computer vision; activity recognition; office activity; neural networks; 3DCNN; Engineering and Technology; Teknik och teknologier; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20429; Local 26198; application/pdf; info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others
|
sources |
NDLTD |
topic |
scene recognition; deep learning; computer vision; activity recognition; office activity; neural networks; 3DCNN; Engineering and Technology; Teknik och teknologier |
spellingShingle |
scene recognition; deep learning; computer vision; activity recognition; office activity; neural networks; 3DCNN; Engineering and Technology; Teknik och teknologier; Casserfelt, Karl; A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments |
description |
The field of computer vision, where the goal is to allow computer systems to interpret and understand image data, has in recent years seen great advances with the emergence of deep learning. Deep learning, a technique that emulates the information processing of the human brain, has been shown to almost solve the problem of object recognition in image data. One of the next big challenges in computer vision is to allow computers to recognize not only objects but also activities. This study explores the capabilities of deep learning for the specific problem of activity recognition in office environments. The study used a re-labeled subset of the AMI Meeting Corpus video data set to compare the performance of different neural network models on this problem, and then evaluated the best-performing model on a novel data set of office activities captured in a research lab at Malmö University. The results showed that the best-performing model was a 3D convolutional neural network (3DCNN) with temporal information in the third dimension; however, a recurrent convolutional network (RCNN), which uses a pre-trained VGG16 model to extract features and feeds them into a recurrent neural network with a unidirectional Long Short-Term Memory (LSTM) layer, performed almost as well with the right configuration. An analysis of the results suggests that a 3DCNN's performance depends on the camera angle, specifically on how well movement is spatially distributed between the people in the frame. |
author |
Casserfelt, Karl |
author_facet |
Casserfelt, Karl |
author_sort |
Casserfelt, Karl |
title |
A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments |
title_short |
A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments |
title_full |
A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments |
title_fullStr |
A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments |
title_full_unstemmed |
A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments |
title_sort |
deep learning approach to video processing for scene recognition in smart office environments |
publisher |
Malmö universitet, Fakulteten för teknik och samhälle (TS) |
publishDate |
2018 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20429 |
work_keys_str_mv |
AT casserfeltkarl adeeplearningapproachtovideoprocessingforscenerecognitioninsmartofficeenvironments AT casserfeltkarl deeplearningapproachtovideoprocessingforscenerecognitioninsmartofficeenvironments |
_version_ |
1719353760234340352 |