Activity Recognition with Combination of Deeply Learned Visual Attention and Pose Estimation

While human activity recognition and pose estimation are closely related, these two issues are usually treated as separate tasks. In this work, two-dimensional and three-dimensional pose estimation is obtained for human activity recognition in a video sequence, and the final activity is determined by comb...

Bibliographic Details
Main Authors:	Jisu Kim, Deokwoo Lee
Format:	Article
Language:	English
Published:	MDPI AG 2021-05-01
Series:	Applied Sciences
Subjects:	activity recognition; deep neural network; visual attention; pose estimation
Online Access:	https://www.mdpi.com/2076-3417/11/9/4153

id	doaj-1980bbef3838436ba29b6287af4921d9
record_format	Article
spelling	doaj-1980bbef3838436ba29b6287af4921d9; 2021-05-31T23:02:25Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2021-05-01; 11; 4153; 4153; 10.3390/app11094153; Activity Recognition with Combination of Deeply Learned Visual Attention and Pose Estimation; Jisu Kim (Department of Computer Engineering, Keimyung University, Daegu 42601, Korea); Deokwoo Lee (Department of Computer Engineering, Keimyung University, Daegu 42601, Korea); https://www.mdpi.com/2076-3417/11/9/4153; activity recognition; deep neural network; visual attention; pose estimation
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Jisu Kim, Deokwoo Lee
author_facet	Jisu Kim, Deokwoo Lee
author_sort	Jisu Kim
title	Activity Recognition with Combination of Deeply Learned Visual Attention and Pose Estimation
publisher	MDPI AG
series	Applied Sciences
issn	2076-3417
publishDate	2021-05-01
description	While human activity recognition and pose estimation are closely related, these two issues are usually treated as separate tasks. In this work, two-dimensional and three-dimensional pose estimation is obtained for human activity recognition in a video sequence, and the final activity is determined by combining it with an activity recognition algorithm that uses visual attention. Both problems can be solved efficiently with a single architecture. It is also shown that end-to-end optimization leads to much higher accuracy than separate learning. The proposed architecture can be trained seamlessly with different categories of data. Soft visual attention is used, implemented with a multilayer recurrent neural network based on long short-term memory (LSTM) that operates both temporally and spatially. To increase reliability, the image, the pose-estimated skeleton, and the RGB-based activity recognition data are combined to determine the final activity. The visual attention model is evaluated on the UCF-11 (YouTube Action), HMDB-51, and Hollywood2 data sets, with an analysis of how it focuses depending on the scene and the task being performed. Pose estimation and activity recognition are tested and analyzed on the MPII, Human3.6M, Penn Action, and NTU data sets. Test results are 98.9% on Penn Action, 87.9% on NTU, and 88.6% on NW-UCLA.
topic	activity recognition; deep neural network; visual attention; pose estimation
url	https://www.mdpi.com/2076-3417/11/9/4153

Similar Items