Human Action Recognition Based on Integrating Body Pose, Part Shape, and Motion

Bibliographic Details
Main Authors:	Hany El-Ghaish, Mohamed E. Hussien, Amin Shoukry, Rikio Onai
Format:	Article
Language:	English
Published:	IEEE 2018-01-01
Series:	IEEE Access
Subjects:	Human action recognition; spatial and temporal features; convolution neural networks (CNN); long short-term memory (LSTM); CNN-LSTM; motion history images (MHI)
Online Access:	https://ieeexplore.ieee.org/document/8453782/

ISSN:	2169-3536
Volume:	6
Pages:	49040-49055
DOI:	10.1109/ACCESS.2018.2868319
Affiliations:	Hany El-Ghaish and Amin Shoukry: Department of Computer Science and Engineering, Egypt-Japan University of Science and Technology, New Borg El-Arab City, Alexandria, Egypt
	Mohamed E. Hussien: Viterbi School of Engineering, University of Southern California, Arlington, VA, USA
	Rikio Onai: Department of Computer Science and Engineering, Waseda University, Tokyo, Japan
ORCID:	Hany El-Ghaish, https://orcid.org/0000-0003-4182-0016
Source:	DOAJ

Abstract

Human action recognition is a challenging problem, especially in the presence of multiple actors in the scene and/or viewpoint variations. In this paper, three modalities, namely 3-D skeletons, body-part images, and motion history images (MHIs), are integrated into a hybrid deep learning architecture for human action recognition. The three modalities capture the main aspects of an action: body pose, part shape, and body motion. Although the 3-D skeleton modality captures the actor's pose, it lacks information about the shape of the body parts as well as the shape of manipulated objects. This motivates including both the body-part images and the MHI as additional modalities. The deployed architecture combines convolutional neural networks (CNNs), long short-term memory (LSTM), and a fine-tuned pre-trained architecture into a hybrid one called MCLP: multi-modal CNN + LSTM + VGG16 pre-trained on ImageNet. MCLP consists of three sub-models: CL1D (CNN1D + LSTM), CL2D (CNN2D + LSTM), and CMHI (CNN2D for MHI), which simultaneously extract the spatial and temporal patterns in the three modalities. The decisions of these three sub-models are fused by a late multiply fusion module, which yielded better accuracy than averaging or maximizing fusion. The combined model and its sub-models have been evaluated both individually and collectively on four public data sets: UTKinect Action3D, SBU Interaction, Florence3-D Action, and NTU RGB+D. Our recognition rates outperform the state-of-the-art rates on all the evaluated data sets.
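The abstract names two concrete mechanisms: the motion history image used as the third modality, and late multiply fusion of the three sub-models' decisions. What follows is a minimal pure-Python sketch of both, not the authors' implementation. It assumes a common form of the Bobick-Davis MHI recurrence (moving pixels are set to a maximum value `tau`, static pixels decay by `delta`), motion detected by thresholded frame differencing, and "multiply fusion" taken to mean the element-wise product of the sub-models' per-class probability vectors. The function names, `tau=255`, `delta=32`, `threshold=30`, and the example scores are hypothetical illustration values, not settings from the paper.

```python
from math import prod


def motion_mask(prev_frame, curr_frame, threshold=30):
    """Binary motion mask from absolute frame differencing.

    threshold is an assumed value, not one taken from the paper.
    """
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def update_mhi(mhi, mask, tau=255, delta=32):
    """One MHI step: moving pixels jump to tau, static pixels decay by delta (floored at 0).

    tau and delta are assumed illustration values.
    """
    return [
        [tau if moved else max(0, h - delta) for h, moved in zip(mhi_row, mask_row)]
        for mhi_row, mask_row in zip(mhi, mask)
    ]


def fuse_scores(score_vectors, method="multiply"):
    """Late fusion of per-class probability vectors from several sub-models."""
    n_classes = len(score_vectors[0])
    # columns[c] holds every sub-model's score for class c
    columns = [[s[c] for s in score_vectors] for c in range(n_classes)]
    if method == "multiply":
        return [prod(col) for col in columns]
    if method == "average":
        return [sum(col) / len(col) for col in columns]
    if method == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown fusion method: {method}")


def predict(score_vectors, method="multiply"):
    """Index of the class with the highest fused score (first index wins on ties)."""
    fused = fuse_scores(score_vectors, method)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical softmax outputs over 3 classes, standing in for CL1D, CL2D, and CMHI.
    cl1d = [0.90, 0.08, 0.02]
    cl2d = [0.30, 0.40, 0.30]
    cmhi = [0.05, 0.50, 0.45]
    scores = [cl1d, cl2d, cmhi]
    for method in ("multiply", "average", "max"):
        print(method, predict(scores, method))
    # prints: multiply 1, average 0, max 0 (one per line)
```

With these made-up scores, product fusion selects class 1: the low 0.05 that the third sub-model gives class 0 pulls that class's product (0.0135) below class 1's (0.016). Averaging and max fusion are dominated by the single confident 0.90 score and select class 0. The example only shows that the three rules can disagree; which rule is more accurate is the empirical question the abstract reports on.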
