Robust Multi-Feature Learning for Skeleton-Based Action Recognition

Skeleton-based action recognition has advanced significantly in the past decade. Among deep learning-based action recognition methods, one of the most commonly used structures is a two-stream network. This type of network extracts high-level spatial and temporal features from skeleton coordinates an...

Bibliographic Details
Main Authors:	Yingfu Wang, Zheyuan Xu, Li Li, Jian Yao
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Action recognition; skeleton; multi-feature learning; CNN; robustness
Online Access:	https://ieeexplore.ieee.org/document/8859223/

id	doaj-330f0be107a4479282a6f85491af3c18
record_format	Article
spelling	doaj-330f0be107a4479282a6f85491af3c18 | 2021-03-29T23:56:40Z | eng | IEEE | IEEE Access | 2169-3536 | 2019-01-01 | 7 | 148658 | 148671 | 10.1109/ACCESS.2019.2945632 | 8859223 | Robust Multi-Feature Learning for Skeleton-Based Action Recognition | Yingfu Wang | 0 | https://orcid.org/0000-0001-9949-7579 | Zheyuan Xu | 1 | Li Li | 2 | Jian Yao | 3 | School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China | School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China | School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China | School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China | Skeleton-based action recognition has advanced significantly in the past decade. Among deep learning-based action recognition methods, one of the most commonly used structures is a two-stream network. This type of network extracts high-level spatial and temporal features from skeleton coordinates and optical flows, respectively. However, other features, such as the structure of the skeleton or the relations of specific joint pairs, are sometimes ignored, even though using these features can also improve action recognition performance. To robustly learn more low-level skeleton features, this paper introduces an efficient fully convolutional network to process multiple input features. The network has multiple streams, each of which has the same encoder-decoder structure. A temporal convolutional network and a co-occurrence convolutional network encode the local and global features, and a convolutional classifier decodes high-level features to classify the action. Moreover, a novel fusion strategy is proposed to combine independent feature learning and dependent feature relating. Detailed ablation studies are performed to confirm the network's robustness to all feature inputs. If more features are combined and the number of streams increases, performance can be further improved. The proposed network is evaluated on three skeleton datasets: NTU-RGB + D, Kinetics, and UTKinect. The experimental results show its effectiveness and performance superiority over state-of-the-art methods. | https://ieeexplore.ieee.org/document/8859223/ | Action recognition | skeleton | multi-feature learning | CNN | robustness
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Yingfu Wang, Zheyuan Xu, Li Li, Jian Yao
spellingShingle	Yingfu Wang Zheyuan Xu Li Li Jian Yao Robust Multi-Feature Learning for Skeleton-Based Action Recognition IEEE Access Action recognition skeleton multi-feature learning CNN robustness
author_facet	Yingfu Wang, Zheyuan Xu, Li Li, Jian Yao
author_sort	Yingfu Wang
title	Robust Multi-Feature Learning for Skeleton-Based Action Recognition
title_short	Robust Multi-Feature Learning for Skeleton-Based Action Recognition
title_full	Robust Multi-Feature Learning for Skeleton-Based Action Recognition
title_fullStr	Robust Multi-Feature Learning for Skeleton-Based Action Recognition
title_full_unstemmed	Robust Multi-Feature Learning for Skeleton-Based Action Recognition
title_sort	robust multi-feature learning for skeleton-based action recognition
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2019-01-01
description	Skeleton-based action recognition has advanced significantly in the past decade. Among deep learning-based action recognition methods, one of the most commonly used structures is a two-stream network. This type of network extracts high-level spatial and temporal features from skeleton coordinates and optical flows, respectively. However, other features, such as the structure of the skeleton or the relations of specific joint pairs, are sometimes ignored, even though using these features can also improve action recognition performance. To robustly learn more low-level skeleton features, this paper introduces an efficient fully convolutional network to process multiple input features. The network has multiple streams, each of which has the same encoder-decoder structure. A temporal convolutional network and a co-occurrence convolutional network encode the local and global features, and a convolutional classifier decodes high-level features to classify the action. Moreover, a novel fusion strategy is proposed to combine independent feature learning and dependent feature relating. Detailed ablation studies are performed to confirm the network's robustness to all feature inputs. If more features are combined and the number of streams increases, performance can be further improved. The proposed network is evaluated on three skeleton datasets: NTU-RGB + D, Kinetics, and UTKinect. The experimental results show its effectiveness and performance superiority over state-of-the-art methods.
topic	Action recognition; skeleton; multi-feature learning; CNN; robustness
url	https://ieeexplore.ieee.org/document/8859223/
work_keys_str_mv	AT yingfuwang robustmultifeaturelearningforskeletonbasedactionrecognition AT zheyuanxu robustmultifeaturelearningforskeletonbasedactionrecognition AT lili robustmultifeaturelearningforskeletonbasedactionrecognition AT jianyao robustmultifeaturelearningforskeletonbasedactionrecognition
_version_	1724188855989436416

Similar Items