Real-Time Detection and Motion Recognition of Human Moving Objects Based on Deep Learning and Multi-Scale Feature Fusion in Video

Deep-learning-based algorithms for detecting and recognizing moving human targets have made breakthrough progress. However, in some applications with high real-time requirements, existing real-time detection and recognition networks struggle to achieve high detect...

Bibliographic Details
Main Authors:	Meimei Gong, Yiming Shu
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Deep learning; real-time; detection and motion recognition; multi-scale feature fusion
Online Access:	https://ieeexplore.ieee.org/document/8979324/

id	doaj-a00de5b3c313401ead40011dc8cb9b67
record_format	Article
spelling	doaj-a00de5b3c313401ead40011dc8cb9b67; 2021-03-30T02:22:26Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 25811; 25822; 10.1109/ACCESS.2020.2971283; 8979324; Real-Time Detection and Motion Recognition of Human Moving Objects Based on Deep Learning and Multi-Scale Feature Fusion in Video; Meimei Gong (0), https://orcid.org/0000-0002-7095-7647; Yiming Shu (1), https://orcid.org/0000-0002-4608-4153; School of Sports, Anhui Polytechnic University, Wuhu, China; School of Sports, Anhui Polytechnic University, Wuhu, China; https://ieeexplore.ieee.org/document/8979324/; Deep learning; real-time; detection and motion recognition; multi-scale feature fusion
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Meimei Gong, Yiming Shu
author_sort	Meimei Gong
title	Real-Time Detection and Motion Recognition of Human Moving Objects Based on Deep Learning and Multi-Scale Feature Fusion in Video
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2020-01-01
description	Deep-learning-based algorithms for detecting and recognizing moving human targets have made breakthrough progress. However, in some applications with high real-time requirements, existing real-time detection and recognition networks struggle to achieve high detection accuracy. Accurately locating and recognizing moving human targets while preserving real-time detection therefore remains an urgent problem in this field. Building on the single shot multi-box detector (SSD) real-time detection network, this paper proposes a real-time detection, positioning, and recognition network based on multi-scale feature fusion (IMFF-SSD), which improves both positioning accuracy and recognition accuracy. First, the paper analyzes the multi-scale features extracted by the SSD network. Through feature fusion, it combines the position-sensitive information provided by low-level detail features with the context information provided by high-level semantic features, which effectively improves the positioning accuracy of the target prediction layer in the SSD network. Second, a feature-embedded prediction structure is designed to strengthen the semantics of target features without changing the spatial resolution of the SSD prediction layer, and to embed low-scale detail features in high-semantic features for collaborative prediction of targets. This improves the accuracy of the SSD network's recognition of moving human targets at all scales. The experimental results show that, with these two improvements combined, the proposed network achieves higher positioning accuracy and motion recognition accuracy than the original SSD, and has considerable advantages over some current human moving object detection and recognition algorithms.
topic	Deep learning; real-time; detection and motion recognition; multi-scale feature fusion
url	https://ieeexplore.ieee.org/document/8979324/

Similar Items