Real-Time Detection and Motion Recognition of Human Moving Objects Based on Deep Learning and Multi-Scale Feature Fusion in Video

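Deep-learning-based algorithms for detecting and recognizing moving human targets have recently made breakthrough progress. However, in applications with strict real-time requirements, existing real-time deep-learning detection and recognition networks struggle to achieve high detection accuracy. Achieving accurate localization and recognition of moving human targets while maintaining real-time detection therefore remains an urgent problem in this field. Building on the single shot multibox detector (SSD) real-time detection network, this paper proposes a real-time detection, localization, and recognition network based on multi-scale feature fusion (IMFF-SSD) that improves both localization accuracy and recognition accuracy. First, the paper analyzes the multi-scale features extracted by the SSD network and, through feature fusion, combines the position-sensitive information provided by low-level detail features with the context information provided by high-level semantic features, which effectively improves the localization accuracy of the SSD target prediction layers. Second, a feature-embedded prediction structure is designed that strengthens the semantics of target features without changing the spatial resolution of the SSD prediction layers, embedding low-level detail features in high-level semantic features for collaborative target prediction. This improves the SSD network's recognition accuracy for moving human targets at all scales. Experimental results show that, by combining these two improvements, the proposed real-time detection and recognition network based on multi-scale feature fusion achieves greater improvements in localization accuracy and motion recognition accuracy than the original SSD, and it has clear advantages over some current algorithms for detecting and recognizing moving human objects.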
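
The two improvements summarized in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' IMFF-SSD implementation: the function names (`fuse_context`, `embed_details`, `scale_factor`, and the helpers) are hypothetical, feature maps are assumed to be plain nested lists of shape channels x height x width, and nearest-neighbour upsampling, average pooling, and channel concatenation are assumed stand-ins for whatever operators the paper actually uses. The learned convolutions a real network would apply around each fusion step are omitted.

```python
from typing import List, Tuple

# Hypothetical representation: a feature map is channels x height x width,
# stored as nested lists and indexed as fmap[c][y][x].
FeatureMap = List[List[List[float]]]


def shape(fmap: FeatureMap) -> Tuple[int, int, int]:
    """Return (channels, height, width)."""
    return len(fmap), len(fmap[0]), len(fmap[0][0])


def scale_factor(low: FeatureMap, high: FeatureMap) -> int:
    """Integer ratio between the low-level (larger) and high-level (smaller) maps."""
    _, hl, wl = shape(low)
    _, hh, wh = shape(high)
    if hl % hh or wl % wh or hl // hh != wl // wh:
        raise ValueError("low-level size must be an integer multiple of high-level size")
    return hl // hh


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Nearest-neighbour upsampling: each value becomes a factor x factor block."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]
            rows.extend(list(wide) for _ in range(factor))
        out.append(rows)
    return out


def avg_pool(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Average non-overlapping factor x factor blocks (sizes assumed divisible)."""
    out = []
    for channel in fmap:
        pooled = []
        for y in range(0, len(channel), factor):
            pooled_row = []
            for x in range(0, len(channel[0]), factor):
                block = [channel[y + dy][x + dx]
                         for dy in range(factor) for dx in range(factor)]
                pooled_row.append(sum(block) / len(block))
            pooled.append(pooled_row)
        out.append(pooled)
    return out


def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Stack two maps along the channel axis (copying rows); spatial sizes must match."""
    if shape(a)[1:] != shape(b)[1:]:
        raise ValueError(f"spatial size mismatch: {shape(a)} vs {shape(b)}")
    return [[list(row) for row in channel] for channel in a + b]


def fuse_context(low: FeatureMap, high: FeatureMap) -> FeatureMap:
    """Improvement 1 (sketch): upsample high-level context to the low-level
    resolution and fuse it with the position-sensitive low-level details."""
    return concat_channels(low, upsample_nearest(high, scale_factor(low, high)))


def embed_details(high: FeatureMap, low: FeatureMap) -> FeatureMap:
    """Improvement 2 (sketch): pool low-level details down to the high-level
    prediction layer's resolution and embed them there, so that layer's
    spatial size is unchanged."""
    return concat_channels(high, avg_pool(low, scale_factor(low, high)))


if __name__ == "__main__":
    # Toy inputs: a 2-channel 4x4 "low-level" map and a 3-channel 2x2 "high-level" map.
    low = [[[float(c * 16 + y * 4 + x) for x in range(4)] for y in range(4)]
           for c in range(2)]
    high = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(3)]
    print(shape(fuse_context(low, high)))   # (5, 4, 4): low-level resolution kept
    print(shape(embed_details(high, low)))  # (5, 2, 2): prediction-layer resolution kept
```

Running the script prints `(5, 4, 4)` and `(5, 2, 2)`: context fusion keeps the larger, position-sensitive resolution, while detail embedding keeps the prediction layer's resolution, which is the constraint the abstract states. A real network would typically replace these fixed operators with learned layers, and element-wise addition is a common alternative to concatenation; which choices IMFF-SSD makes is not stated in this record.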

Bibliographic Details
Main Authors: Meimei Gong, Yiming Shu
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Deep learning; real-time; detection and motion recognition; multi-scale feature fusion
Online Access: https://ieeexplore.ieee.org/document/8979324/
Affiliation: School of Sports, Anhui Polytechnic University, Wuhu, China (both authors)
ORCID: Meimei Gong https://orcid.org/0000-0002-7095-7647; Yiming Shu https://orcid.org/0000-0002-4608-4153
ISSN: 2169-3536
Volume: 8, pages 25811-25822
DOI: 10.1109/ACCESS.2020.2971283
Source: DOAJ