Real-Time Detection and Motion Recognition of Human Moving Objects Based on Deep Learning and Multi-Scale Feature Fusion in Video
Human moving-target detection and recognition algorithms based on deep learning have made breakthrough progress. However, in applications with strict real-time requirements, existing deep learning real-time detection and recognition networks struggle to achieve high detect...
Main Authors: | Meimei Gong, Yiming Shu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | Deep learning; real-time; detection and motion recognition; multi-scale feature fusion |
Online Access: | https://ieeexplore.ieee.org/document/8979324/ |
id |
doaj-a00de5b3c313401ead40011dc8cb9b67 |
---|---|
record_format |
Article |
spelling |
doaj-a00de5b3c313401ead40011dc8cb9b67; 2021-03-30T02:22:26Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8; pp. 25811-25822; DOI 10.1109/ACCESS.2020.2971283; IEEE document 8979324; Real-Time Detection and Motion Recognition of Human Moving Objects Based on Deep Learning and Multi-Scale Feature Fusion in Video; Meimei Gong (https://orcid.org/0000-0002-7095-7647), School of Sports, Anhui Polytechnic University, Wuhu, China; Yiming Shu (https://orcid.org/0000-0002-4608-4153), School of Sports, Anhui Polytechnic University, Wuhu, China; https://ieeexplore.ieee.org/document/8979324/; Deep learning; real-time; detection and motion recognition; multi-scale feature fusion |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Meimei Gong; Yiming Shu |
title |
Real-Time Detection and Motion Recognition of Human Moving Objects Based on Deep Learning and Multi-Scale Feature Fusion in Video |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
Human moving-target detection and recognition algorithms based on deep learning have made breakthrough progress. However, in applications with strict real-time requirements, existing deep learning real-time detection and recognition networks struggle to achieve high detection accuracy. Accurately locating and recognizing human moving targets while keeping detection real-time therefore remains an urgent problem in this field. Building on the single shot multi-box detector (SSD) real-time detection network, this paper proposes a real-time detection, positioning, and recognition network based on multi-scale feature fusion (IMFF-SSD), which improves both positioning accuracy and recognition accuracy. First, the paper analyzes the multi-scale features extracted by the SSD network. Through feature fusion, it combines the position-sensitive information provided by low-level detail features with the context information provided by high-level semantic features, which improves the positioning accuracy of the target prediction layer in the SSD network. Second, a feature-embedded prediction structure is designed to strengthen the semantics of target features without changing the spatial resolution of the SSD prediction layer, embedding low-scale detail features in high-semantic features for collaborative prediction of targets. This improves the SSD network's recognition accuracy for human moving targets at all scales. The experimental results show that, by combining these two improvements, the proposed network achieves higher positioning accuracy and motion recognition accuracy than the original SSD and holds clear advantages over some current human moving-object detection and recognition algorithms. |
topic |
Deep learning; real-time; detection and motion recognition; multi-scale feature fusion |
url |
https://ieeexplore.ieee.org/document/8979324/ |
_version_ |
1724185317306531840 |
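
The description above says that low-level detail features are combined with high-level semantic features through feature fusion. The record does not say how IMFF-SSD performs that fusion, so the following is only a minimal sketch of one common approach: nearest-neighbor upsampling of the coarser map, followed by channel-wise concatenation. The function names `upsample_nearest` and `fuse`, the nested-list layout, and the toy shapes are all assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of multi-scale feature fusion; NOT the IMFF-SSD code.
# Feature maps are nested lists shaped [channels][height][width].


def upsample_nearest(fmap, factor):
    """Enlarge each channel by repeating every value `factor` times on both axes."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]
            # Repeat the widened row `factor` times, copying so rows are independent.
            rows.extend(list(wide) for _ in range(factor))
        out.append(rows)
    return out


def fuse(low_level, high_level):
    """Concatenate a low-level map with an upsampled high-level map along channels."""
    low_h, low_w = len(low_level[0]), len(low_level[0][0])
    high_h, high_w = len(high_level[0]), len(high_level[0][0])
    if low_h % high_h or low_w % high_w or low_h // high_h != low_w // high_w:
        raise ValueError("spatial sizes must differ by one integer factor")
    factor = low_h // high_h
    # Channel-wise concatenation: detail channels first, semantic channels after.
    return low_level + upsample_nearest(high_level, factor)


# Toy shapes (assumed): a 1-channel 4x4 "detail" map and a 2-channel 2x2 "semantic" map.
low = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
high = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
fused = fuse(low, high)
```

Here `fused` keeps the 4x4 resolution of the low-level map and carries three channels, so a prediction layer reading it would see fine spatial detail and coarse semantic context at each location. A real detector would typically do this with tensor operations and learned convolutions rather than nested lists.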