ssFPN: Scale Sequence (S2) Feature-Based Feature Pyramid Network for Object Detection

Bibliographic Details
Main Authors: Kang, J.-W. (Author), Kim, B.-G. (Author), Park, H.-J. (Author)
Format: Article
Language: English
Published: MDPI 2023
Subjects:
Online Access: View Fulltext in Publisher
View in Scopus
LEADER 04614nam a2200433Ia 4500
001 10.3390-s23094432
008 230529s2023 CNT 000 0 und d
020 |a 1424-8220 (ISSN) 
245 1 0 |a ssFPN: Scale Sequence (S2) Feature-Based Feature Pyramid Network for Object Detection 
260 0 |b MDPI  |c 2023 
856 |z View Fulltext in Publisher  |u https://doi.org/10.3390/s23094432 
856 |z View in Scopus  |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85159209038&doi=10.3390%2fs23094432&partnerID=40&md5=bb9ff87cd11c701c63268d25e88d91d5 
520 3 |a Object detection is a fundamental task in computer vision. Over the past several years, convolutional neural network (CNN)-based object detection models have significantly improved detection accuracy in terms of average precision (AP). Furthermore, feature pyramid networks (FPNs) are essential modules that allow object detection models to handle various object scales. However, the AP for small objects is lower than that for medium and large objects: small objects are difficult to recognize because they carry little information, and that information is lost in deeper CNN layers. This paper proposes a new FPN model named ssFPN (scale sequence (S²) feature-based feature pyramid network) to detect multi-scale objects, especially small objects. Motivated by scale-space theory, the FPN is regarded as a scale space, and a new scale sequence (S²) feature is extracted by three-dimensional convolution on the level axis of the FPN to strengthen the information on small objects. The defined feature is basically scale-invariant and is built on a high-resolution pyramid feature map for small objects. Additionally, the designed S² feature can be extended to most FPN-based object detection models. We also designed a feature-level super-resolution approach to show the efficiency of the S² feature: by training a feature-level super-resolution model, we verified that the S² feature could improve the classification accuracy for low-resolution images. 
To demonstrate the effect of the scale sequence (S²) feature, experiments with the S² feature built into both one-stage and two-stage object detection models were conducted on the MS COCO dataset. For the two-stage models Faster R-CNN and Mask R-CNN with the S² feature, AP improvements of up to 1.6% and 1.4%, respectively, were achieved, and the small-object AP (AP_S) of each model improved by 1.2% and 1.1%, respectively. The one-stage models in the YOLO series also improved: for YOLOv4-P5, YOLOv4-P6, YOLOR-P6, YOLOR-W6, and YOLOR-D6 with the S² feature, AP improvements of 0.9%, 0.5%, 0.5%, 0.1%, and 0.1% were observed, and for small object detection, AP_S increased by 1.1%, 1.1%, 0.9%, 0.4%, and 0.1%, respectively. Experiments using the feature-level super-resolution approach with the proposed S² feature were conducted on the CIFAR-100 dataset. By training the feature-level super-resolution model, we verified that ResNet-101 with the S² feature trained on low-resolution (LR) images achieved a 55.2% classification accuracy, which was 1.6% higher than for ResNet-101 trained on high-resolution (HR) images. © 2023 by the authors. 
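The core idea in the abstract, treating the FPN levels as a scale axis and convolving along it, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the pyramid sizes, single-channel maps, nearest-neighbor upsampling, and the 3×3×3 averaging kernel are all illustrative assumptions; the paper uses learned 3D convolutions on multi-channel feature maps.

```python
import numpy as np

def upsample_nearest(x, size):
    """Nearest-neighbor upsample a (H, W) map to (size, size)."""
    h, w = x.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return x[np.ix_(rows, cols)]

def scale_sequence_feature(pyramid, kernel):
    """Sketch of the S^2 idea: resize every FPN level to the highest
    resolution, stack them along a new 'level' axis to form a scale
    volume, then run a (valid-mode) 3D convolution over (level, H, W)."""
    size = pyramid[0].shape[0]  # highest-resolution level sets the target size
    volume = np.stack([upsample_nearest(p, size) for p in pyramid])  # (L, H, W)
    kl, kh, kw = kernel.shape
    L, H, W = volume.shape
    out = np.zeros((L - kl + 1, H - kh + 1, W - kw + 1))
    for l in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[l, i, j] = np.sum(volume[l:l+kl, i:i+kh, j:j+kw] * kernel)
    return out

# Toy single-channel pyramid: P3 (8x8), P4 (4x4), P5 (2x2).
pyramid = [np.random.rand(8, 8), np.random.rand(4, 4), np.random.rand(2, 2)]
s2 = scale_sequence_feature(pyramid, np.ones((3, 3, 3)) / 27.0)
print(s2.shape)  # (1, 6, 6): one fused map at the high-resolution scale
```

The output lives at the resolution of the finest pyramid level, which matches the abstract's point that the S² feature is built on a high-resolution feature map to benefit small objects.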
650 0 4 |a Classification (of information) 
650 0 4 |a Convolution 
650 0 4 |a Convolutional neural network 
650 0 4 |a convolutional neural network (CNN) 
650 0 4 |a Convolutional neural networks 
650 0 4 |a deep learning 
650 0 4 |a Deep learning 
650 0 4 |a Detection models 
650 0 4 |a Feature extraction 
650 0 4 |a Feature pyramid 
650 0 4 |a feature pyramid network 
650 0 4 |a Feature pyramid network 
650 0 4 |a Image enhancement 
650 0 4 |a object detection 
650 0 4 |a Object detection 
650 0 4 |a Object recognition 
650 0 4 |a Objects detection 
650 0 4 |a Optical resolving power 
650 0 4 |a Pyramid network 
650 0 4 |a scale sequence (S2) feature 
650 0 4 |a Scale sequence (S2) feature 
650 0 4 |a Small objects 
700 1 0 |a Kang, J.-W.  |e author 
700 1 0 |a Kim, B.-G.  |e author 
700 1 0 |a Park, H.-J.  |e author 
773 |t Sensors