ℱ<sup>3</sup>-Net: Feature Fusion and Filtration Network for Object Detection in Optical Remote Sensing Images

Bibliographic Details
Main Authors: Xinhai Ye, Fengchao Xiong, Jianfeng Lu, Jun Zhou, Yuntao Qian
Format: Article
Language: English
Published: MDPI AG 2020-12-01
Series: Remote Sensing
Online Access: https://www.mdpi.com/2072-4292/12/24/4027
Description
Summary: Object detection in remote sensing (RS) images is challenging because objects are often small and varied in appearance and appear against complex backgrounds. Although many methods have been developed to address this problem, many of them can neither fully exploit multilevel context information nor handle the cluttered backgrounds of RS images. To this end, we propose a feature fusion and filtration network (ℱ<sup>3</sup>-Net) for object detection in RS images, which has a higher capacity for combining context information at multiple scales while suppressing interference from the background. Specifically, ℱ<sup>3</sup>-Net uses a feature adaptation block with a residual structure to adjust the backbone network in an end-to-end manner so that it better accounts for the characteristics of RS images. The network then learns the context information of an object at multiple scales by hierarchically fusing the feature maps from different layers. To suppress interference from cluttered backgrounds, the fused feature is then projected into a low-dimensional subspace by an additional feature filtration module. As a result, more relevant and accurate context information is extracted for subsequent detection. Extensive experiments on the DOTA, NWPU VHR-10, and UCAS AOD datasets demonstrate that the proposed detector achieves very promising detection performance.
ISSN: 2072-4292
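
Illustrative sketch: the summary describes two steps, hierarchical fusion of feature maps from different layers and projection of the fused feature into a low-dimensional subspace. The pure-Python toy below shows one way such steps could look. It is not the authors' implementation. The summary does not state the fusion operator or how the subspace is obtained, so nearest-neighbour upsampling, element-wise addition, and the fixed `basis` matrix are all assumptions, and every function name here is hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out


def fuse(shallow, deep):
    """Add a coarse (deep) map to a fine (shallow) map of twice its resolution.

    Element-wise addition is an assumed fusion operator.
    """
    up = upsample2x(deep)
    return [[s + d for s, d in zip(srow, drow)] for srow, drow in zip(shallow, up)]


def fuse_pyramid(maps):
    """Fuse maps ordered fine-to-coarse, starting from the coarsest level."""
    fused = maps[-1]
    for finer in reversed(maps[:-1]):
        fused = fuse(finer, fused)
    return fused


def project(vector, basis):
    """Coordinates of `vector` in the subspace spanned by the rows of `basis`.

    `basis` is a hand-picked stand-in for whatever subspace a real
    filtration module would use.
    """
    return [sum(v * b for v, b in zip(vector, row)) for row in basis]


# Three single-channel maps at resolutions 4x4, 2x2 and 1x1.
maps = [
    [[1 for _ in range(4)] for _ in range(4)],
    [[1, 2], [3, 4]],
    [[10]],
]
fused = fuse_pyramid(maps)

# Keep only the first two of four channels of one pixel's feature vector.
basis = [[1, 0, 0, 0], [0, 1, 0, 0]]
filtered = project([1, 2, 3, 4], basis)
```

In this toy, `fused` has the resolution of the finest map and carries contributions from every level, while `filtered` has two entries instead of four. A trained detector would typically use learned layers for both steps; the fixed operations here only make the data flow visible.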