ACF Based Region Proposal Extraction for YOLOv3 Network Towards High-Performance Cyclist Detection in High Resolution Images

Bibliographic Details
Main Authors: Chunsheng Liu, Yu Guo, Shuang Li, Faliang Chang
Format: Article
Language: English
Published: MDPI AG 2019-06-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/19/12/2671
Description
Summary: The You Only Look Once (YOLO) deep network can detect objects quickly with high precision and has been successfully applied to many detection problems. Its main shortcoming is that it usually cannot achieve high precision when detecting small objects in high-resolution images. To overcome this problem, we propose an effective region proposal extraction method for the YOLO network, forming a complete detection structure named ACF-PR-YOLO, and take the cyclist detection problem to demonstrate our method. Instead of directly using the generated region proposals for classification or regression, as most region proposal methods do, we generate large-size potential regions containing objects for the subsequent deep network. The proposed ACF-PR-YOLO structure includes three main parts. Firstly, a region proposal extraction method based on the aggregated channel feature (ACF), called the ACF based region proposal (ACF-PR) method, is proposed. In ACF-PR, ACF is first used to quickly extract candidates, and then a bounding box merging and extending method is designed to merge the bounding boxes into correct region proposals for the following YOLO net. Secondly, we design a suitable YOLO net for fine detection within the region proposals generated by ACF-PR. Lastly, we design a post-processing step in which the results of the YOLO net are mapped back into the original image, outputting the final detection and localization results. Experiments on the Tsinghua-Daimler Cyclist Benchmark, which contains high-resolution images and complex scenes, show that the proposed method outperforms the other tested representative detection methods in average precision; it outperforms YOLOv3 by 13.69% average precision and SSD by 25.27% average precision.
ISSN: 1424-8220
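
The abstract outlines a three-stage pipeline: ACF-based candidate extraction with bounding box merging and extending, YOLO detection inside each region proposal, and post-processing that maps results back to the original image. The Python sketch below illustrates that flow under stated assumptions; it is not the authors' implementation. The function names, the pad_ratio margin, and the callables acf_detect and yolo_detect (returning (x, y, w, h) and (x, y, w, h, score) boxes, respectively) are hypothetical placeholders.

```python
# Minimal sketch of an ACF-PR-YOLO style pipeline, assuming the ACF detector
# and YOLO net are supplied as callables. Names and parameters are
# illustrative, not the authors' API.

def merge_and_extend_boxes(boxes, img_w, img_h, pad_ratio=0.5):
    """Merge overlapping candidate boxes and extend them into larger
    region proposals, clipped to the image borders."""
    proposals = []
    for (x, y, w, h) in boxes:
        merged = False
        for i, (px, py, pw, ph) in enumerate(proposals):
            # Merge the candidate into an existing proposal when they overlap.
            if x < px + pw and px < x + w and y < py + ph and py < y + h:
                nx, ny = min(x, px), min(y, py)
                nw = max(x + w, px + pw) - nx
                nh = max(y + h, py + ph) - ny
                proposals[i] = (nx, ny, nw, nh)
                merged = True
                break
        if not merged:
            proposals.append((x, y, w, h))
    # Extend each merged box by a margin so whole objects fit inside.
    extended = []
    for (x, y, w, h) in proposals:
        dx, dy = int(w * pad_ratio), int(h * pad_ratio)
        nx, ny = max(0, x - dx), max(0, y - dy)
        nw = min(img_w, x + w + dx) - nx
        nh = min(img_h, y + h + dy) - ny
        extended.append((nx, ny, nw, nh))
    return extended


def acf_pr_yolo_detect(image, acf_detect, yolo_detect, pad_ratio=0.5):
    """image: H x W x 3 array; acf_detect(image) -> [(x, y, w, h)];
    yolo_detect(crop) -> [(x, y, w, h, score)] in crop coordinates."""
    img_h, img_w = image.shape[:2]
    # Stage 1: fast ACF candidate extraction plus merging/extending (ACF-PR).
    proposals = merge_and_extend_boxes(acf_detect(image), img_w, img_h, pad_ratio)
    detections = []
    # Stage 2: fine detection with the YOLO net inside each region proposal.
    for (px, py, pw, ph) in proposals:
        crop = image[py:py + ph, px:px + pw]
        for (bx, by, bw, bh, score) in yolo_detect(crop):
            # Stage 3: map crop-local boxes back to original image coordinates.
            detections.append((px + bx, py + by, bw, bh, score))
    return detections
```

Running the YOLO net only inside the extended proposals keeps small cyclists at a usable scale relative to the network input, which is the motivation the abstract gives for preferring this over applying YOLO directly to the full high-resolution frame.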