Summary: This study addresses vehicle detection from fixed camera angles, where accuracy is often traded off against cost control and real-time performance, by proposing an enhanced model, YOLOv8-PEL. We refine the YOLOv8n baseline by introducing a novel C2F-PPA module into the feature-fusion stage, improving the adaptability and integration of features across scales, and we further propose ELA-FPN, which strengthens the model's multi-scale feature fusion and generalization capabilities. The model also adopts the Wise-IoUv3 loss function, which suppresses the harmful gradients produced by extreme examples in vehicle detection samples and yields more precise detections. Training used the COCO-Vehicle dataset, a subset of the COCO dataset containing only the images and labels of cars, buses, and trucks, together with the VisDrone2019 dataset. Experimental results show that YOLOv8-PEL achieves a mAP@0.5 of 66.9% on COCO-Vehicle with only 2.23 M parameters, 7.0 GFLOPs, a 4.5 MB model size, and 176.8 FPS, up from the original YOLOv8n's 165.7 FPS. Despite a marginal 0.2% decrease in accuracy compared with YOLOv8n, the parameters, GFLOPs, and model size are reduced by 25%, 13%, and 25%, respectively. YOLOv8-PEL thus combines detection accuracy, computational efficiency, and generalization capability, making it well suited to real-time and resource-constrained application scenarios.
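To illustrate how Wise-IoUv3 down-weights the gradients of extreme samples, the following is a minimal sketch, assuming (x1, y1, x2, y2) box coordinates, the standard Wise-IoU formulation with a non-monotonic focusing coefficient, and hyperparameters `alpha`, `delta`, and a running mean of the IoU loss that are illustrative placeholders, not the paper's reported settings:

```python
import numpy as np

def wise_iou_v3(pred, target, running_mean_liou, alpha=1.9, delta=3.0):
    """Sketch of the Wise-IoUv3 loss for one box pair.

    pred, target: boxes as (x1, y1, x2, y2).
    running_mean_liou: moving average of the IoU loss over training,
    used to normalize each sample's "outlier degree" beta.
    """
    # Plain IoU and its loss L_IoU = 1 - IoU.
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter)
    l_iou = 1.0 - iou

    # Wise-IoU v1: scale L_IoU by a distance-attention term built from
    # the center offset and the smallest enclosing box (W_g, H_g).
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    cxp, cyp = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cxt, cyt = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    r_wiou = np.exp(((cxp - cxt) ** 2 + (cyp - cyt) ** 2) / (wg ** 2 + hg ** 2))
    l_wiou_v1 = r_wiou * l_iou

    # v3: non-monotonic focusing. beta measures how anomalous this sample
    # is; r peaks for ordinary samples and shrinks for extreme ones, which
    # is what mitigates the harmful gradients from low-quality examples.
    beta = l_iou / running_mean_liou
    r = beta / (delta * alpha ** (beta - delta))
    return r * l_wiou_v1
```

A perfectly aligned prediction yields zero loss (beta is zero, so the focusing coefficient vanishes), while a grossly misaligned box receives a damped rather than explosive penalty. In a real training loop the gradient through `r_wiou`'s denominator and through `beta` would be detached, which this scalar sketch omits.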
|