Lightweight Real-time Detection Model of Infrared Pedestrian Embedded in Fine-scale

Yinhui ZHANG; Pengcheng ZHANG; Zifen HE; Sen WANG

doi:10.3788/gzxb20225109.0910001

Abstract

The growing number of cars makes accidents more frequent. Because drivers have poor visibility at night, the accident rate is higher than during the day. Various driver-assistance technologies are therefore widely used to reduce traffic accidents at night, among which infrared cameras have unique advantages. On the one hand, the visible-light imaging of ordinary cameras is easily affected by interfering light sources, and the low-quality images obtained at night with insufficient light make pedestrian detection extremely difficult. Infrared cameras, which image the thermal radiation and reflection of objects, provide unobstructed night vision that is unaffected by interfering light sources. On the other hand, the falling cost of infrared imaging equipment is making its application increasingly common. Targeting the night driving environment with its high accident rate, a night-time infrared pedestrian detection model is proposed that can detect pedestrians on the road in real time. This research has important value and broad market prospects in vehicle driver assistance, providing greater safety for vehicles and pedestrians.

Infrared images lack information such as color and texture and yield lower detection accuracy than visible-light images, while detection models have many parameters and depend on high-performance GPU resources, which slows detection. To address these problems, TIPRD, a lightweight real-time detection model with an embedded multi-scale detection layer that fuses fine-scale pedestrian targets, is proposed.
First, to obtain more accurate infrared pedestrian location features, a 64×64 fine-scale detection layer is embedded in the original Yolov4-tiny structure to form a multi-detection-layer structure, and a CSP module is added to deepen the backbone network and fuse the location features of infrared pedestrians. Second, because infrared pedestrian targets have a relatively fixed aspect ratio, K-means++ clustering is used to derive preset prior-box (anchor) parameters suited to infrared pedestrians, improving the match between the prior boxes and the targets. Finally, to reduce the number of parameters, the model is compressed by BN-layer channel pruning; the model before pruning serves as the teacher and the pruned model as the student, and knowledge distillation is used instead of fine-tuning to complete the fine adjustment of TIPRD. Detection accuracy is preserved while the parameter count is greatly reduced and the model is made lighter still.

Experiments based on the Yolov4-tiny network model show that the three strategies (embedding the 64×64 fine-scale detection layer, adding a CSP module, and clustering the prior boxes) improve the detection accuracy of infrared pedestrian targets by 8.6%. However, as the number of parameters grows, the model size increases by 1.4M and the detection speed drops by 11.4 frames/s. The improved Yolo-pedestrian model therefore needs channel pruning to become lightweight. BN-layer channel pruning reduces detection accuracy to varying degrees, so this paper uses knowledge distillation instead of fine-tuning to recover the accuracy of the pruned model. At a pruning rate of 0.8, the model size is compressed by 20.9M and the detection speed increases by 8.4 frames/s.
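
As a rough illustration of BN-layer channel pruning at a given pruning rate, the sketch below ranks channels by the absolute value of their BN scale factors and marks the smallest fraction for removal. This follows the common network-slimming recipe and is an assumption, since the abstract does not state the selection criterion; the function name `select_channels`, the toy scale factors, and the rule that keeps at least one channel per layer are all hypothetical.

```python
def select_channels(bn_scales, prune_rate):
    """Return per-layer keep masks from BN scale factors (gamma).

    bn_scales: dict mapping layer name -> list of gamma values (hypothetical input).
    prune_rate: fraction of all channels to remove, e.g. 0.8.
    """
    # Pool |gamma| over all layers to obtain one global threshold.
    all_abs = sorted(abs(g) for scales in bn_scales.values() for g in scales)
    n_prune = int(len(all_abs) * prune_rate)
    threshold = all_abs[n_prune] if n_prune < len(all_abs) else float("inf")

    masks = {}
    for name, scales in bn_scales.items():
        mask = [abs(g) >= threshold for g in scales]
        if not any(mask):
            # Assumption: never remove a whole layer; keep its strongest channel.
            best = max(range(len(scales)), key=lambda i: abs(scales[i]))
            mask[best] = True
        masks[name] = mask
    return masks


# Toy example: 10 channels in two layers, pruned at a rate of 0.8.
toy = {
    "conv1": [0.90, 0.02, 0.01, 0.70, 0.03],
    "conv2": [0.05, 0.04, 0.06, 0.07, 0.08],
}
masks = select_channels(toy, 0.8)
kept = sum(sum(m) for m in masks.values())
```

With these toy values the global threshold falls at 0.70, so `conv1` keeps two channels and `conv2` keeps only its strongest one. In a real network the masks would then be used to rebuild narrower convolution layers, which is not shown here.
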
Through knowledge distillation, the model maintains its original accuracy after pruning, yielding a lightweight model. While approximating the accuracy of the Yolov4 network model, the TIPRD model is only 1.5% of the size of the Yolov4 model, far smaller than other detection models of the same type. Compared with the Yolo-pedestrian model before pruning, the detection speed is improved by 9.4 frames/s. The TIPRD model also reaches an extremely fast detection speed of 88.7 frames/s, which meets the requirements of real-time detection.

For driver-assistance systems with limited computing resources, a lightweight, high-accuracy model, TIPRD, is proposed, which offers a reference for applying infrared pedestrian detection in night-time driver-assistance systems deployed on mobile terminals. First, the structure is improved on the basis of the Yolov4-tiny network: the CSP structure is repeated on the original network to strengthen feature extraction, and a detection layer of size 64×64 is added. A feature fusion path is added between the new detection layer and the backbone network to fuse the location features of infrared pedestrians and enrich the semantic information of the feature maps. In addition, given the relatively fixed aspect ratio of pedestrian targets, the K-means++ clustering algorithm is used to derive preset prior-box parameters suited to infrared pedestrian detection, which improves the match between the prior boxes and the pedestrian targets. Model accuracy is improved by 8.6 percentage points, verifying the effectiveness of the improvements to the Yolov4-tiny algorithm. Second, based on the improved pedestrian detection model, BN-layer channel pruning is used for compression, and knowledge distillation is applied to complete the fine adjustment of the model.
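
The teacher-student setup described in the abstract can be illustrated with the classic soft-target distillation loss. This is a minimal sketch assuming a classification-style output; the abstract does not give the distillation loss actually used for the detector, and the temperature `T`, the function names, and the toy logits are hypothetical.

```python
import math


def softmax(logits, T=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened outputs, scaled by T**2."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    return T * T * kl


# Toy logits for one prediction (pedestrian vs. background); values are made up.
teacher = [4.0, 0.5]
close_student = [3.5, 0.7]
far_student = [0.2, 3.0]

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
```

A student whose outputs resemble the teacher's gets a smaller loss than one that disagrees with it, which is the signal that lets the pruned model recover accuracy from the unpruned one. In practice this term is usually combined with the ordinary detection loss.
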
While accuracy is maintained, the model is deeply compressed and its size reduced. The test speed reaches 88.7 frames/s, 8.4 frames/s higher than before pruning, which meets the requirements of real-time detection. Finally, the TIPRD night-time infrared pedestrian detection model is deployed on the Jetson Nano (2GB) mobile development platform, where the detection speed increases by 1.7 frames/s; this further verifies the feasibility of running the model on a mobile terminal and shows good engineering application value.
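
The K-means++ prior-box clustering summarised in the abstract can be sketched as follows. The sketch assumes the distance measure 1 - IoU between width-height pairs that is common in YOLO-style anchor clustering; the abstract does not state the distance used, and the function names, the toy box sizes, and the choice of `k` are hypothetical.

```python
import random


def iou_wh(box, center):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(box[0], center[0]) * min(box[1], center[1])
    union = box[0] * box[1] + center[0] * center[1] - inter
    return inter / union


def kmeanspp_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs into k anchors using K-means++ seeding and 1 - IoU."""
    rng = random.Random(seed)
    # K-means++ seeding: each later center is drawn with probability
    # proportional to the squared distance to the nearest chosen center.
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        d2 = [min(1 - iou_wh(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0:
            centers.append(rng.choice(boxes))  # all boxes coincide with centers
        else:
            centers.append(rng.choices(boxes, weights=d2, k=1)[0])

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            nearest = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            clusters[nearest].append(b)
        new_centers = []
        for i, c in enumerate(clusters):
            if c:
                mean_w = sum(w for w, _ in c) / len(c)
                mean_h = sum(h for _, h in c) / len(c)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda wh: wh[0] * wh[1])


# Made-up pedestrian boxes (w, h) in pixels: tall and narrow at three scales.
boxes = [(10, 30), (12, 34), (11, 31), (24, 70), (26, 74),
         (25, 72), (50, 150), (52, 156), (48, 146)]
anchors = kmeanspp_anchors(boxes, k=3)
```

Because every input box is taller than it is wide, the resulting anchors are as well, which reflects the fixed aspect ratio that motivates clustering priors specifically for pedestrians.
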
