• Acta Photonica Sinica
  • Vol. 51, Issue 6, 0610006 (2022)
Shuai HAO1, Shan GAO1, Xu MA1,*, Beiyi AN1, Tian HE1, Hu WEN2, and Feng WANG3
Author Affiliations
  • 1College of Electrical and Control Engineering, Xi'an University of Science and Technology, Xi'an 710054, China
  • 2College of Safety Science and Engineering, Xi'an University of Science and Technology, Xi'an 710054, China
  • 3College of Physics and Electrical Engineering, Weinan Normal University, Weinan, Shaanxi 714000, China
    DOI: 10.3788/gzxb20225106.0610006
    Shuai HAO, Shan GAO, Xu MA, Beiyi AN, Tian HE, Hu WEN, Feng WANG. Infrared Pedestrian Detection Based on Cross-scale Feature Aggregation and Hierarchical Attention Mapping[J]. Acta Photonica Sinica, 2022, 51(6): 0610006

    Abstract

    Detection systems based on infrared thermal imaging are widely used for pedestrian detection because of their strong anti-interference ability, long detection range, and low sensitivity to changes in lighting and climate. However, because they are formed from thermal radiation, infrared images usually suffer from unclear texture features and low spatial resolution. In addition, infrared pedestrian features are easily submerged by bright backgrounds, which makes it difficult for a detection algorithm to locate the object region accurately. The multi-scale characteristics and mutual occlusion of pedestrian objects pose a further challenge to detection performance. To address the difficulty that traditional pedestrian detection algorithms have with multi-scale objects, partial occlusion, and environmental interference in infrared pedestrian images, an infrared pedestrian detection algorithm based on cross-scale feature aggregation and hierarchical attention mapping is proposed. Firstly, the CSPDarknet53 structure is used as the backbone feature extraction network. To reduce the loss of small-scale object feature information during down-sampling in the backbone network, a Focus module is added at the input to replace the first residual layer. Through slice sampling, spatial information in the original image is moved into the channel dimension, which realizes lossless down-sampling. Secondly, to improve the multi-scale feature aggregation ability and the detection accuracy of the network, a cross-scale feature aggregation module is constructed to integrate the global features and multi-scale local features output by different residual layers of the backbone network.
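The slice sampling described above can be illustrated with a minimal, dependency-free sketch. It assumes the common Focus-style slicing in which each 2x2 spatial block is split into four phase-shifted sub-images that are stacked along the channel axis; the function name `focus_slice`, the nested-list image layout, and the slice ordering are illustrative assumptions, not details taken from the paper. The convolution that normally follows the slicing step is omitted.

```python
from typing import List

# A feature map indexed as [channel][row][col].
Image = List[List[List[float]]]


def focus_slice(image: Image) -> Image:
    """Rearrange (C, H, W) into (4C, H/2, W/2) without discarding any value."""
    height, width = len(image[0]), len(image[0][0])
    if height % 2 or width % 2:
        raise ValueError("height and width must both be even")

    sliced: Image = []
    # Four (row, column) phase offsets; the ordering here is an assumption.
    for row_off, col_off in ((0, 0), (1, 0), (0, 1), (1, 1)):
        for channel in image:
            # Keep every second row and every second column from the offset.
            sliced.append([row[col_off::2] for row in channel[row_off::2]])
    return sliced


# One-channel 4x4 image holding the values 0..15.
image = [[[r * 4 + c for c in range(4)] for r in range(4)]]
out = focus_slice(image)
print(len(out), len(out[0]), len(out[0][0]))  # 4 2 2
```

Every input value appears exactly once in the output, which is the sense in which this form of down-sampling is lossless: the spatial size is halved while the channel count is quadrupled.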
Then, because the infrared imaging mechanism and complex backgrounds prevent infrared images from expressing pedestrian object features effectively, a hierarchical attention mapping module is constructed by embedding a visual attention mechanism into the multi-layer feature transfer branches of the feature pyramid. Within the detection network, attention mechanisms based on the location, appearance, and semantic features of pedestrian objects are established. The module builds semantic and localization associations across the spatial and channel dimensions and adaptively adjusts the weight coefficients of regions of interest at different scales, so that the detector can quickly focus on pedestrian objects during feature extraction and improve detection performance in complex environments. Ablation experiments show that the proposed cross-scale feature aggregation module effectively fuses features of different scales and improves the detection of multi-scale and partially occluded pedestrians. They also show that the hierarchical attention mapping module enhances the saliency of pedestrian objects against complex backgrounds and addresses the missed and false detections caused by weak feature expression of pedestrian objects in complex environments. Finally, to verify the effectiveness of the proposed algorithm, three infrared pedestrian detection datasets were selected from the OTCBVS benchmark database for testing. The selected test sets cover a variety of complex detection environments, including multi-scale pedestrian objects, bright pseudo-objects, and blurred scenes. These scenes are representative of real pedestrian detection conditions and therefore show how the algorithm performs in practice.
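The idea of reweighting features along the channel and spatial dimensions can be sketched as follows. This is a generic, parameter-free illustration and not the paper's hierarchical attention mapping module: the abstract does not specify the layer design, so the pooling choices, the sigmoid gating, and the names `channel_attention` and `spatial_attention` are all assumptions. A practical module would use learned weights and would be applied at several pyramid levels.

```python
import math
from typing import List

# A feature map indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features: FeatureMap) -> FeatureMap:
    """Scale each channel by the sigmoid of its global average (assumed gating)."""
    out: FeatureMap = []
    for channel in features:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values))
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(features: FeatureMap) -> FeatureMap:
    """Scale each position by the sigmoid of its mean over channels (assumed gating)."""
    n = len(features)
    h, w = len(features[0]), len(features[0][0])
    weights = [
        [sigmoid(sum(ch[r][c] for ch in features) / n) for c in range(w)]
        for r in range(h)
    ]
    return [
        [[ch[r][c] * weights[r][c] for c in range(w)] for r in range(h)]
        for ch in features
    ]


# Hypothetical two-channel 2x2 map with one warm "pedestrian" response.
features = [
    [[4.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
refined = spatial_attention(channel_attention(features))
print(refined[0][0][0])
```

In this toy input the single strong response receives the largest weight in both steps, which mirrors the stated goal of letting the detector focus on pedestrian regions rather than on the background.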
To assess the advantages of the proposed algorithm, four mainstream object detection algorithms are compared with it using both subjective evaluation and objective evaluation indexes. Experimental results demonstrate that the proposed algorithm has clear advantages over the comparison algorithms in both subjective and objective evaluation. Extensive experiments also show that the algorithm accurately detects multi-scale infrared pedestrians in complex environments, with an average accuracy of 95.37% and a recall rate of 92.99%.
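The abstract does not define how its objective indexes are computed. As a reminder of the usual detection definitions, the sketch below computes precision and recall from counts of true positives, false positives, and false negatives. The function name `precision_recall` and the counts are hypothetical and are not the paper's data; whether the reported "average accuracy" corresponds to precision or to average precision is not stated in the source.

```python
from typing import Tuple


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Return (precision, recall); a ratio with an empty denominator is reported as 0.0."""
    detected = true_pos + false_pos   # everything the detector reported
    actual = true_pos + false_neg     # every pedestrian actually present
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 90 correct detections, 5 false alarms, 10 missed pedestrians.
precision, recall = precision_recall(true_pos=90, false_pos=5, false_neg=10)
print(f"precision={precision:.2%} recall={recall:.2%}")  # precision=94.74% recall=90.00%
```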