• Acta Photonica Sinica
  • Vol. 51, Issue 12, 1210003 (2022)
Hongjian FU1, Hongyang BAI1,*, Hongwei GUO1, Yuman YUAN1, and Weiwei QIN2
Author Affiliations
  • 1School of Energy and Power Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • 2School of Nuclear Engineering, Rocket Force University of Engineering, Xi'an 710025, China
    DOI: 10.3788/gzxb20225112.1210003
    Hongjian FU, Hongyang BAI, Hongwei GUO, Yuman YUAN, Weiwei QIN. Object Detection Method of Optical Remote Sensing Image with Multi-attention Mechanism[J]. Acta Photonica Sinica, 2022, 51(12): 1210003

    Abstract

    Optical remote sensing image target detection uses algorithms to automatically classify and locate objects of interest. It has a wide range of applications in military reconnaissance, precision guidance and urban construction. Historically, detection methods fall into two groups: traditional target detection algorithms and deep learning-based target detection algorithms. Compared with traditional algorithms, deep learning-based algorithms extract target features automatically, and the resulting feature representations are more robust and generalise better, so they achieve better detection results on remote sensing images. However, several problems remain in remote sensing image target detection, such as large differences in target scale, dense target distribution and complex backgrounds. To address these problems, this paper builds on the YOLOv5 network and proposes the MA-YOLOv5 (Multi Attention-YOLOv5) network, which improves remote sensing target detection; experiments verify the effectiveness of the improvements. Because on-orbit real-time processing of remote sensing images requires a certain detection speed, this paper selects YOLOv5l, whose network depth and width coefficients are both one, as the base network. YOLOv5 consists of three parts: Backbone, Neck and Prediction. The Backbone uses the CSP (Cross Stage Partial) Darknet structure for feature extraction; the Neck uses the FPN (Feature Pyramid Network) + PAN (Path Aggregation Network) feature pyramid structure for feature fusion; the Prediction part uses the CIoU (Complete Intersection over Union) loss as the loss function.
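
The abstract names the CIoU loss without stating its formula. As a minimal sketch, assuming the commonly used definition (an IoU term, a normalised centre-distance penalty and an aspect-ratio penalty), the loss for a single pair of axis-aligned boxes could be computed as below. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout and the `eps` stabiliser are illustrative assumptions, not taken from the paper.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2); assumed layout."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU: overlap area divided by union area.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter + eps
    iou = inter / union

    # Squared centre distance, normalised by the squared diagonal
    # of the smallest box enclosing both inputs.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
        + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Under this definition the loss is close to zero for a perfect match and exceeds one for disjoint boxes, because the centre-distance penalty still applies when the IoU is zero.
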
To improve detection on remote sensing images with multiple target scales and complex backgrounds, this paper proposes a coordinate attention module with an adaptive receptive field size. Through the separation and selection mechanism in the module, the network adaptively selects among the outputs of convolutions with different receptive field sizes according to the size of the target, which improves the model's feature extraction ability for multi-scale remote sensing targets. At the same time, through the coordinate attention mechanism in the module, long-range dependencies along one spatial direction are captured while position information along the other spatial direction is preserved, which helps the network locate targets more accurately. In addition, to handle the dense distribution of remote sensing targets, a Swin Transformer self-attention module is added to the prediction head of the YOLOv5 network to enhance the network's ability to capture contextual information around the target. To assess how the number of branches in the ARFCA (Adaptive Receptive Field Coordinate Attention) module affects the model, and to determine the optimal number of branches, a set of ablation experiments is conducted. The results show that the module performs best with three branches. Finally, a set of experiments compares the following seven networks: MA-YOLOv5, YOLOv5 with the ARFCA module added, YOLOv5 with the STR (Swin Transformer) module added, the original YOLOv5 network, SSD, RetinaNet and FCOS. Seven categories of indicators are used for evaluation. The experimental results show that, compared with the original YOLOv5 network, the MA-YOLOv5 network improves accuracy by 3.6% while retaining a degree of real-time detection capability.
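
The two ideas behind the ARFCA module can be illustrated with a small dependency-free sketch. It is not the authors' implementation: the softmax weighting over branches is an assumed reading of the "separation and selection mechanism", the learned convolution layers that would normally produce the branch logits and transform the pooled descriptors are omitted, and all function names are hypothetical. A single-channel feature map is represented as a list of rows.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_branches(branch_maps, branch_logits):
    """Fuse same-sized H x W maps, one per receptive-field branch,
    using softmax weights over the branches (assumed selection step)."""
    weights = softmax(branch_logits)
    h, w = len(branch_maps[0]), len(branch_maps[0][0])
    return [
        [sum(wt * m[i][j] for wt, m in zip(weights, branch_maps))
         for j in range(w)]
        for i in range(h)
    ]


def directional_pool(fmap):
    """Average along the width (one value per row) and along the
    height (one value per column), as in coordinate attention."""
    h, w = len(fmap), len(fmap[0])
    row_desc = [sum(row) / w for row in fmap]
    col_desc = [sum(fmap[i][j] for i in range(h)) / h for j in range(w)]
    return row_desc, col_desc


def coordinate_gate(fmap):
    """Reweight each position by sigmoid gates built from its row and
    column descriptors; the learned transforms are left out here."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    row_desc, col_desc = directional_pool(fmap)
    return [
        [fmap[i][j] * sigmoid(row_desc[i]) * sigmoid(col_desc[j])
         for j in range(len(fmap[0]))]
        for i in range(len(fmap))
    ]
```

With three branches, the setting the ablation found best, `select_branches` would receive three maps and three logits. Equal logits reduce the fusion to a plain average, while a dominant logit lets one receptive field size take over.
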