• Acta Photonica Sinica
  • Vol. 51, Issue 6, 0610004 (2022)
Kang NI1,2, Yuqing ZHAO3, and Zhi CHEN1,*
Author Affiliations
  • 1School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • 2Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing 210023, China
  • 3School of Management and Engineering, Capital University of Economics and Business, Beijing 100070, China
    DOI: 10.3788/gzxb20225106.0610004
    Kang NI, Yuqing ZHAO, Zhi CHEN. Multi-scale Convolutional Neural Network Driven by Sparse Second-order Attention Mechanism for Remote Sensing Scene Classification[J]. Acta Photonica Sinica, 2022, 51(6): 0610004

    Abstract

    Remote sensing image scene classification is an important research topic in remote sensing image interpretation. With the rapid development of satellite imaging techniques, scene classification using High Spatial Resolution (HSR) remote sensing images has received considerable attention, as it can be used in natural hazard detection, traffic control, object detection, and other applications. Based on the feature representation used, existing scene classification approaches can be categorized into three classes: handcrafted feature-based methods, unsupervised feature learning-based methods, and deep feature learning-based methods. Convolutional Neural Networks (CNNs), one family of deep feature learning-based methods, have achieved great success in the computer vision community. In particular, the powerful feature representations learned by CNNs have been widely used in remote sensing scene classification. However, because ground targets appear at different scales and scene images have complex spatial distributions and textures, the classification performance of CNN-based scene classification algorithms remains insufficient. To address these problems, this paper proposes a multi-scale convolutional neural network driven by a sparse second-order attention mechanism (MCNN-SSAM), which jointly considers scene classification accuracy and feature dimension. The proposed MCNN-SSAM network consists of the following parts: a backbone network, a pyramid convolution module, a sparse second-order attention module, and a softmax classification layer. 
First, the network inserts a multi-scale convolution layer after the backbone network to obtain feature representations of ground targets at different scales, and embeds group convolution into the multi-scale convolution layer to reduce computational complexity. Second, after discussing the advantages of attention mechanisms based on first- and second-order statistics, a sparse second-order attention mechanism is proposed to enhance the discriminability of the channel information of the convolution features at different scales. The sparsity of the attention mechanism effectively reduces the feature dimension of the second-order statistics while preserving scene classification performance. Finally, the multi-scale convolutional layer and the sparse second-order attention mechanism are embedded into the proposed network for end-to-end training. We conduct extensive experiments on two challenging high-resolution remote sensing datasets: AID (Aerial Image Dataset) and NWPU45 (NWPU-RESISC45). The AID dataset contains 10 000 RGB images in 30 scene classes, each image measuring 600×600 pixels; the NWPU45 dataset contains 31 500 optical remote sensing images in 45 scene classes, each image measuring 256×256 pixels. The VGG-16 network is selected as the backbone of MCNN-SSAM, and the Adam optimizer is used for end-to-end training. The training parameters of the proposed network are set as follows: initial learning rate 0.001, weight decay coefficient 0.001, batch size 32, and momentum 0.9. All experiments are implemented in PyTorch on an NVIDIA GeForce GTX 1070 Ti GPU (8 GB) with 32 GB of RAM. 
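The saving that group convolution brings to a multi-scale layer can be illustrated with a short weight count. This is a minimal sketch rather than the authors' configuration: the four kernel sizes, the group counts, and the 128 output channels per branch are hypothetical values chosen for illustration, and the helper name `conv_weight_count` is likewise an assumption. Only the 512 input channels follow from the final convolutional block of VGG-16.

```python
def conv_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a 2-D convolution (bias excluded).

    With groups > 1, each filter sees only in_channels / groups input
    channels, so the weight count shrinks by a factor of `groups`.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return out_channels * (in_channels // groups) * kernel_size * kernel_size


# Hypothetical pyramid of four branches given as (kernel size, groups).
# These values are illustrative assumptions, not taken from the paper.
BRANCHES = [(3, 1), (5, 4), (7, 8), (9, 16)]
IN_CHANNELS = 512   # output channels of the last VGG-16 conv block
OUT_CHANNELS = 128  # assumed output channels per branch

# Total weights with the assumed grouping, and with grouping switched off.
grouped_total = sum(
    conv_weight_count(IN_CHANNELS, OUT_CHANNELS, k, g) for k, g in BRANCHES
)
dense_total = sum(
    conv_weight_count(IN_CHANNELS, OUT_CHANNELS, k, 1) for k, _ in BRANCHES
)

print(f"grouped: {grouped_total:,} weights")
print(f"dense:   {dense_total:,} weights")
print(f"reduction: {dense_total / grouped_total:.1f}x")
```

Under these assumed settings the grouped pyramid needs 1,732,608 weights against 10,747,904 without grouping, roughly a sixfold reduction. The larger kernels benefit most because they are paired with the larger group counts.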
We first use experimental results on the AID dataset to analyze the influence of several important parameters on MCNN-SSAM, and conclude that the number of atoms in the dictionary and the low-rank matrix parameters have a considerable impact on its remote sensing scene classification performance. We then compare MCNN-SSAM with several related networks: AlexNet, VGG-16, SAFF, MSCP, and CapsNet. The experimental results show that, compared with the baseline network (VGG-16), the overall accuracy (OA) of MCNN-SSAM is improved by 5.27%~5.34% and 10.20%~10.82%, while compared with the SAFF, MSCP, and CapsNet networks, the classification accuracy is improved by 0.23%~1.61% and 1.34%~2.75%. Additionally, the confusion matrices show that most remote sensing scene classes are classified correctly, some with high accuracy, such as mountains and viaducts in the AID dataset and jungles and sea ice in the NWPU45 dataset. The effectiveness of the Sparse Second-order Attention Mechanism (SSAM) is also verified by comparison with other related attention mechanisms and by heat map results. Finally, we conduct ablation and generalization experiments to verify the effectiveness of MCNN-SSAM, covering SENet (Squeeze-and-Excitation Networks), CovNet, which is based on covariance statistics, and SSAM. We conclude that, for both CNN features and multi-scale MCNN features, the attention mechanisms based on second-order feature statistics (CNN+CovNet, CNN+SSAM, MCNN+CovNet, and MCNN+SSAM) achieve higher scene classification accuracy than those based on first-order feature statistics (CNN+SENet and MCNN+SENet). In addition, the MCNN module, the SSAM module, and the fusion of the two modules each improve the classification accuracy of remote sensing scene images. 
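For readers unfamiliar with the metric, overall accuracy (OA) and per-class accuracy can both be read off a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes; the function names and the 3-class matrix are made up for illustration and are not results from the paper.

```python
def overall_accuracy(confusion):
    """Overall accuracy: correctly classified samples / all samples.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def per_class_accuracy(confusion):
    """Fraction of each true class that was predicted correctly."""
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(confusion)
    ]


# Made-up 3-class confusion matrix for illustration only;
# these counts are not results from the paper.
toy = [
    [48, 1, 1],
    [2, 45, 3],
    [0, 5, 45],
]

print(f"OA = {overall_accuracy(toy):.4f}")
print([round(a, 2) for a in per_class_accuracy(toy)])
```

On this toy matrix 138 of 150 samples lie on the diagonal, so OA is 0.92, while the per-class values (0.96, 0.90, 0.90) show which classes are confused with one another.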
In this paper, we propose a multi-scale convolutional neural network driven by a sparse second-order attention mechanism for remote sensing scene classification. The experimental results illustrate that the proposed MCNN-SSAM improves the accuracy of remote sensing image scene classification while taking into account the feature dimension of the second-order feature statistics.