
- Journal of Applied Optics
- Vol. 44, Issue 5, 1010 (2023)
Abstract
Keywords
Introduction
Deep-learning-based object detection techniques have been widely applied in the remote sensing field [
Figure 1. Comparison between remote sensing images (first row) and natural images (second row)
In recent years, a variety of rotated object detection methods have been developed on the basis of deep-learning-based generic object detection. Generic object detection mainly regresses the target region's
RoI-Transformer-based rotated object detection is typically used in two-stage detectors, which comprise two steps: generating object proposals, then detecting and classifying objects. Recently, several one-stage rotated object detection methods have also been proposed, such as R3Det (refined rotation RetinaNet) [
To address the insufficient accuracy of RoI Transformer on multi-scale rotated objects in remote sensing images, this paper proposes the HRD-ROI Transformer (HRNet + KLD RoI Transformer) method. First, the original RoI Transformer detection framework is adopted to obtain RRoIs for robust geometric feature extraction; second, HRNet [
1 HRD-ROI Transformer
HRD-ROI Transformer uses RoI Transformer as its basic framework. It adopts HRNet as the backbone, connecting high-resolution and low-resolution convolution streams in parallel, which improves the model's adaptability to multi-scale object detection while retaining high-resolution feature extraction. The KLD loss replaces the Smooth L1 loss to resolve the angle boundary discontinuity and the square-like problem caused by the periodicity of the rotated-object representation.
1.1 Overall architecture of the detection network
The overall architecture of HRD-ROI Transformer is shown in
Figure 2. Structure diagram of HRD-ROI Transformer
Feature extraction module: HRNet with a feature pyramid is used to extract multi-level high-resolution features (see Section 1.2).
RPN module: The RPN module takes a feature map of arbitrary size as input and generates a series of coarse HRoIs.
RoI Transformer module: This module generates RRoIs from the feature maps of the HRoIs. First, RoI pooling or RoI Align extracts fixed-size (7×7 by default) RoI features from HRoIs of different sizes; each HRoI feature is then fed into fully connected layers and decoded into a corresponding coarse RRoI.
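The fixed-size extraction step can be illustrated with a minimal NumPy sketch of max RoI pooling. This is a naive single-channel, axis-aligned toy for illustration only; the detector itself uses RoI Align and rotated variants on batched tensors:

```python
import numpy as np

def roi_pool(feat, x0, y0, x1, y1, out=7):
    """Naive max RoI pooling of feat[y0:y1, x0:x1] onto an out x out grid."""
    ys = np.linspace(y0, y1, out + 1).astype(int)
    xs = np.linspace(x0, x1, out + 1).astype(int)
    pooled = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            # guard against empty bins when the RoI is smaller than the grid
            cell = feat[ys[i]:max(ys[i + 1], ys[i] + 1),
                        xs[j]:max(xs[j + 1], xs[j] + 1)]
            pooled[i, j] = cell.max()
    return pooled

feat = np.arange(100.0).reshape(10, 10)
print(roi_pool(feat, 0, 0, 10, 10).shape)  # (7, 7) regardless of RoI size
```

Whatever the spatial size of the input region, the output is always a 7×7 grid, which is what allows HRoIs of arbitrary size to feed the same fully connected layers.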
RCNN module with KLD loss: Similar to the RoI Transformer module, rotated RoI pooling, rotated RoI warping, or rotated RoI Align extracts fixed-size RoI features from RRoIs of different sizes. These features are fed into fully connected layers for classification and finer bounding-box regression, where the KLD loss adjusts the bounding-box regression before the final result is output.
1.2 High-resolution network
To improve the detection network's adaptability to objects of different scales, this paper adopts the high-resolution network HRNet in place of ResNet as the backbone. The basic structure of HRNet is shown in
Figure 3. Structure diagram of HRNet [
The main characteristic of this model is that the feature maps remain high-resolution throughout. Low-resolution sub-networks are gradually added in parallel to the high-resolution main network, with continuous information exchange among the branches, so that strong semantic information and precise location information are preserved simultaneously. In the basic structure of the RoI Transformer network, FPN (feature pyramid networks) is a key step in feature extraction: it fuses deep low-resolution features with strong semantics and shallow high-resolution features with weak semantics in a top-down manner, enhancing the features at every level [
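The top-down fusion idea behind FPN can be sketched as follows. This is an illustrative NumPy toy using plain nearest-neighbour upsampling and element-wise addition; the real FPN also applies 1×1 lateral convolutions and 3×3 output convolutions:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(features):
    """Fuse (C, H, W) maps ordered from high to low resolution.

    Each deeper (lower-resolution, semantically stronger) map is
    upsampled and added to the shallower map above it, top-down.
    """
    fused = [features[-1]]                  # start from the deepest level
    for feat in reversed(features[:-1]):
        fused.append(feat + upsample2x(fused[-1]))
    return list(reversed(fused))            # back to high-to-low order

# toy pyramid: 3 levels, same channel count, halving resolution per level
c = 4
p2, p3, p4 = np.ones((c, 8, 8)), np.ones((c, 4, 4)), np.ones((c, 2, 2))
out = fpn_top_down([p2, p3, p4])
print([o.shape for o in out])  # spatial size of each level is preserved
```

After fusion, each output level keeps its own resolution but has accumulated semantic content from all deeper levels, which is the property that HRNet's parallel multi-resolution exchange strengthens further.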
1.3 Joint parameter optimization based on KLD
Although the RoI Transformer method offers good efficiency and accuracy in rotated object detection, the angle periodicity introduced by its rotated-object representation leads to angle boundary discontinuity (
Figure 4. Schematic diagram of angle boundary discontinuity
Figure 5. Schematic diagram of the square-like problem
1.3.1 Angle periodicity of the rotated-object representation
For square-like objects (such as
1.3.2 KLD loss
To resolve the angle periodicity of the original RoI Transformer's object representation, this paper introduces the KLD loss into the RoI Transformer framework. First, the rotated box representing the object
where:
Property 1:
Property 2:
Property 3:
According to Property 1, the edge-exchange problem caused by the OpenCV representation of rotated objects is avoided. According to Properties 2 and 3, the square-like problem caused by the long-edge definition of rotated objects is also resolved. In summary, the angle periodicity is eliminated by the trigonometric form of the Gaussian representation, which exhibits boundary continuity.
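The conversion from a rotated box to a 2-D Gaussian that underlies these properties can be sketched as follows, using the construction from the KLD paper, Σ = R(θ) diag(w²/4, h²/4) R(θ)ᵀ, with angles in radians:

```python
import numpy as np

def rbox_to_gaussian(x, y, w, h, theta):
    """Convert a rotated box (centre, size, angle) to a 2-D Gaussian
    N(mu, Sigma) with Sigma = R(theta) diag(w^2/4, h^2/4) R(theta)^T."""
    mu = np.array([x, y], dtype=float)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    Lam = np.diag([w * w / 4.0, h * h / 4.0])
    return mu, R @ Lam @ R.T

# square-like ambiguity: rotating a square by 90 deg gives the SAME Gaussian
m1, S1 = rbox_to_gaussian(0, 0, 4, 4, 0.0)
m2, S2 = rbox_to_gaussian(0, 0, 4, 4, np.pi / 2)
print(np.allclose(S1, S2))  # True

# edge exchange: swapping (w, h) with a 90-deg angle shift is also harmless
m3, S3 = rbox_to_gaussian(0, 0, 2, 6, 0.0)
m4, S4 = rbox_to_gaussian(0, 0, 6, 2, np.pi / 2)
print(np.allclose(S3, S4))  # True
```

Because equivalent box parameterizations collapse to identical Gaussian parameters, a loss defined on (μ, Σ) cannot be penalized by the representation ambiguity, which is exactly what Properties 1 to 3 state.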
The Gaussian distributions corresponding to the predicted box and the ground truth
Clearly,
Finally, to ensure consistency between the evaluation metric and the regression loss, a nonlinear transformation is applied to
where:
The above analysis shows that the KLD-based loss guarantees that the rotated-box parameters
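The scalar computation can be sketched as follows. This NumPy illustration assumes the common choices f(D) = ln(D + 1) and τ = 1 for the nonlinear transform; the actual training code operates on batched tensors with gradients:

```python
import numpy as np

def kld_2d(mu_p, S_p, mu_t, S_t):
    """KL divergence D(N_p || N_t) between two 2-D Gaussians."""
    d = mu_t - mu_p
    S_t_inv = np.linalg.inv(S_t)
    quad = d @ S_t_inv @ d + np.trace(S_t_inv @ S_p)
    logdet = np.log(np.linalg.det(S_t) / np.linalg.det(S_p))
    return 0.5 * (quad + logdet) - 1.0  # subtract dim/2 = 1 for 2-D

def kld_loss(D, tau=1.0):
    """Map the unbounded divergence to a bounded regression loss,
    L = 1 - 1/(tau + log(D + 1))."""
    return 1.0 - 1.0 / (tau + np.log1p(D))

mu = np.zeros(2)
S = np.diag([1.0, 9.0])  # e.g. a 2x6 box at angle 0
print(kld_2d(mu, S, mu, S))    # ~0 for identical distributions
print(kld_loss(kld_2d(mu, S, mu, S)))  # loss vanishes at a perfect match
```

Note that all box parameters (centre, size, angle) enter the divergence jointly through μ and Σ, which is the "joint optimization" property the section title refers to.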
2 Experiments and discussion
2.1 Datasets
This paper uses the DOTA v1.0 dataset with rotated-object labels [
2.2 Evaluation metrics
The object detection results are evaluated mainly by precision (P), recall (R), mean average precision (mAP), and detection speed. Precision and recall are defined as follows:
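The formulas themselves were lost in extraction; assuming the standard definitions that the surrounding text evidently refers to:

```latex
P = \frac{T_{\mathrm{P}}}{T_{\mathrm{P}} + F_{\mathrm{P}}}, \qquad
R = \frac{T_{\mathrm{P}}}{T_{\mathrm{P}} + F_{\mathrm{N}}}
```

with $T_{\mathrm{P}}$ the number of true positives, $F_{\mathrm{P}}$ false positives, and $F_{\mathrm{N}}$ false negatives (a reconstruction of the textbook definitions; the paper's exact notation is not recoverable).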
where:
2.3 Implementation details
The experiments were run on an i9-10920X processor with four NVIDIA GeForce RTX 2080 Ti GPUs and 256 GB of memory, using the mmrotate platform [
For the DOTA v1.0 dataset, all original training and validation images are cropped into 1024×1024-pixel patches with a stride of 824 (an overlap of 200 pixels is kept so that objects are not split at patch boundaries). For the DIOR-R dataset, images keep their original size of 800×800 pixels.
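The stride/overlap arithmetic (stride = window − overlap, 1024 − 200 = 824) can be sketched with a hypothetical helper; this is an illustration of the cropping grid, not mmrotate's actual image-splitting tool:

```python
def crop_origins(size, window=1024, stride=824):
    """Top-left offsets of sliding crops along one image axis.

    stride = window - overlap (1024 - 200 = 824); the last window is
    shifted back so it ends exactly at the image border.
    """
    xs = list(range(0, max(size - window, 0) + 1, stride))
    if xs[-1] + window < size:          # cover the right/bottom remainder
        xs.append(size - window)
    return xs

# a 3000-pixel-wide image yields overlapping 1024-px windows
print(crop_origins(3000))  # [0, 824, 1648, 1976]
```

The same offsets are used along both axes, so every pixel of the original image appears in at least one patch, and adjacent patches share a 200-pixel band.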
The training patches are preprocessed with a set of data augmentations, including image normalization, random flipping, and random cropping, before being fed into the model for training. In the DOTA v1.0 experiments, the model is trained on the training set and evaluated on the validation set. For DIOR-R, the model is trained on the trainval set and evaluated on the test set.
2.4 Analysis of experimental results
Method | Backbone | Loss | PL | BD | BR | GTF | SV | LV | SH | TC | BC | ST | SBF | RA | HA | SP | HC | mAP/%
Rotated RetinaNet | ResNet50 | Smooth L1 | 89.7 | 75.0 | 40.8 | 64.1 | 66.5 | 67.7 | 85.8 | 90.7 | 62.6 | 65.7 | 54.4 | 62.0 | 62.6 | 52.2 | 54.5 | 66.3 |
R3Det | ResNet50 | Smooth L1 | 89.5 | 73.2 | 44.4 | 65.3 | 66.9 | 77.2 | 87.2 | 57.9 | 66.2 | 51.3 | 63.2 | 72.1 | 53.0 | 67.5 | ||
S2ANet | ResNet50 | Smooth L1 | 89.0 | 73.8 | 43.6 | 67.1 | 64.9 | 74.2 | 79.1 | 90.5 | 62.7 | 66.3 | 56.8 | 64.8 | 61.2 | 54.2 | 42.0 | 66.0 |
SASM reppoints | ResNet50 | GIoU | 89.5 | 76.0 | 45.3 | 70.7 | 59.9 | 74.6 | 78.0 | 90.3 | 64.1 | 67.3 | 46.2 | 67.1 | 70.3 | 56.3 | 44.3 | 66.7 |
Oriented reppoints | ResNet50 | GIoU | 89.7 | 75.7 | 49.8 | 70.7 | 80.5 | 88.4 | 90.5 | 65.1 | 68.6 | 47.1 | 64.6 | 70.4 | 57.8 | 69.8 | ||
Rotated Faster RCNN | ResNet50 | Smooth L1 | 88.5 | 74.7 | 44.1 | 70.0 | 63.7 | 71.4 | 79.4 | 90.5 | 58.7 | 62.0 | 54.7 | 64.5 | 63.2 | 58.2 | 50.1 | 66.3 |
Oriented RCNN | ResNet50 | Smooth L1 | 89.1 | 75.8 | 50.0 | 68.3 | 62.3 | 88.8 | 90.6 | 68.7 | 62.3 | 57.0 | 63.6 | 66.4 | 57.3 | 39.1 | 68.2 | |
RoI Transformer | ResNet50 | Smooth L1 | 89.4 | 77.7 | 46.8 | 71.9 | 68.4 | 77.9 | 80.0 | 90.7 | 71.3 | 62.5 | 59.1 | 63.6 | 67.3 | 60.2 | 45.4 | 68.8 |
ReDet | ReResNet50 | Smooth L1 | 89.6 | 47.4 | 68.8 | 65.8 | 82.4 | 87.4 | 90.6 | 67.5 | 63.4 | 65.9 | 67.3 | 53.0 | 48.7 | 69.7 | ||
Ours | HRNet | KLD | 75.4 | 68.8 | 78.6 | 90.7 | 62.8 | 52.1 |
Table 1. Performance comparison of different methods on the DOTAv1.0 dataset (per-class columns are AP/%)
RoI Transformer[
The adaptability of the HRD-ROI Transformer model is evaluated on the DIOR-R dataset. Given the characteristics of DIOR-R, the input image size of the model used for DOTA v1.0 is adjusted to 800×800 pixels, the number of detection categories is changed to 20, and the model is retrained and tested on DIOR-R. The results are shown in
Method | Backbone | Loss | APL | APO | BF | BC | BR | CH | ESA | ETS | DAM | GF | GTF | HA | OP | SH | STA | STO | TC | TS | VE | WM | mAP/%
Rotated Retinanet | ResNet50 | Smooth L1 | 59.1 | 15.0 | 70.4 | 81.1 | 14.5 | 72.6 | 64.9 | 46.6 | 14.6 | 70.9 | 74.7 | 24.8 | 30.2 | 67.0 | 69.1 | 50.1 | 81.2 | 41.6 | 32.5 | 61.9 | 52.1 |
Rotated Retinanet-G | ResNet50 | GWD | 64.6 | 21.1 | 72.9 | 81.1 | 13.1 | 72.7 | 68.5 | 45.8 | 14.7 | 70.1 | 75.1 | 27.2 | 30.6 | 68.9 | 66.1 | 57.9 | 81.2 | 47.4 | 34.8 | 61.5 | 53.8 |
R3Det | ResNet50 | Smooth L1 | 53.3 | 27.9 | 68.9 | 81.0 | 22.9 | 72.6 | 66.4 | 49.6 | 19.2 | 68.4 | 76.0 | 22.1 | 41.5 | 68.3 | 57.9 | 55.4 | 81.1 | 45.5 | 35.7 | 54.0 | 53.4 |
R3Det-K | ResNet50 | KLD | 57.8 | 34.9 | 69.4 | 81.2 | 28.5 | 72.7 | 71.8 | 53.2 | 16.1 | 71.8 | 77.1 | 36.4 | 47.6 | 74.5 | 62.5 | 60.8 | 81.3 | 50.0 | 39.8 | 56.2 | 57.2 |
S2ANet | ResNet50 | KFIoU | 67.2 | 28.0 | 76.0 | 80.8 | 27.3 | 72.6 | 61.2 | 60.3 | 17.9 | 68.6 | 78.2 | 26.2 | 44.6 | 77.7 | 65.8 | 67.4 | 81.3 | 48.9 | 42.2 | 63.1 | 57.8 |
SASM reppoints | ResNet50 | GIoU | 61.2 | 74.5 | 82.7 | 32.4 | 72.5 | 76.0 | 58.1 | 34.9 | 71.3 | 77.1 | 38.6 | 51.5 | 79.1 | 64.8 | 66.3 | 80.7 | 60.5 | 41.7 | 64.2 | 62.0 | |
Oriented reppoints | ResNet50 | GIoU | 68.7 | 41.9 | 75.1 | 84.0 | 35.4 | 75.4 | 78.6 | 51.8 | 80.3 | 66.5 | 66.4 | 85.4 | 46.2 | 65.0 | 63.5 | ||||||
Rotated Faster RCNN | ResNet50 | Smooth L1 | 62.0 | 18.1 | 71.3 | 81.0 | 22.9 | 72.5 | 61.0 | 58.5 | 10.0 | 67.6 | 78.8 | 34.3 | 38.9 | 80.4 | 58.8 | 62.4 | 81.3 | 44.7 | 41.3 | 64.3 | 55.5 |
Oriented RCNN | ResNet50 | Smooth L1 | 61.8 | 26.7 | 71.6 | 81.3 | 33.8 | 72.6 | 74.0 | 58.4 | 23.7 | 66.8 | 80.0 | 29.9 | 52.0 | 81.0 | 62.5 | 62.4 | 81.4 | 50.6 | 42.3 | 65.0 | 58.9 |
RoI Transformer | ResNet50 | Smooth L1 | 63.1 | 30.7 | 71.8 | 81.5 | 33.9 | 75.8 | 64.6 | 24.3 | 67.4 | 82.5 | 35.7 | 51.1 | 81.2 | 70.5 | 81.5 | 44.4 | 43.4 | 66.0 | 60.7 | ||
ReDet | ReResNet50 | Smooth L1 | 28.3 | 71.5 | 88.7 | 31.3 | 71.6 | 61.1 | 20.8 | 61.8 | 81.9 | 36.7 | 48.8 | 81.1 | 63.1 | 62.5 | 81.6 | 49.2 | 42.8 | 64.6 | 59.6 | ||
Ours | HRNet | KLD | 63.1 | 41.6 | 72.6 | 76.6 | 65.8 | 28.2 | 71.0 | 42.2 | 70.4 | 53.3 |
Table 2. Performance comparison of different methods on the DIOR-R dataset (per-class columns are AP/%)
Although ReDet uses ReResNet to extract rotation-invariant features, its high-resolution features carry weak semantic information, so it performs poorly on small objects. In contrast, the HRNet used in our method maintains high-resolution representations with strong semantics, improving the network's robustness to objects of various scales. As shown in
Method | Backbone | Loss | DIOR-R SH/% | DIOR-R VE/% | DIOR-R WM/% | DOTAv1.0 SV/% | DOTAv1.0 SH/%
Rotated RetinaNet | ResNet50 | Smooth L1 | 67.0 | 32.5 | 61.9 | 66.5 | 85.8 | |
R3Det | ResNet50 | Smooth L1 | 68.3 | 35.7 | 54.0 | 66.9 | 87.2 | |
S2ANet | ResNet50 | Smooth L1 | 77.7 | 42.2 | 63.1 | 64.9 | 79.1 | |
SASM reppoints | ResNet50 | GIoU | 79.1 | 41.7 | 64.2 | 59.9 | 78.0 | |
Oriented reppoints | ResNet50 | GIoU | 80.3 | 46.2 | 65.0 | 88.4 | ||
Rotated Faster RCNN | ResNet50 | Smooth L1 | 80.4 | 41.3 | 64.3 | 63.7 | 79.4 | |
Oriented RCNN | ResNet50 | Smooth L1 | 81.0 | 42.3 | 65.0 | 62.3 | 88.8 | |
RoI Transformer | ResNet50 | Smooth L1 | 81.2 | 43.4 | 66.0 | 68.4 | 80.0 | |
ReDet | ReResNet50 | Smooth L1 | 81.1 | 42.8 | 64.6 | 65.8 | 87.4 | |
Ours | HRNet | KLD | 68.8 |
Table 3. Detection results for small objects on the DOTAv1.0 and DIOR-R datasets
Figure 6. Comparison of detection results (false detections)
Figure 7. Comparison of detection results (missed detections)
In addition, RoI Transformer localizes objects with large aspect ratios less precisely. As shown in
Figure 8. Comparison of detection results (objects with large aspect ratios)
2.5 Ablation study
Ablation experiments are used to test the respective contributions of the KLD loss function and HRNet to model performance, and to compare the performance of three loss functions for rotated object detection: GWD, KLD, and KFIoU.
Model (a) replaces only the Smooth L1 loss with the KLD loss in the RoI Transformer framework; model (b) integrates the HRNet feature extraction network into the RoI Transformer framework; model (c) is the proposed HRD-ROI Transformer method.
Method | KLD | HRNet | mAP/% |
Rotated Faster RCNN | 66.3 | ||
RoI Transformer | 68.8 | ||
Ours(a) | √ | 70.3 | |
Ours(b) | √ | 71.7 | |
Ours(c) | √ | √ |
Table 4. Comparison of effectiveness of KLD and HRNet on DOTAv1.0 dataset
On the DOTA v1.0 dataset, RoI Transformer reaches an mAP of 68.8%, while model (a), using only the KLD loss, reaches 70.3%, and model (b), using only HRNet, reaches 71.7%, improvements of 1.5 and 2.9 percentage points over RoI Transformer, respectively. This shows that both components contribute to the final detection result. Model (c), combining the KLD loss and HRNet, further raises the mAP to 72.5%. These results fully validate the effectiveness of the KLD loss and HRNet. On the DIOR-R dataset, models (a), (b), and (c) exceed the original RoI Transformer by 0.8, 3.2, and 4 percentage points in mAP, respectively, which also verifies the adaptability of the proposed model.
Method | KLD | HRNet | mAP/% |
Rotated Faster RCNN | 55.5 | ||
RoI Transformer | 60.7 | ||
Ours(a) | √ | 61.5 | |
Ours(b) | √ | 63.9 | |
Ours(c) | √ | √ |
Table 5. Comparison of effectiveness of KLD and HRNet on DIOR-R dataset
The detection results of model (a) and the original RoI Transformer are compared in
Figure 9. Effectiveness of KLD on DIOR-R dataset
The performance comparison of the three loss functions GWD, KLD, and KFIoU is shown in
Loss Function | DOTAv1.0/% | DIOR-R/% |
GWD | 69.2 | 61.4 |
KFIoU | 68.9 | 60.3
KLD |
Table 6. Comparison of mAP for three loss function models
2.6 Analysis of false-detection samples of HRD-ROI Transformer
Figure 10. Detection results of airport
Figure 11. Detection results of golf course
3 Conclusion
This paper proposes HRD-ROI Transformer, a multi-scale rotated object detection method for remote sensing images based on RoI Transformer. The method adopts HRNet as the backbone, improving the model's adaptability to object scale variation and outperforming typical existing rotated object detection methods on small objects. In addition, the method introduces the KLD loss to jointly optimize the rotated-bounding-box parameters, improving detection accuracy for rotated objects, especially those with large aspect ratios. Comparative experiments on two public datasets demonstrate that HRD-ROI Transformer adapts to object scale variation, resolves the angle periodicity problem, and surpasses current mainstream methods in rotated object detection accuracy.
The proposed method performs poorly on the airport (APO) and golf course (GF) categories of the DIOR-R dataset. Future work will apply data augmentation tailored to the characteristics of these targets and embed SAM (segment anything model) into the detection model [
References
[1] L LIU, W OUYANG, X G WANG et al. Deep learning for generic object detection: a survey. International Journal of Computer Vision, 128, 261-318 (2020).
[2] Changhong FU, Kunhui CHEN, Kunhan LU et al. Aviation fastener rotation detection for intelligent optical perception with edge computing. Journal of Applied Optics, 43, 472-480(2022).
[3] J DING, N XUE, Y LONG et al. Learning RoI transformer for oriented object detection in aerial images, 2849-2858(2019).
[4] W QIAN, X YANG, S L PENG et al. Learning modulated loss for rotated object detection, 2458-2466(2021).
[5] J Q MA, W Y SHAO, H YE et al. Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia, 20, 3111-3122 (2018).
[6] X X XIE, G CHENG, J B WANG et al. Oriented R-CNN for object detection, 3520-3529 (2021).
[7] J M HAN, J DING, N XUE et al. ReDet: a rotation-equivariant detector for aerial object detection, 2786-2795 (2021).
[8] K M HE, X Y ZHANG, S Q REN et al. Deep residual learning for image recognition, 770-778(2016).
[9] X YANG, J C YAN, Q MING et al. Rethinking rotated object detection with Gaussian Wasserstein distance loss, 11830-11841(2021).
[11] X YANG, J C YAN, Z M FENG et al. R3Det: refined single-stage detector with feature refinement for rotating object, 3163-3171 (2021).
[12] L HOU, K LU, J XUE et al. Shape-adaptive selection and measurement for oriented object detection, 923-932(2022).
[13] W LI, Y CHEN, K HU et al. Oriented reppoints for aerial object detection, 1829-1838(2022).
[14] Liequan WU, Zhifeng ZHOU, Zhiling ZHU et al. Surface defect detection of patch diode based on improved YOLO-V4. Journal of Applied Optics, 44, 621-627(2023).
[15] X YANG, X J YANG, J R YANG et al. Learning high-precision bounding box for rotated object detection via kullback leibler divergence, 18381-18394(2021).
[18] J D WANG, K SUN, T S CHENG et al. Deep high-resolution representation learning for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 3349-3364 (2021).
[19] Jiale CAO, Yali LI, Hanqing SUN et al. A survey on deep learning based visual object detection. Journal of Image and Graphics, 27, 1697-1722(2022).
[20] X YANG, J C YAN, W L LIAO et al. SCRDet++: detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45, 2384-2399 (2023).
[21] J HAN, J DING, J LI et al. Align deep features for oriented object detection. IEEE Transactions on Geoscience and Remote Sensing, 60, 1-11(2022).
[22] G S XIA, X BAI, J DING et al. DOTA: a large-scale dataset for object detection in aerial images, 3974-3983 (2018).
[23] G CHENG, J B WANG, K LI et al. Anchor-free oriented proposal generator for object detection. IEEE Transactions on Geoscience and Remote Sensing, 60, 1-11(2022).
[24] K LI, G WAN, G CHENG et al. Object detection in optical remote sensing images: a survey and a new benchmark. ISPRS Journal of Photogrammetry and Remote Sensing, 159, 296-307(2020).
[25] Y ZHOU, X YANG, G F ZHANG et al. MMRotate: a rotated object detection benchmark using PyTorch, 7331-7334 (2022).
[28] J LI, Y X GONG, Z MA et al. Enhancing feature fusion using attention for small object detection, 1859-1863(2022).
[29] Y YUAN, Y L ZHANG. OLCN: an optimized low coupling network for small objects detection. IEEE Geoscience and Remote Sensing Letters, 19, 1-5 (2021).
