• Acta Photonica Sinica
  • Vol. 52, Issue 1, 0110002 (2023)
Ying SUN 1,2, Zhiqiang HOU 1,2,*, Chen YANG 1,2, Sugang MA 1,2, and Jiulun FAN 1
Author Affiliations
  • 1 School of Computer Science and Technology, Xi'an University of Posts & Telecommunications, Xi'an 710121, China
  • 2 Shaanxi Provincial Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an 710121, China
DOI: 10.3788/gzxb20235201.0110002
Ying SUN, Zhiqiang HOU, Chen YANG, Sugang MA, Jiulun FAN. Object Detection Algorithm Based on Dual-modal Fusion Network[J]. Acta Photonica Sinica, 2023, 52(1): 0110002
    Fig. 1. Overall algorithm architecture
    Fig. 2. Dual-mode encoder structure
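Fig. 2 names a dual-mode (VS + IR) encoder. As a rough illustration of the general idea only, the sketch below shows a generic two-branch encoder in PyTorch that extracts features from the visible and infrared images separately; the class names, channel widths, and layer choices are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a dual-modal encoder: two parallel backbones, one per modality.
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv + BatchNorm + SiLU: the basic unit of YOLOv5-style backbones."""
    def __init__(self, c_in, c_out, k=3, s=2):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class DualModalEncoder(nn.Module):
    """Two parallel branches yield per-modality feature maps of equal shape."""
    def __init__(self, channels=(16, 32, 64)):
        super().__init__()
        def branch():
            layers, c_prev = [], 3
            for c in channels:
                layers.append(ConvBlock(c_prev, c))
                c_prev = c
            return nn.Sequential(*layers)
        self.vs_branch = branch()  # visible-light (VS) branch
        self.ir_branch = branch()  # infrared (IR) branch

    def forward(self, x_vs, x_ir):
        return self.vs_branch(x_vs), self.ir_branch(x_ir)

# Example: a 640x640 VS/IR image pair gives two 64-channel maps at 1/8 scale.
encoder = DualModalEncoder()
f_vs, f_ir = encoder(torch.randn(1, 3, 640, 640), torch.randn(1, 3, 640, 640))
print(f_vs.shape, f_ir.shape)  # torch.Size([1, 64, 80, 80]) each
```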
    Fig. 3. Gated fusion network structure
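Fig. 3 names a gated fusion network. The sketch below is a minimal, hypothetical gating module in PyTorch: a 1×1 convolution predicts a gate from the concatenated VS and IR features, and the two modalities are blended as a convex combination. This is a generic gating scheme assumed for illustration, not necessarily the exact structure shown in Fig. 3.

```python
# Hypothetical sketch of a gated fusion module for two same-shape feature maps.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Blend VS and IR feature maps with a learned gate in [0, 1]."""
    def __init__(self, channels):
        super().__init__()
        # A 1x1 conv predicts one gate value per channel and spatial position.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_vs, f_ir):
        g = self.gate(torch.cat([f_vs, f_ir], dim=1))  # gate weights
        return g * f_vs + (1.0 - g) * f_ir             # convex combination

# Example: fuse two 64-channel feature maps (e.g., from the encoder sketch above).
fusion = GatedFusion(64)
fused = fusion(torch.randn(1, 64, 80, 80), torch.randn(1, 64, 80, 80))
print(fused.shape)  # torch.Size([1, 64, 80, 80])
```

Using a convex combination (g·VS + (1−g)·IR) keeps the fused feature on the same scale as either input, which makes such a module easy to drop into an existing YOLOv5-style neck.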
    Fig. 4. P-R curves of the two models with different modal inputs
    Fig. 5. Detection results on the KAIST dataset
    Fig. 6. Detection results on the GIR dataset
Algorithm | Resolution | AP0.5:0.95 | AP0.5
Ours-n | 416×416 | 30.5 | 70
Ours-n | 512×512 | 32.5 | 73.1
Ours-n | 608×608 | 32.9 | 73.3
Ours-n | 640×640 | 33.3 | 73.8
Table 1. Detector performance for different input image-pair sizes on the n-model
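For reading the AP columns in Tables 1 through 7: AP0.5 is the average precision at an IoU threshold of 0.5, and AP0.5:0.95 appears to follow the standard COCO convention of averaging AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05:

\mathrm{AP}_{0.5:0.95} = \frac{1}{10}\sum_{t \in \{0.50,\,0.55,\,\ldots,\,0.95\}} \mathrm{AP}_{t}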
Algorithm | Resolution | AP0.5:0.95 | AP0.5
Ours-s | 416×416 | 31.1 | 71
Ours-s | 512×512 | 31.9 | 72.7
Ours-s | 608×608 | 34.3 | 73.9
Ours-s | 640×640 | 35.2 | 74.5
Table 2. Detector performance for different input image-pair sizes on the s-model
Method | Encoder-VS | Encoder-IR | Gated Fusion | Input | AP0.5:0.95 | AP0.5 | FPS
YOLOv5-n |  |  |  | VS | 24.8 | 58.7 | 158.7
YOLOv5-n |  |  |  | IR | 31.6 | 71 | 158.7
YOLOv5-n-EVS | ✓ |  |  | VS | 25 | 59.1 | 125
YOLOv5-n-EIR |  | ✓ |  | IR | 31.8 | 71.3 | 125
Ours-n | ✓ | ✓ | ✓ | VS+IR | 33.3 | 73.8 | 117.6
YOLOv5-s |  |  |  | VS | 26.7 | 59.8 | 112.4
YOLOv5-s |  |  |  | IR | 32 | 71.5 | 112.4
YOLOv5-s-EVS | ✓ |  |  | VS | 26.9 | 60.2 | 107.5
YOLOv5-s-EIR |  | ✓ |  | IR | 32.2 | 71.9 | 107.5
Ours-s | ✓ | ✓ | ✓ | VS+IR | 35.2 | 74.5 | 102
Table 3. Ablation experiment results of different models on the KAIST dataset
Method | Encoder-VS | Encoder-IR | Gated Fusion | Input | AP0.5:0.95 | AP0.5 | FPS
YOLOv5-n |  |  |  | VS | 48.4 | 88.8 | 158.7
YOLOv5-n |  |  |  | IR | 36.3 | 75.5 | 158.7
YOLOv5-n-EVS | ✓ |  |  | VS | 49.4 | 89.1 | 105.3
YOLOv5-n-EIR |  | ✓ |  | IR | 36.4 | 76.3 | 105.3
Ours-n | ✓ | ✓ | ✓ | VS+IR | 49.7 | 89.8 | 101
YOLOv5-s |  |  |  | VS | 51.4 | 89.9 | 111.1
YOLOv5-s |  |  |  | IR | 36.6 | 76.8 | 111.1
YOLOv5-s-EVS | ✓ |  |  | VS | 51.9 | 90.1 | 91.7
YOLOv5-s-EIR |  | ✓ |  | IR | 36.7 | 77 | 91.7
Ours-s | ✓ | ✓ | ✓ | VS+IR | 52.2 | 90.5 | 85.5
Table 4. Ablation experiment results of different models on the GIR dataset
Class | Ours-n | YOLOv5-n-VS | YOLOv5-n-IR | Ours-s | YOLOv5-s-VS | YOLOv5-s-IR
Person | 90.7 | 91.2 | 84.0 | 91.7 | 91.7 | 85.4
Dog | 99.5 | 99.5 | 99.5 | 99.5 | 99.5 | 91.6
Car | 95.4 | 95.2 | 94.3 | 95.8 | 95.1 | 94.7
Bicycle | 80.4 | 83.7 | 70.7 | 80.8 | 84.7 | 72.8
Plant | 85.6 | 84.5 | 79.1 | 86.4 | 87.0 | 76.0
Motorcycle | 82.8 | 82.0 | 76.1 | 83.9 | 82.4 | 77.7
Umbrella | 86.0 | 87.8 | 70.5 | 85.7 | 86.6 | 76.1
Kite | 93.6 | 82.9 | 64.6 | 94.4 | 89.2 | 67.6
Toy | 95.6 | 96.3 | 86.7 | 96.4 | 97.0 | 83.7
Ball | 88.5 | 84.7 | 29.5 | 90.7 | 85.5 | 42.1
Table 5. Detection accuracy of the proposed algorithm and the baseline algorithms (AP0.5/%)
Input | Algorithm | Backbone | Resolution | AP0.5:0.95 | AP0.5 | FPS
VS | Faster R-CNN(2015) | ResNet-50 | 1 000×600 | 24.2 | 58.3 | 15.2
VS | SSD(2016) | VGG-16 | 512×512 | 18.1 | 48.2 | 38.1
VS | RetinaNet(2017) | ResNet-50 | 1 333×800 | 22.5 | 57.7 | 16.6
VS | YOLOv3(2018) | DarkNet-53 | 416×416 | 18.3 | 46.7 | 56.2
VS | FCOS(2019) | ResNet-50 | 1 333×800 | 22.7 | 56.7 | 18.3
VS | ATSS(2020) | ResNet-50 | 1 333×800 | 24.3 | 57.8 | 17
VS | YOLOv4(2020) | CSPDarkNet-53 | 416×416 | 23.7 | 57.4 | 55
VS | YOLOX-s(2021) | Modified CSP v5 | 416×416 | 27 | 61.1 | 48.4
VS | YOLOX-m(2021) | Modified CSP v5 | 416×416 | 27.7 | 61.8 | 40.3
VS | YOLOF(2021) | ResNet-50 | 1 333×800 | 22.2 | 54.1 | 25.7
VS | YOLOv5-n(2020) | Modified CSP v5 | 640×640 | 24.8 | 58.7 | 158.7
VS | YOLOv5-s(2020) | Modified CSP v5 | 640×640 | 26.4 | 59.8 | 112.4
VS | YOLOv5-n-EVS | Modified CSP v5 | 640×640 | 25 | 59.1 | 125
VS | YOLOv5-s-EVS | Modified CSP v5 | 640×640 | 26.9 | 60.2 | 107.5
IR | Faster R-CNN(2015) | ResNet-50 | 1 000×600 | 28.8 | 68.6 | 12
IR | SSD(2016) | VGG-16 | 512×512 | 23.2 | 60.9 | 34
IR | RetinaNet(2017) | ResNet-50 | 1 333×800 | 27.8 | 68.2 | 14.1
IR | YOLOv3(2018) | DarkNet-53 | 416×416 | 25.3 | 63.6 | 37
IR | FCOS(2019) | ResNet-50 | 1 333×800 | 29.6 | 69.4 | 14
IR | ATSS(2020) | ResNet-50 | 1 333×800 | 29 | 69 | 13.8
IR | YOLOv4(2020) | CSPDarkNet-53 | 416×416 | 27.4 | 68.5 | 52.6
IR | YOLOX-s(2021) | Modified CSP v5 | 416×416 | 32.8 | 72.1 | 45
IR | YOLOX-m(2021) | Modified CSP v5 | 416×416 | 33.5 | 73.1 | 40
IR | YOLOF(2021) | ResNet-50 | 1 333×800 | 27.3 | 65.6 | 25
IR | YOLOv5-n(2020) | Modified CSP v5 | 640×640 | 31.6 | 71 | 158.7
IR | YOLOv5-s(2020) | Modified CSP v5 | 640×640 | 32 | 71.5 | 112.4
IR | YOLOv5-n-EIR | Modified CSP v5 | 640×640 | 31.8 | 71.3 | 125
IR | YOLOv5-s-EIR | Modified CSP v5 | 640×640 | 32.2 | 71.9 | 107.5
VS+IR | MMTOD(2019)[18] | ResNet-101 | 1 000×600 | 31.1 | 70.7 | 13.2
VS+IR | CMDet(2021)[37] | ResNet-101 | 640×512 | 28.3 | 68.4 | 25.3
VS+IR | RISNet(2022)[38] | DarkNet-53 | 416×416 | 33.1 | 72.7 | 23
VS+IR | Ours-n | Modified CSP v5 | 640×640 | 33.3 | 73.8 | 117.6
VS+IR | Ours-s | Modified CSP v5 | 640×640 | 35.2 | 74.5 | 102
Table 6. Comparative experimental results on the KAIST dataset
Input | Algorithm | Backbone | Resolution | AP0.5:0.95 | AP0.5 | FPS
VS | YOLOv3(2018) | DarkNet-53 | 416×416 | 41.2 | 85.7 | 50
VS | FCOS(2019) | ResNet-50 | 1 333×800 | 40.4 | 84 | 16
VS | ATSS(2020) | ResNet-50 | 1 333×800 | 47.1 | 87.1 | 14
VS | YOLOv4(2020) | CSPDarkNet-53 | 416×416 | 44.5 | 87.9 | 53
VS | YOLOX-s(2021) | Modified CSP v5 | 416×416 | 51.7 | 90.3 | 52
VS | YOLOv5-n(2020) | Modified CSP v5 | 640×640 | 48.4 | 88.8 | 158.7
VS | YOLOv5-s(2020) | Modified CSP v5 | 640×640 | 51.4 | 89.8 | 111.1
VS | YOLOv5-n-EVS | Modified CSP v5 | 640×640 | 49.4 | 89.1 | 105.3
VS | YOLOv5-s-EVS | Modified CSP v5 | 640×640 | 51.9 | 90.1 | 91.7
IR | YOLOv3(2018) | DarkNet-53 | 416×416 | 35.6 | 74.2 | 48.4
IR | FCOS(2019) | ResNet-50 | 1 333×800 | 34.5 | 72.3 | 12
IR | ATSS(2020) | ResNet-50 | 1 333×800 | 35.2 | 73.4 | 11.7
IR | YOLOv4(2020) | CSPDarkNet-53 | 416×416 | 35.8 | 74.7 | 49
IR | YOLOX-s(2021) | Modified CSP v5 | 416×416 | 36.9 | 76.3 | 53
IR | YOLOv5-n(2020) | Modified CSP v5 | 640×640 | 36.3 | 75.5 | 158.7
IR | YOLOv5-s(2020) | Modified CSP v5 | 640×640 | 36.6 | 76.8 | 111.1
IR | YOLOv5-n-EIR | Modified CSP v5 | 640×640 | 36.4 | 76.3 | 105.3
IR | YOLOv5-s-EIR | Modified CSP v5 | 640×640 | 36.7 | 77 | 91.7
VS+IR | MMTOD(2019)[18] | ResNet-101 | 1 000×600 | 40.7 | 84.3 | 11.2
VS+IR | CMDet(2021)[37] | ResNet-101 | 640×512 | 48.6 | 88.9 | 22.7
VS+IR | RISNet(2022)[38] | DarkNet-53 | 416×416 | 49.3 | 89.2 | 23.3
VS+IR | Ours-n | Modified CSP v5 | 640×640 | 49.7 | 89.8 | 101
VS+IR | Ours-s | Modified CSP v5 | 640×640 | 52.2 | 90.5 | 85.5
Table 7. Comparative experimental results on the GIR dataset