• Laser & Optoelectronics Progress
  • Vol. 57, Issue 4, 041021 (2020)
Yalin Song* and Yanwei Pang
Author Affiliations
  • School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
    DOI: 10.3788/LOP57.041021
    Yalin Song, Yanwei Pang. Backbone Network for Object Detection Task[J]. Laser & Optoelectronics Progress, 2020, 57(4): 041021
    Fig. 1. Network architecture
    Fig. 2. Initial module
    Fig. 3. Feature fusion module
    Fig. 4. Mix down-sampling module
    Fig. 5. Prediction modules. (a) Plain prediction module; (b) dense prediction module
    Fig. 6. Qualitative detection results
| Initial block | mAP /% | Speed /(frame·s⁻¹) |
|---|---|---|
| 7×7-s2 | 79.9 | 85 |
| 3×3 conv-s1, 3×3 conv-s1, 3×3 conv-s2 | 80.7 | 85 |
| 3×3 conv-s1, 3×3 conv-s2, 3×3 conv-s1 | 80.6 | 84 |
| 3×3 conv-s2, 3×3 conv-s1, 3×3 conv-s1 | 81.0 | 85 |
    Table 1. Comparison of different initial modules
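
The initial-block variants in Table 1 differ only in kernel size and in where the stride-2 layer sits. The sketch below traces the spatial size and receptive field of each variant using the standard convolution output-size formula. It is a minimal illustration, not the authors' code: the padding of `kernel // 2` and the 300-pixel input side (taken from the 300×300 input listed in Table 6) are assumptions, and `conv_out`, `trace_block`, and `INITIAL_BLOCKS` are hypothetical names.

```python
def conv_out(size, kernel, stride, padding=None):
    """Spatial size after one convolution (floor convention)."""
    if padding is None:
        padding = kernel // 2  # assumption: "same"-style padding
    return (size + 2 * padding - kernel) // stride + 1


def trace_block(size, layers):
    """Return (output size, receptive field) for a stack of (kernel, stride) layers."""
    receptive_field, jump = 1, 1
    for kernel, stride in layers:
        size = conv_out(size, kernel, stride)
        # Each layer widens the receptive field by (kernel - 1) input-pixel jumps.
        receptive_field += (kernel - 1) * jump
        jump *= stride
    return size, receptive_field


# (kernel, stride) stacks matching the rows of Table 1.
INITIAL_BLOCKS = {
    "7x7-s2": [(7, 2)],
    "3x3-s1, 3x3-s1, 3x3-s2": [(3, 1), (3, 1), (3, 2)],
    "3x3-s1, 3x3-s2, 3x3-s1": [(3, 1), (3, 2), (3, 1)],
    "3x3-s2, 3x3-s1, 3x3-s1": [(3, 2), (3, 1), (3, 1)],
}

if __name__ == "__main__":
    for name, layers in INITIAL_BLOCKS.items():
        print(name, trace_block(300, layers))
```

Under these assumptions every variant halves a 300-pixel side to 150 pixels. What changes is the receptive field of the block output (7, 7, 9, and 11 input pixels, in table order) and how many layers run at full resolution.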
| Feature fusion method | mAP /% | Speed /(frame·s⁻¹) |
|---|---|---|
| Without fusion | 80.4 | 91 |
| Sum | 79.8 | 96 |
| Concatenation + 1×1 conv | 81.0 | 85 |
    Table 2. Comparison of different feature fusion methods
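
The two fusion options in Table 2 can be contrasted with a small pure-Python sketch. Feature maps are nested lists indexed as `[channel][row][col]`, and a 1×1 convolution is a per-pixel linear map over channels. The function names, shapes, and weights are illustrative assumptions. The module in Fig. 3 may contain further layers (for example normalization or activation) that are not modeled here.

```python
def concat_channels(a, b):
    """Concatenate two feature maps along the channel axis."""
    return a + b


def conv1x1(x, weight):
    """Apply a 1x1 convolution; weight has shape [out_channels][in_channels]."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w * x[c][i][j] for c, w in enumerate(row)) for j in range(width)]
            for i in range(height)
        ]
        for row in weight
    ]


def fuse_sum(a, b):
    """Element-wise sum fusion (requires identical shapes)."""
    return [
        [[va + vb for va, vb in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


def fuse_concat_1x1(a, b, weight):
    """Concatenation followed by a 1x1 convolution that mixes channels."""
    return conv1x1(concat_channels(a, b), weight)


# Two toy feature maps with 2 channels of 2x2 pixels each.
A = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
B = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]

# With this fixed weight, concatenation + 1x1 conv reproduces the plain sum.
SUM_WEIGHT = [[1, 0, 1, 0], [0, 1, 0, 1]]

if __name__ == "__main__":
    print(fuse_concat_1x1(A, B, SUM_WEIGHT) == fuse_sum(A, B))
```

The example shows that summation is a special case of concatenation followed by a 1×1 convolution with fixed weights, so the concatenation variant can represent the sum and also other channel mixtures when its weights are learned. The extra convolution adds computation, which is consistent with, though not proven to be the cause of, the lower frame rate in Table 2.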
| Down-sampling module | mAP /% | Speed /(frame·s⁻¹) |
|---|---|---|
| 3×3 conv-s2 | 79.8 | 81 |
| 2×2 max pool-s2 | 79.8 | 84 |
| Mix down-sampling | 81.0 | 85 |
    Table 3. Comparison of different down-sampling modules
| Prediction module | mAP /% | Speed /(frame·s⁻¹) |
|---|---|---|
| Plain prediction module | 80.1 | 89 |
| Dense prediction module | 81.0 | 85 |
    Table 4. Comparison of different prediction modules
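| Backbone network | Depth | Pre-train | SSD mAP /% | SSD speed /(frame·s⁻¹) | DSOD mAP /% | DSOD speed /(frame·s⁻¹) | RFBNet mAP /% | RFBNet speed /(frame·s⁻¹) |
|---|---|---|---|---|---|---|---|---|
| VGG | 16 | | 77.5 | 130 | 78.1 | 79 | 78.9 | 81 |
| VGGBN | 16 | × | 79.5 | 95 | 79.5 | 89 | 79.9 | 71 |
| ResNet | 101 | × | 76.0 | 42 | 75.5 | 42 | 77.1 | 38 |
| DenseNet | 121 | × | 74.6 | 37 | 75.1 | 32 | 75.3 | 29 |
| DS/64-192-48-1 | 67 | × | 78.5 | 51 | 78.8 | 47 | 79.4 | 42 |
| Root-ResNet-34 | 34 | × | 80.2 | 79 | 80.6 | 75 | 81.3 | 61 |
| DNet | 25 | × | 80.1 | 89 | 81.0 | 85 | 80.5 | 65 |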
    Table 5. Detection results of different backbone networks in SSD, DSOD, and RFBNet models
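| Method | Pre-train | Backbone network | Input size /(pixel×pixel) | mAP /% | Speed /(frame·s⁻¹) |
|---|---|---|---|---|---|
| SSD[11] | | VGG-16 | 300×300 | 77.2 | 46 |
| SSD* | | VGG-16 | 300×300 | 77.7 | 130 |
| YOLOv2[26] | | DarkNet-19 | 544×544 | 78.6 | 81 |
| RFBNet[25] | | VGG-16 | 300×300 | 80.5 | 83 |
| DSSD[27] | | ResNet-101 | 300×300 | 78.6 | 8 |
| Faster R-CNN[8] | | ResNet-101 | ~1000×600 | 76.4 | 2.4 |
| RFCN[28] | | ResNet-101 | ~1000×600 | 80.5 | 9 |
| DSOD[19] | × | DS/64-192-48-1 | 300×300 | 77.7 | 17.4 |
| ScratchDet[20] | × | Root-ResNet-34 | 300×300 | 80.4 | 17.8 |
| Proposed | × | DNet | 300×300 | 81.0 | 85 |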
    Table 6. Detection results of different detectors on the PASCAL VOC dataset
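
One way to read the accuracy and speed columns of Table 6 together is to keep only the detectors that no other detector matches or beats on both columns at once. The sketch below is an illustrative reading aid, not part of the paper: the values are copied from Table 6, while `dominates` and `pareto_front` are hypothetical helper names.

```python
# (method, mAP in %, speed in frames per second), copied from Table 6.
DETECTORS = [
    ("SSD", 77.2, 46.0),
    ("SSD*", 77.7, 130.0),
    ("YOLOv2", 78.6, 81.0),
    ("RFBNet", 80.5, 83.0),
    ("DSSD", 78.6, 8.0),
    ("Faster R-CNN", 76.4, 2.4),
    ("RFCN", 80.5, 9.0),
    ("DSOD", 77.7, 17.4),
    ("ScratchDet", 80.4, 17.8),
    ("Proposed", 81.0, 85.0),
]


def dominates(a, b):
    """True if a is at least as good as b on both metrics and better on one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])


def pareto_front(rows):
    """Names of the rows that no other row dominates, in table order."""
    return [r[0] for r in rows if not any(dominates(o, r) for o in rows)]


if __name__ == "__main__":
    print(pareto_front(DETECTORS))
```

This comparison ignores the differing input sizes in the table and assumes the reported speeds are comparable across methods, which the table alone does not establish.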