Backbone Network for Object Detection Task

Yalin Song^* and Yanwei Pang

Author Affiliations

School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China

DOI: 10.3788/LOP57.041021

Yalin Song, Yanwei Pang. Backbone Network for Object Detection Task[J]. Laser & Optoelectronics Progress, 2020, 57(4): 041021

Fig. 1. Network architecture

Fig. 2. Initial module

Fig. 3. Feature fusion module

Fig. 4. Mix down-sampling module

Fig. 5. Prediction modules. (a) Plain prediction module; (b) dense prediction module

Fig. 6. Qualitative detection results

Table 1. Comparison of different initial modules

Table 2. Comparison of different feature fusion methods

Table 3. Comparison of different down-sampling modules

Prediction module	M_AP /%	Speed /(frame·s^-1)
Plain prediction module	80.1	89
Dense prediction module	81.0	85

Table 4. Comparison of different prediction modules

Backbone network	Depth	Pre-train	SSD M_AP /%	SSD speed /(frame·s^-1)	DSOD M_AP /%	DSOD speed /(frame·s^-1)	RFBNet M_AP /%	RFBNet speed /(frame·s^-1)
VGG	16	√	77.5	130	78.1	79	78.9	81
VGGBN	16	×	79.5	95	79.5	89	79.9	71
ResNet	101	×	76.0	42	75.5	42	77.1	38
DenseNet	121	×	74.6	37	75.1	32	75.3	29
DS/64-192-48-1	67	×	78.5	51	78.8	47	79.4	42
Root-ResNet-34	34	×	80.2	79	80.6	75	81.3	61
DNet	25	×	80.1	89	81.0	85	80.5	65

Table 5. Detection results of different backbone networks in SSD, DSOD, and RFBNet models

Method	Pre-train	Backbone network	Input size /(pixel×pixel)	M_AP /%	Speed /(frame·s^-1)
SSD^[11]	√	VGG-16	300×300	77.2	46
SSD^*	√	VGG-16	300×300	77.7	130
YOLOv2^[26]	√	DarkNet-19	544×544	78.6	81
RFBNet^[25]	√	VGG-16	300×300	80.5	83
DSSD^[27]	√	ResNet-101	300×300	78.6	8
Faster R-CNN^[8]	√	ResNet-101	~1000×600	76.4	2.4
RFCN^[28]	√	ResNet-101	~1000×600	80.5	9
DSOD^[19]	×	DS/64-192-48-1	300×300	77.7	17.4
ScratchDet^[20]	×	Root-ResNet-34	300×300	80.4	17.8
Proposed	×	DNet	300×300	81.0	85