Fig. 1. Comparison of effects of four data augmentation techniques
Fig. 2. Comparison of combination effects of flipping, cropping, and rotating methods
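The flipping, cropping, and rotating operations compared in Figs. 1 and 2 can be sketched with plain NumPy. This is a minimal illustration (the function names are our own, and a real detection pipeline must also apply the same transforms to the bounding-box annotations):

```python
import numpy as np

def flip_horizontal(img):
    """Mirror the image left-right."""
    return img[:, ::-1]

def random_crop(img, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w patch at a random position."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]

def rotate90(img, k=1):
    """Rotate by k * 90 degrees; axis-aligned rotations keep boxes rectangular."""
    return np.rot90(img, k)

# Chaining the three augmentations on a toy 4x6 "image":
rng = np.random.default_rng(0)
img = np.arange(4 * 6).reshape(4, 6)
aug = rotate90(random_crop(flip_horizontal(img), 3, 4, rng))
# aug.shape == (4, 3): the 3x4 crop rotated by 90 degrees
```

In practice these operations are combined stochastically per training sample, which is what the combination study in Fig. 2 measures.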
Fig. 3. Node representations of cell structures of ResNet, DenseNet, and two-path networks. (a) ResNet network; (b) DenseNet network; (c) two-path network
Fig. 4. Failure cases of traditional NMS. (a) Horses; (b) birds
Fig. 5. Top-1 error rate versus parameter quantity of the feature extraction network for depths of 52, 100, and 133 layers
Fig. 6. Top-1 error rate versus parameter quantity of the feature extraction network for growth rates of 12, 18, 24, and 48
| Layer | Output size | Detail |
|---|---|---|
| Conv1 | 112×112 | 7×7, 64, stride 2 |
| Conv2 | 56×56 | 3×3 max pool, stride 2; ×α1 |
| Conv3 | 28×28 | ×α2 |
| Conv4 | 14×14 | ×α3 |
| Conv5 | 7×7 | ×α4 |
Table 1. Structure of feature extraction network
| Feature extraction network | Depth | Parameters /10⁶ |
|---|---|---|
| VGG-16 | 16 | 168 |
| DenseNet (K=48) | 161 | 111 |
| ResNet | 101 | 150 |
| Ours (α1,α2,α3,α4=6,8,16,3; K=48) | 100 | 134 |
Table 2. Comparison of complexity of different feature extraction networks
| Parameter setting | 0.5 | 0.5 (w) | 0.6 | 0.6 (w) | 0.7 | 0.7 (w) |
|---|---|---|---|---|---|---|
| Normal NMS | 44.37 | 44.83 | 39.18 | 39.67 | 29.83 | 30.34 |
| β=2.5, σ=0.4 | 46.42 | 46.92 | 42.83 | 43.40 | 34.68 | 35.24 |
| β=1.67, σ=0.6 | 46.58 | 47.11 | 43.30 | 43.79 | 35.21 | 35.76 |
| β=1.25, σ=0.8 | 45.93 | 46.45 | 41.68 | 42.21 | 33.01 | 33.53 |
Table 3. Influences of IoU threshold, β parameter, and weighted average on AP (0.5, 0.6, and 0.7 represent different IoU thresholds; w represents weighted average)
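The σ parameter in Table 3 suggests a Gaussian rescoring mechanism in the spirit of Soft-NMS, where overlapping detections are down-weighted rather than discarded (note that the listed pairs satisfy β ≈ 1/σ). Below is a minimal sketch of Gaussian score decay under that assumption; the function names and the score threshold are illustrative, not the paper's exact implementation:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, sigma=0.6, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of suppressing boxes."""
    scores = scores.astype(float).copy()
    keep = []
    idx = np.arange(len(scores))
    while idx.size > 0:
        m = idx[np.argmax(scores[idx])]   # highest-scoring remaining box
        keep.append(int(m))
        idx = idx[idx != m]
        if idx.size == 0:
            break
        ov = iou(boxes[m], boxes[idx])
        scores[idx] *= np.exp(-(ov ** 2) / sigma)   # Gaussian decay by overlap
        idx = idx[scores[idx] > score_thresh]       # drop near-zero scores
    return keep

# Two heavily overlapping boxes plus one distant box:
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
soft_nms(boxes, scores)  # → [0, 2, 1]: box 1 is demoted, not removed
```

Unlike hard NMS, the second horse or bird in a crowded scene (Fig. 4) keeps a reduced score instead of being eliminated outright, which is consistent with the AP gains over "Normal NMS" in Table 3.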
| Detection framework | Backbone | Training set | Testing set | mAP /% |
|---|---|---|---|---|
| Ours | Proposed | VOC2007+VOC2012 | VOC2007 | 79.1 |
| No augmentation | Proposed | VOC2007+VOC2012 | VOC2007 | 76.6 |
| No improved NMS | Proposed | VOC2007+VOC2012 | VOC2007 | 78.0 |
Table 4. Influences of data augmentation and improved NMS mechanism on accuracy
| Number of warm-up epochs | Learning rate setting | mAP /% |
|---|---|---|
| 0 | No warming up | 78.20 |
| 2 | 0.01, 0.1 | 78.25 |
| 3 | 0.001, 0.01, 0.1 | 78.36 |
| 4 | 0.0001, 0.001, 0.01, 0.1 | 78.67 |
| 5 | 0.00001, 0.0001, 0.001, 0.01, 0.1 | 78.71 |
Table 5. Influences of different epochs on accuracy
| Type | Method | Backbone | Training set | Testing set | mAP /% | Frame rate /(frame·s⁻¹) |
|---|---|---|---|---|---|---|
| Two-stage | Fast R-CNN | VGG-16 | VOC2007+VOC2012 | VOC2007 | 70.0 | 0.50 |
| | Faster R-CNN | VGG-16 | VOC2007+VOC2012 | VOC2007 | 73.2 | 7 |
| | Faster R-CNN | ResNet-101 | VOC2007+VOC2012 | VOC2007 | 76.4 | 2.40 |
| | MR-CNN | ResNet-101 | VOC2007+VOC2012 | VOC2007 | 78.2 | 0.03 |
| | ION | VGG-16 | VOC2007+VOC2012 | VOC2007 | 76.5 | 1.25 |
| | Ours | Proposed | VOC2007+VOC2012 | VOC2007 | 79.1 | 2.10 |
| One-stage | YOLO | GoogleNet | VOC2007+VOC2012 | VOC2007 | 63.4 | 45 |
| | YOLOv2 | Darknet-19 | VOC2007+VOC2012 | VOC2007 | 78.6 | 40 |
| | SSD321 | ResNet-101 | VOC2007+VOC2012 | VOC2007 | 77.1 | 11.20 |
| | SSD300* | VGG-16 | VOC2007+VOC2012 | VOC2007 | 77.2 | 46 |
| | DSOD300 | DS/64-192-48-1 | VOC2007+VOC2012 | VOC2007 | 77.7 | 17.40 |
| | DSSD513 | ResNet-101 | VOC2007+VOC2012 | VOC2007 | 81.5 | 5.50 |
Table 6. Testing results of different algorithms under VOC2007+VOC2012 training sets