Author Affiliations
¹School of Microelectronics, Tianjin University, Tianjin 300072, China
²Phytium Technology Co., Ltd., Tianjin 300459, China
³Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, Tianjin 300072, China
Fig. 1. Overall network architecture
Fig. 2. Deformable receptive field block
Fig. 3. Ghost bottleneck module
Fig. 4. Comparison of P-R curves. (a) P-R curves of the proposed model for the ten object classes; (b) P-R curve for the bus class; (c) P-R curve for the car class
Fig. 5. Comparison of detection results of different models in different scenarios. (a) Multi-scale, occluded scene; (b) small object, dense scene; (c) illumination change scene
Fig. 6. Detection results of the proposed model on the NWPU VHR-10 dataset
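Fig. 3 shows the Ghost bottleneck. To illustrate why Ghost modules are cheap, the sketch below counts weights under the standard Ghost module formulation from GhostNet. This formulation is an assumption here, because the figure's exact settings are not given. A primary k×k convolution produces n/s intrinsic feature maps, and depthwise d×d "cheap" operations generate the remaining (s − 1)·n/s maps. Bias and batch-normalization parameters are ignored. The function names and the layer sizes in the example are hypothetical.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of an ordinary k x k convolution (bias ignored)."""
    return c_out * c_in * k * k


def ghost_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Weights of a Ghost module: a primary conv plus depthwise cheap operations.

    Assumes c_out is divisible by the ratio s (intrinsic maps m = c_out // s).
    """
    if c_out % s != 0:
        raise ValueError("c_out must be divisible by s")
    m = c_out // s                # intrinsic feature maps from the primary conv
    primary = m * c_in * k * k    # ordinary k x k conv producing m maps
    cheap = (s - 1) * m * d * d   # each intrinsic map yields s-1 ghost maps (depthwise d x d)
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer: 96 -> 96 channels, 3 x 3 kernels, ratio s = 2, d = 3.
    ordinary = conv_params(96, 96, 3)
    ghost = ghost_params(96, 96, 3, s=2, d=3)
    print(ordinary, ghost, round(ordinary / ghost, 2))  # 82944 41904 1.98
```

With d = k the compression ratio is s·c/(s + c − 1), which here gives 192/97 ≈ 1.98. This is close to s = 2.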
| Input size | Operator | Exp size | Output size | SE | NL | s |
|---|---|---|---|---|---|---|
| 640×640×3 | Conv2d | | 320×320×16 | | HS | 2 |
| 320×320×16 | Bneck, 3×3 | 16 | 320×320×16 | | RE | 1 |
| 320×320×16 | Bneck, 3×3 | 64 | 160×160×24 | | RE | 2 |
| 160×160×24 | Bneck, 3×3 | 72 | 160×160×24 | | RE | 1 |
| 160×160×24 | Bneck, 5×5 | 72 | 80×80×40 | 1 | RE | 2 |
| 80×80×40 | Bneck, 5×5 | 120 | 80×80×40 | 1 | RE | 1 |
| 80×80×40 | Bneck, 5×5 | 120 | 80×80×40 | 1 | RE | 1 |
| 80×80×40 | Bneck, 3×3 | 240 | 40×40×80 | | HS | 2 |
| 40×40×80 | Bneck, 3×3 | 200 | 40×40×80 | | HS | 1 |
| 40×40×80 | Bneck, 3×3 | 184 | 40×40×80 | | HS | 1 |
| 40×40×80 | Bneck, 3×3 | 184 | 40×40×80 | | HS | 1 |
| 40×40×80 | Bneck, 3×3 | 480 | 40×40×112 | 1 | HS | 1 |
| 40×40×112 | Bneck, 3×3 | 672 | 40×40×112 | 1 | HS | 1 |
| 40×40×112 | Bneck, 5×5 | 672 | 20×20×160 | 1 | HS | 2 |
| 20×20×160 | Bneck, 5×5 | 960 | 20×20×160 | 1 | HS | 1 |
| 20×20×160 | Bneck, 5×5 | 960 | 20×20×160 | 1 | HS | 1 |

Exp size: expansion (hidden) channel count of each bneck block. SE: 1 marks blocks with a squeeze-and-excitation module. NL: nonlinearity (HS: h-swish; RE: ReLU). s: stride.
Table 1. Detailed structure of the backbone network
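The rows of Table 1 appear to follow the MobileNetV3-Large bneck configuration, without its final 1×1 convolution and classifier layers. As a consistency check, the sketch below transcribes the table and propagates the feature-map size from a 640×640×3 input. It assumes "same" padding, so that each layer divides the spatial size by its stride s. The names `BACKBONE` and `output_shapes` are hypothetical.

```python
from typing import List, Optional, Tuple

# Rows transcribed from Table 1: (operator, exp size, output channels, SE, NL, stride).
Row = Tuple[str, Optional[int], int, bool, str, int]

BACKBONE: List[Row] = [
    ("Conv2d",    None, 16,  False, "HS", 2),
    ("Bneck,3x3", 16,   16,  False, "RE", 1),
    ("Bneck,3x3", 64,   24,  False, "RE", 2),
    ("Bneck,3x3", 72,   24,  False, "RE", 1),
    ("Bneck,5x5", 72,   40,  True,  "RE", 2),
    ("Bneck,5x5", 120,  40,  True,  "RE", 1),
    ("Bneck,5x5", 120,  40,  True,  "RE", 1),
    ("Bneck,3x3", 240,  80,  False, "HS", 2),
    ("Bneck,3x3", 200,  80,  False, "HS", 1),
    ("Bneck,3x3", 184,  80,  False, "HS", 1),
    ("Bneck,3x3", 184,  80,  False, "HS", 1),
    ("Bneck,3x3", 480,  112, True,  "HS", 1),
    ("Bneck,3x3", 672,  112, True,  "HS", 1),
    ("Bneck,5x5", 672,  160, True,  "HS", 2),
    ("Bneck,5x5", 960,  160, True,  "HS", 1),
    ("Bneck,5x5", 960,  160, True,  "HS", 1),
]


def output_shapes(h: int, w: int, rows: List[Row]) -> List[Tuple[int, int, int]]:
    """Propagate (H, W, C) through the rows, dividing H and W by each stride."""
    shapes = []
    for _op, _exp, out_c, _se, _nl, stride in rows:
        h, w = h // stride, w // stride
        shapes.append((h, w, out_c))
    return shapes


if __name__ == "__main__":
    for row, shape in zip(BACKBONE, output_shapes(640, 640, BACKBONE)):
        print(row[0], shape)
```

The five stride-2 rows give an overall downsampling of 2⁵ = 32. This is why the last stage is 20×20×160 for a 640×640 input.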
| Algorithm 1: Simplified optimal transport assignment (SimOTA) |
|---|
| **Input:** $n$ is the number of initially selected candidate boxes in $C$; $m$ is the number of ground-truth objects in image $Y$; $P_j^{\text{class}}$ is the predicted class score of candidate box $a_j$ and $P_j^{\text{box}}$ is its predicted bounding box ($j = 1, 2, \ldots, n$); $G_i^{\text{class}}$ is the class of ground truth $g_i$ and $G_i^{\text{box}}$ is its bounding box ($i = 1, 2, \ldots, m$); $\varepsilon = 3$ |
| **Output:** $k$ candidate boxes selected as positive samples of each $g_i$ |
| 1 Calculate the classification loss: $L_{ij}^{\text{class}} = \mathrm{BCELoss}(P_j^{\text{class}}, G_i^{\text{class}})$ |
| 2 Calculate the regression loss: $L_{ij}^{\text{reg}} = \mathrm{GIoULoss}(P_j^{\text{box}}, G_i^{\text{box}})$ |
| 3 Calculate the cost: $c_{ij} = L_{ij}^{\text{class}} + \varepsilon L_{ij}^{\text{reg}}$ |
| 4 For each $g_i$, select the 10 candidate boxes with the highest IoU |
| 5 Sum these 10 IoU values and take the integer part to obtain $k$ for each $g_i$ |
| 6 **for** $i = 1$ to $m$ **do** |
| 7 Select the $k$ candidate boxes with the least cost within a fixed center region of $g_i$ |
| 8 **if** a candidate box $a_j$ matches multiple ground truths **then** keep only the ground truth with the least cost for $a_j$ |
| 9 **else** $a_j$ is selected as a positive sample of $g_i$ |
Table 2. Implementation flow of SimOTA label assignment
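The steps of Algorithm 1 can be sketched in plain Python. This is a minimal, illustrative sketch, not the authors' implementation, and it relies on several assumptions that the algorithm does not state:

- Boxes are `(x1, y1, x2, y2)` tuples with positive area.
- $P_j^{\text{class}}$ is a per-class probability vector.
- The GIoU loss is $1 - \mathrm{GIoU}$.
- The step-4 IoUs are taken over all $n$ candidates.
- The "fixed center region" test is supplied by the caller as a boolean matrix `in_center`.
- $k$ is clamped to at least 1, as in the YOLOX reference code.

All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou_giou(a: Box, b: Box) -> Tuple[float, float]:
    """IoU and GIoU of two axis-aligned boxes with positive area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union
    # Area of the smallest box enclosing both a and b.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return iou, iou - (hull - union) / hull


def bce_loss(scores: Sequence[float], gt_class: int, clip: float = 1e-7) -> float:
    """Binary cross-entropy of per-class probabilities against a one-hot target."""
    loss = 0.0
    for c, p in enumerate(scores):
        p = min(max(p, clip), 1.0 - clip)
        loss -= math.log(p) if c == gt_class else math.log(1.0 - p)
    return loss


def simota(pred_scores: List[Sequence[float]], pred_boxes: List[Box],
           gt_classes: List[int], gt_boxes: List[Box],
           in_center: List[List[bool]], epsilon: float = 3.0,
           topk: int = 10) -> Dict[int, int]:
    """Return {candidate index j: ground-truth index i} for the positive samples."""
    n, m = len(pred_boxes), len(gt_boxes)
    matched: Dict[int, List[Tuple[float, int]]] = {}
    for i in range(m):
        ious, costs = [], []
        for j in range(n):
            iou, giou = iou_giou(pred_boxes[j], gt_boxes[i])
            ious.append(iou)
            # Steps 1-3: c_ij = L_class + epsilon * L_reg, with L_reg = 1 - GIoU.
            costs.append(bce_loss(pred_scores[j], gt_classes[i]) + epsilon * (1.0 - giou))
        # Steps 4-5: dynamic k = integer part of the sum of the top-10 IoUs (at least 1).
        k = max(1, int(sum(sorted(ious, reverse=True)[:topk])))
        # Step 7: the k cheapest candidates inside the center region of g_i.
        inside = [j for j in range(n) if in_center[i][j]]
        for j in sorted(inside, key=lambda jj: costs[jj])[:k]:
            matched.setdefault(j, []).append((costs[j], i))
    # Steps 8-9: a candidate claimed by several ground truths keeps the cheapest one.
    return {j: min(claims)[1] for j, claims in matched.items()}


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (50, 50, 60, 60)]
    scores = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.5, 0.5]]
    center = [[True, True, False, False], [False, False, True, False]]
    print(simota(scores, preds, [0, 1], gts, center))  # {0: 0, 2: 1}
```

Using a dynamic $k$ lets ground truths that overlap well with many candidates receive more positive samples. Poorly covered objects still get at least one.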
| Model | Backbone | AP /% | AP50 /% | AP75 /% | Parameters /10⁶ | BFLOPs | Speed /(frame·s⁻¹) |
|---|---|---|---|---|---|---|---|
| Faster R-CNN[7] | VGG16 | | 15.2 | | | | 20.4 |
| CenterNet[22] | ResNet50 | 12.4 | 22.7 | 12.4 | 32.67 | 246.01 | 45.2 |
| YOLOv4[21] | CSPDarknet53 | 16.8 | 31.2 | 16.7 | 64.36 | 321.30 | 28.8 |
| YOLOv4-tiny[21] | Tiny Darknet | 10.6 | 19.8 | 10.4 | 6.06 | 36.99 | 65.2 |
| Proposed model | MobileNetV3 | 15.1 | 26.6 | 15.5 | 7.79 | 37.35 | 59.9 |
Table 3. Comparison of evaluation results of different models on the VisDrone dataset
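The AP, AP50 and AP75 columns appear to follow the COCO convention:

- AP50 is the average precision when a detection counts as correct at IoU ≥ 0.5.
- AP75 is the same quantity at IoU ≥ 0.75.
- AP is the mean over IoU thresholds from 0.50 to 0.95 in steps of 0.05.

Each value is the area under a P-R curve such as those in Fig. 4. The sketch below computes all-point interpolated AP for one class at a single IoU threshold, from detections already ranked by confidence. The paper does not state which interpolation its evaluation toolkit uses (COCO samples 101 recall points), so treat this as illustrative. The function name is hypothetical.

```python
from typing import List


def average_precision(is_tp: List[bool], num_gt: int) -> float:
    """All-point interpolated AP for one class at one IoU threshold.

    is_tp:  detections sorted by descending confidence; True if the detection
            matched a not-yet-matched ground truth at the chosen IoU threshold.
    num_gt: number of ground-truth objects of this class (must be > 0).
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing in recall (the interpolated P-R envelope).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the stepped curve: precision times each recall increment.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


if __name__ == "__main__":
    # Three ranked detections against two ground-truth objects.
    print(round(average_precision([True, False, True], num_gt=2), 3))  # 0.833
```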
| MobileNetV3 + decoupled head | D-RFB | RFB | Ghost-PAN | PAN | SimOTA | Focal loss | AP50 /% | Speed /(frame·s⁻¹) |
|---|---|---|---|---|---|---|---|---|
| √ | | | | | | | 18.6 | 72.3 |
| √ | √ | | | | | | 20.1 | 68.2 |
| √ | | √ | √ | | | | 22.6 | 61.2 |
| √ | √ | | | √ | | | 23.6 | 54.3 |
| √ | √ | | √ | | | | 23.4 | 59.9 |
| √ | √ | | √ | | √ | | 25.2 | 59.7 |
| √ | √ | | √ | | √ | √ | 26.6 | 59.9 |
Table 4. Results of the ablation study on the VisDrone dataset
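Table 4 is easiest to read as pairs of rows that differ in a single component. The sketch below copies the AP50 and speed values from the table and prints the differences for such pairs. The row labels are shorthand introduced here, and every row includes the MobileNetV3 + decoupled head baseline.

```python
# Table 4 values: row label -> (AP50 in %, speed in frame/s).
# Labels are shorthand; every row includes the MobileNetV3 + decoupled head baseline.
ROWS = {
    "baseline": (18.6, 72.3),
    "D-RFB": (20.1, 68.2),
    "RFB+Ghost-PAN": (22.6, 61.2),
    "D-RFB+PAN": (23.6, 54.3),
    "D-RFB+Ghost-PAN": (23.4, 59.9),
    "D-RFB+Ghost-PAN+SimOTA": (25.2, 59.7),
    "D-RFB+Ghost-PAN+SimOTA+Focal": (26.6, 59.9),
}

# Pairs of rows: (what changes, row before, row after).
PAIRS = [
    ("add D-RFB", "baseline", "D-RFB"),
    ("RFB -> D-RFB", "RFB+Ghost-PAN", "D-RFB+Ghost-PAN"),
    ("PAN -> Ghost-PAN", "D-RFB+PAN", "D-RFB+Ghost-PAN"),
    ("add SimOTA", "D-RFB+Ghost-PAN", "D-RFB+Ghost-PAN+SimOTA"),
    ("add focal loss", "D-RFB+Ghost-PAN+SimOTA", "D-RFB+Ghost-PAN+SimOTA+Focal"),
    ("full model vs baseline", "baseline", "D-RFB+Ghost-PAN+SimOTA+Focal"),
]


def delta(before: str, after: str) -> tuple:
    """Change in (AP50, speed) from row `before` to row `after`, rounded to 0.1."""
    (ap0, fps0), (ap1, fps1) = ROWS[before], ROWS[after]
    return round(ap1 - ap0, 1), round(fps1 - fps0, 1)


for name, before, after in PAIRS:
    d_ap, d_fps = delta(before, after)
    print(f"{name:24s} AP50 {d_ap:+.1f}  speed {d_fps:+.1f} frame/s")
```

For example, replacing PAN with Ghost-PAN costs 0.2 AP50 points but raises speed by 5.6 frame/s. The full model gains 8.0 AP50 points over the baseline at a cost of 12.4 frame/s.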
| Model | Backbone | AP /% | AP50 /% | AP75 /% | Parameters /10⁶ | BFLOPs | Speed /(frame·s⁻¹) |
|---|---|---|---|---|---|---|---|
| Faster R-CNN[7] | VGG16 | | 81.8 | | | | 20.9 |
| CenterNet[22] | ResNet50 | 45.4 | 84.1 | 40.7 | 32.67 | 109.34 | 55.3 |
| YOLOv4[21] | CSPDarknet53 | 58.0 | 96.2 | 62.7 | 64.36 | 142.80 | 44.2 |
| YOLOv4-tiny[21] | Tiny Darknet | 29.8 | 72.9 | 18.0 | 6.06 | 16.44 | 84.3 |
| Proposed model | MobileNetV3 | 59.2 | 94.4 | 64.9 | 7.79 | 16.60 | 79.6 |
Table 5. Comparison of evaluation results of different models on the NWPU VHR-10 dataset
| Target category | Faster R-CNN | CenterNet | YOLOv4 | YOLOv4-tiny | Proposed model |
|---|---|---|---|---|---|
| mAP | 81.82 | 84.10 | 96.23 | 72.89 | 94.36 |
| Airplane | 97.71 | 99.81 | 99.79 | 99.30 | 99.95 |
| Baseball diamond | 94.14 | 91.87 | 95.81 | 90.71 | 98.63 |
| Basketball court | 78.38 | 78.45 | 98.39 | 65.47 | 89.49 |
| Bridge | 72.56 | 81.52 | 87.13 | 34.85 | 89.07 |
| Ground track field | 96.98 | 71.77 | 97.18 | 73.85 | 99.21 |
| Harbor | 84.01 | 75.08 | 96.75 | 49.18 | 89.68 |
| Ship | 72.80 | 87.67 | 96.52 | 90.37 | 93.02 |
| Storage tank | 81.83 | 92.01 | 96.91 | 86.49 | 96.65 |
| Tennis court | 83.44 | 85.33 | 99.95 | 77.12 | 96.40 |
| Vehicle | 56.17 | 77.51 | 93.89 | 61.54 | 91.47 |
Table 6. Evaluation results (AP50 /%) of different models on the NWPU VHR-10 dataset for each of the 10 object classes
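The mAP row of Table 6 is the mean of the ten per-class AP values. The sketch below copies the per-class values from the table and recomputes that mean for each model. It assumes an unweighted average over classes, which is the usual definition; the helper name is hypothetical.

```python
# Per-class AP (%) on NWPU VHR-10, copied from Table 6. Class order:
# airplane, baseball diamond, basketball court, bridge, ground track field,
# harbor, ship, storage tank, tennis court, vehicle.
PER_CLASS_AP = {
    "Faster R-CNN": [97.71, 94.14, 78.38, 72.56, 96.98, 84.01, 72.80, 81.83, 83.44, 56.17],
    "CenterNet": [99.81, 91.87, 78.45, 81.52, 71.77, 75.08, 87.67, 92.01, 85.33, 77.51],
    "YOLOv4": [99.79, 95.81, 98.39, 87.13, 97.18, 96.75, 96.52, 96.91, 99.95, 93.89],
    "YOLOv4-tiny": [99.30, 90.71, 65.47, 34.85, 73.85, 49.18, 90.37, 86.49, 77.12, 61.54],
    "Proposed model": [99.95, 98.63, 89.49, 89.07, 99.21, 89.68, 93.02, 96.65, 96.40, 91.47],
}


def mean_ap(aps):
    """mAP as the unweighted mean of per-class AP values."""
    return sum(aps) / len(aps)


for model, aps in PER_CLASS_AP.items():
    print(f"{model}: {mean_ap(aps):.2f}")
```

The computed means reproduce the listed mAP for CenterNet (84.10), YOLOv4 (96.23), YOLOv4-tiny (72.89) and the proposed model (94.36). The Faster R-CNN column averages to 81.80 rather than the listed 81.82.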