Author Affiliations
Armament Launch Theory and Technology Key Discipline Laboratory of PRC, Rocket Force University of Engineering, Xi′an 710025, China
Fig. 1. YOLO-IDSTD network structure. (a) Feature extraction part; (b) Feature fusion part; (c) Target detection part
Fig. 2. Structure of Focus
Fig. 3. Structure of PDSCP
Fig. 4. Improved RFB-Small block
Fig. 5. Sample images from the dataset
Fig. 6. Typical infrared dim and small targets in the dataset
Fig. 7. Comparison of detection results of typical infrared dim and small targets. (a) YOLOv3; (b) YOLOv4-tiny; (c) YOLOv3-tiny; (d) YOLO-IDSTD
Fig. 8. Test results on OSU Thermal Pedestrian Database
Fig. 9. Test results on FLIR Thermal Datasets
| No. | Name | Parameter | FLOPs |
| --- | --- | --- | --- |
| 1 | Focus, 1, 16 | 224×10⁶ | 33.0×10⁶ |
| 2 | Conv, 3/1, 16 | 2336×10⁶ | 86.1×10⁶ |
| 3 | Conv, 3/1, 32 | 4672×10⁶ | 43.1×10⁶ |
| 4 | Conv, 3/1, 64 | 18560×10⁶ | 42.8×10⁶ |
| 5 | PDSCP, 128 | 38016×10⁶ | 21.9×10⁶ |
| 6 | PDSCP, 256 | 149760×10⁶ | 21.6×10⁶ |
| 7 | PDSCP, 512 | 594432×10⁶ | 21.4×10⁶ |
Table 1. Parameters and FLOPs of each layer in the feature extraction part
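The leading figures in the Parameter column for rows 2 to 4 can be cross-checked with a short calculation. The sketch below is not taken from the paper: it assumes that each Conv block is a bias-free 3×3 convolution followed by batch normalization with one learnable scale and one learnable shift per output channel, and that each layer takes the previous layer's output width as its input width. The helper name `conv_bn_params` is hypothetical.

```python
def conv_bn_params(in_ch: int, out_ch: int, kernel: int = 3) -> int:
    """Count weights of a bias-free k x k convolution plus BN scale and shift.

    Assumption (not stated in the table): no convolution bias, and batch
    normalization contributes exactly two learnable values per output channel.
    """
    conv_weights = in_ch * out_ch * kernel * kernel
    bn_affine = 2 * out_ch  # one scale and one shift per output channel
    return conv_weights + bn_affine


# (in_ch, out_ch) pairs are assumed from the channel widths listed in rows 1 to 4.
layers = {
    "Conv, 3/1, 16": (16, 16),
    "Conv, 3/1, 32": (16, 32),
    "Conv, 3/1, 64": (32, 64),
}

counts = {name: conv_bn_params(i, o) for name, (i, o) in layers.items()}

for name, n in counts.items():
    print(f"{name}: {n}")
```

Under these assumptions the computed counts are 2336, 4672 and 18560, which coincide with the leading figures listed for the three Conv rows.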
| Name | Related configurations |
| --- | --- |
| GPU | NVIDIA Quadro GV100 |
| CPU | Intel Xeon Silver 4210/128G |
| GPU memory size | 32G |
| Operating system | Windows 10 |
| Computing platform | CUDA 11.0 |
| CPU (test) | Intel Core i7-10700/16G |
Table 2. Configuration of experimental platform
| Size of extension box | Number of datasets | Number of images |
| --- | --- | --- |
| 5 pixel × 5 pixel | 13 | 12484 |
| 7 pixel × 7 pixel | 2 | 798 |
Table 3. Statistics of extension box
| Parameter | Infrared dim and small targets datasets | Thermal Pedestrian Database | FLIR Thermal Datasets |
| --- | --- | --- | --- |
| Class number | 1 | 1 | 3 |
| Epoch | 500 | 500 | 500 |
| Batch size | 64 | 4 | 64 |
| Image size | 384×384 | 320×320 | 512×512 |
| Batch size (test) | 1 | 1 | 1 |
Table 4. Setting of experimental parameters
| Method | Precision rate | Recall rate | AP@0.5 | Parameter | Model size/MB | GFLOPs | Detection time/ms |
| --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv3-384 | 0.7371 | 0.8182 | 0.8123 | 61.6×10⁶ | 117.7 | 155.2 | 364.8 |
| SSD300 | 0.3664 | 0.7585 | 0.5170 | 23.7×10⁶ | 90.6 | 35.2 | 370.4 |
| Mobilenet-SSD | 0.5241 | 0.5111 | 0.3300 | 6.3×10⁶ | 24.0 | 1.14 | 66.8 |
| Efficientdet b0 | 0.5948 | 0.0589 | 0.0999 | 3.9×10⁶ | 15.1 | 2.5 | 73.8 |
| Centernet-ResNet50 | 0.8323 | 0.6156 | 0.6843 | 32.6×10⁶ | 124.8 | 3.8 | 30.3 |
| YOLOv5s-384 | 0.7310 | 0.8029 | 0.7957 | 7.3×10⁶ | 16.6 | 17.0 | 98.5 |
| YOLOv4-tiny-384 | 0.6713 | 0.7847 | 0.8195 | 6.2×10⁶ | 12.6 | 16.5 | 80.1 |
| YOLOv3-tiny-384 | 0.6780 | 0.7652 | 0.8050 | 8.9×10⁶ | 14.2 | 12.9 | 78.5 |
| YOLO-IDSTD | 0.6405 | 0.8409 | 0.8242 | 3.7×10⁶ | 7.3 | 3.0 | 50.2 |
Table 5. Precision and efficiency of different detection methods
| YOLOv3-tiny baseline | With Focus | With PDSCP | With PANet | With Four-scales prediction | With RFB-Small | Recall rate | AP@0.5 | Model size/MB | Detection time/ms |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| √ | | | | | | 0.7879 | 0.7771 | 16.59 | 78.5 |
| √ | √ | | | | | 0.7576 | 0.7235 | 16.59 | 35.4 |
| √ | √ | √ | | | | 0.7652 | 0.7342 | 3.65 | 26.5 |
| √ | √ | √ | √ | | | 0.7652 | 0.7604 | 9.11 | 31.7 |
| √ | √ | √ | √ | √ | | 0.8258 | 0.8037 | 9.22 | 36.9 |
| √ | √ | √ | √ | √ | √ | 0.8409 | 0.8242 | 7.27 | 50.2 |

Column groups: the first three columns fall under "Improve the detection speed" and the next three under "Improve the detection accuracy".
Table 6. Ablation experiment of YOLO-IDSTD
| Method | OSU recall rate | OSU AP@0.5 | OSU detection time/ms | FLIR recall rate | FLIR mAP@0.5 | FLIR AP@0.5 (person) | FLIR AP@0.5 (bicycle) | FLIR AP@0.5 (car) | FLIR detection time/ms |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Efficientdet b0 | 0.8723 | 0.8651 | 90.5 | 0.3374 | 0.4943 | 0.444 | 0.435 | 0.604 | 160.8 |
| YOLOv5s | 0.9909 | 0.9860 | 69.6 | 0.7706 | 0.7441 | 0.799 | 0.563 | 0.870 | 122.6 |
| YOLOv3-tiny | 1 | 0.9875 | 53.2 | 0.6906 | 0.6334 | 0.641 | 0.449 | 0.810 | 98.4 |
| YOLO-IDSTD | 1 | 0.9899 | 42.9 | 0.7166 | 0.6676 | 0.724 | 0.448 | 0.831 | 60.7 |

OSU: OSU Thermal Pedestrian Database. FLIR: FLIR Thermal Datasets.
Table 7. Comparative experiments on two infrared small target datasets
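The FLIR mAP@0.5 values can be related to the per-class columns with a simple average. The sketch below assumes that mAP@0.5 is the unweighted mean of the three per-class AP@0.5 values (person, bicycle, car), which is the usual definition but is not stated in the table itself. The numbers are copied from Table 7, and the variable and function names are illustrative only.

```python
# Per-class AP@0.5 on FLIR (person, bicycle, car), copied from Table 7.
per_class_ap = {
    "Efficientdet b0": (0.444, 0.435, 0.604),
    "YOLOv5s": (0.799, 0.563, 0.870),
    "YOLOv3-tiny": (0.641, 0.449, 0.810),
    "YOLO-IDSTD": (0.724, 0.448, 0.831),
}

# mAP@0.5 on FLIR as reported in Table 7.
reported_map = {
    "Efficientdet b0": 0.4943,
    "YOLOv5s": 0.7441,
    "YOLOv3-tiny": 0.6334,
    "YOLO-IDSTD": 0.6676,
}


def mean_ap(aps):
    """Unweighted mean over classes (assumed definition of mAP@0.5)."""
    return sum(aps) / len(aps)


computed_map = {method: mean_ap(aps) for method, aps in per_class_ap.items()}

for method, value in computed_map.items():
    diff = abs(value - reported_map[method])
    print(f"{method}: computed {value:.4f}, reported {reported_map[method]:.4f}, diff {diff:.4f}")
```

Because the per-class values are listed to three decimals, small differences in the fourth decimal between the computed and reported figures are expected.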