Fig. 1. Mobile-Block structure with stride of 1. (a) MobileFaceNet; (b) Dual-MobileFaceNet
Fig. 2. Mobile-Block structure with stride of 2. (a) MobileFaceNet; (b) Dual-MobileFaceNet
Fig. 3. Schematic of Dual-MobileFaceNet structure
Fig. 4. Schematic of double classifier structure
Fig. 5. Examples of self-made training dataset
Fig. 6. Classroom scene. (a) Real scene; (b) sketch map
Fig. 7. Interface connection of Jetson TX2
Fig. 8. Recognition results of proposed algorithm. (a) 8-people video; (b) 16-people video
Fig. 9. Recognition accuracy confusion matrix of 8-people video. (a) InsightFace; (b) Double classifier
Fig. 10. Diagram of different face sizes. (a) Big face; (b) medium face; (c) small face
Fig. 11. Recognition accuracy of different networks for different sizes of faces
Fig. 12. Recognition accuracy of different algorithms for different sizes of faces
| Input size/Number of channels | Type | Output size/Number of channels | Operation | s | n | Pad |
|---|---|---|---|---|---|---|
| 112×112/3 | Convolution | 56×56/64 | 3×3 Conv | 2 | 1 | 1 |
| 56×56/64 | Convolution | 56×56/64 | 3×3 dw_Conv | 1 | 1 | 1 |
| 56×56/64 | Dual-Block | 56×56/128+2k | | 1 | 2 | |
| 56×56/128+2k | Mobile-Block | 28×28/64 | | 2 | 1 | 1 |
| 28×28/64 | Dual-Block | 28×28/128+6k | | 1 | 6 | |
| 28×28/128+6k | Mobile-Block | 14×14/128 | | 2 | 1 | 1 |
| 14×14/128 | Dual-Block | 14×14/256+4k | | 1 | 4 | |
| 14×14/256+4k | Mobile-Block | 7×7/128 | | 2 | 1 | 1 |
| 7×7/128 | Dual-Block | 7×7/256+2k | | 1 | 2 | |
| 7×7/256+2k | Convolution | 7×7/512 | 1×1 pw_Conv | 1 | 1 | 0 |
| 7×7/512 | Convolution | 1×1/512 | 7×7 Linear_Conv | 1 | 1 | 0 |
| 1×1/512 | Convolution | 1×1/128 | 1×1 Linear_pw_Conv | 1 | 1 | 0 |
Table 1. Network structure of Dual-MobileFaceNet
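The spatial sizes listed in Table 1 can be checked against the standard convolution output-size formula, floor((in + 2·pad − kernel) / stride) + 1. The sketch below is only an illustrative consistency check, not part of the proposed method; the helper name `conv_output_size` is hypothetical, and it covers only the rows whose kernel size is stated in the Operation column.

```python
def conv_output_size(size: int, kernel: int, stride: int, pad: int) -> int:
    """Spatial output size of a convolution, using the floor convention."""
    return (size + 2 * pad - kernel) // stride + 1


# (input size, kernel, stride, pad, output size) for the rows of Table 1
# that state an explicit kernel size in the Operation column.
table1_rows = [
    (112, 3, 2, 1, 56),  # 3×3 Conv
    (56, 3, 1, 1, 56),   # 3×3 dw_Conv
    (7, 1, 1, 0, 7),     # 1×1 pw_Conv
    (7, 7, 1, 0, 1),     # 7×7 Linear_Conv
    (1, 1, 1, 0, 1),     # 1×1 Linear_pw_Conv
]

for size, kernel, stride, pad, expected in table1_rows:
    actual = conv_output_size(size, kernel, stride, pad)
    assert actual == expected, (size, kernel, stride, pad, actual, expected)
```

The Mobile-Block rows do not list a kernel size. If one assumes a 3×3 depthwise kernel (an assumption, not stated in the table), the same formula with stride 2 and pad 1 reproduces the halvings 56 to 28, 28 to 14, and 14 to 7.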
| Network | AgeDB accuracy /% | CFP_FP accuracy /% | CFP_FF accuracy /% | LFW accuracy /% | CALFW accuracy /% | Speed /(frame·s⁻¹) | Model size /MB |
|---|---|---|---|---|---|---|---|
| ResNet-101[13] | 97.28 | 95.11 | 99.65 | 99.71 | 96.65 | 42.64 | 250 |
| ResNet-50[13] | 96.03 | 94.06 | 99.62 | 99.52 | 95.36 | 70.84 | 174.5 |
| DenseNet-201(k=32)[12] | 96.68 | 94.83 | 99.62 | 99.68 | 96.04 | 100.17 | 161.8 |
| DenseNet-169(k=32)[12] | 95.38 | 93.66 | 99.01 | 98.86 | 95.28 | 120.34 | 114.4 |
| ShuffleNet(1×,g=3)[22] | 89.27 | 89.09 | 97.75 | 98.70 | 93.06 | 410.78 | 7.4 |
| MobileNet-v1[20] | 88.65 | 88.54 | 97.06 | 98.43 | 93.01 | 206.64 | 13.7 |
| MobileNet-v2[21] | 88.81 | 88.53 | 97.36 | 98.38 | 92.88 | 230.71 | 8.6 |
| MobileFaceNet[11] | 92.95 | 89.46 | 98.03 | 98.96 | 93.89 | 432.41 | 4.1 |
| Dual-MobileFaceNet | 93.94 | 91.16 | 98.68 | 99.18 | 94.02 | 326.35 | 8.8 |

Table 2. Comparison of experimental results of different networks
| Algorithm | LFW | CFP-FP | AgeDB-30 |
|---|---|---|---|
| DeepFace[3] | 95.53 | 87.46 | 89.61 |
| Deep FR[23] | 96.04 | 88.26 | 90.13 |
| DeepID2[4] | 96.14 | 87.85 | 90.26 |
| FaceNet[5] | 96.95 | 88.20 | 90.69 |
| SphereFace[6] | 97.58 | 90.03 | 91.84 |
| CosFace[7] | 98.43 | 90.75 | 92.33 |
| InsightFace[8] | 99.18 | 91.16 | 93.94 |
| O-Double classifier | 99.12 | 91.21 | 93.22 |
| Double classifier | 99.46 | 93.33 | 95.88 |

Table 3. Recognition accuracy comparison of different algorithms (%)
| Network | Recognition accuracy, 8-people /% | Recognition accuracy, 18-people /% | Speed, 8-people /(frame·s⁻¹) | Speed, 18-people /(frame·s⁻¹) | FLOPS /10⁶ |
|---|---|---|---|---|---|
| ResNet-101[13] | 97.08 | 94.14 | 2.16 | 4.37 | 22.69×10³ |
| ResNet-50[13] | 95.96 | 91.51 | 1.28 | 2.61 | 12.34×10³ |
| DenseNet-201[12] | 96.78 | 94.98 | 1.16 | 2.36 | 8.5×10³ |
| DenseNet-169[12] | 95.27 | 91.69 | 0.89 | 1.81 | 6.6×10³ |
| ShuffleNet[22] | 92.05 | 87.53 | 0.12 | 0.26 | 591 |
| MobileNet-v1[20] | 91.12 | 85.60 | 0.16 | 0.35 | 1.1×10³ |
| MobileNet-v2[21] | 91.96 | 86.33 | 0.13 | 0.28 | 1.0×10³ |
| MobileFaceNet[11] | 92.83 | 88.77 | 0.10 | 0.21 | 439.8 |
| Dual-MobileFaceNet | 96.24 | 94.68 | 0.14 | 0.29 | 1.0×10³ |

Table 4. Experimental results of different networks on pan-tilt video
| Algorithm | 8-people | 18-people |
|---|---|---|
| DeepFace[3] | 87.53 | 83.67 |
| Deep FR[23] | 88.54 | 84.27 |
| DeepID2[4] | 88.94 | 84.25 |
| FaceNet[5] | 89.35 | 85.33 |
| SphereFace[6] | 90.58 | 87.68 |
| CosFace[7] | 91.83 | 90.75 |
| InsightFace[8] | 93.69 | 91.68 |
| Double classifier | 96.24 | 94.68 |

Table 5. Recognition accuracy of different algorithms (%)