Fig. 1. Two types of convolution filters. (a) Standard 3D convolution filters; (b) depthwise separable convolution filters
Fig. 2. Three types of residual modules. (a) Non-bottleneck residual module; (b) bottleneck residual module; (c) depthwise separable residual module
Fig. 3. Parallel down-sampling block
Fig. 4. Dilated convolution. (a) Standard convolution filters; (b) 2-dilated convolution filters; (c) separable residual module combined with dilated convolution
Fig. 5. Network architecture
Fig. 6. Separable residual module combined with channel reduction. (a) 1/2 channels; (b) 1/4 channels
Fig. 7. Qualitative comparison between SRNet and ENet. (a) Input image; (b) ground truth; (c) ENet result; (d) SRNet result
Residual block | Bt /k | Non-Bt /k | DS-Bt /k | DS-Non-Bt /k |
---|---|---|---|---|
In_Out_C = 64 | 4.35 | 36.86 | 2.77 | 4.67 |
In_Out_C = 256 | 69.63 | 589.82 | 35.65 | 67.84 |
Table 1. Weight sizes of different residual blocks
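The weight counts in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the paper: the block compositions (a single 3×3 convolution for Non-Bt, a 1×1 / 3×3 / 1×1 stack with 4× channel reduction for Bt, and one depthwise 3×3 plus one pointwise 1×1 convolution for DS-Non-Bt) are assumptions, chosen because they reproduce the tabulated values with biases ignored. The function names are hypothetical. The DS-Bt column is left out because its composition is not stated here.

```python
def conv3x3_weights(c_in, c_out):
    """Weights of a standard 3x3 convolution (bias ignored)."""
    return 3 * 3 * c_in * c_out


def pointwise_weights(c_in, c_out):
    """Weights of a 1x1 (pointwise) convolution (bias ignored)."""
    return c_in * c_out


def depthwise3x3_weights(channels):
    """Weights of a 3x3 depthwise convolution: one 3x3 filter per channel."""
    return 3 * 3 * channels


def non_bt_weights(c):
    # Assumption: counted as one standard 3x3 convolution, C -> C.
    return conv3x3_weights(c, c)


def bt_weights(c, reduction=4):
    # Assumption: 1x1 reduce (C -> C/4), 3x3 (C/4 -> C/4), 1x1 expand (C/4 -> C).
    mid = c // reduction
    return (pointwise_weights(c, mid)
            + conv3x3_weights(mid, mid)
            + pointwise_weights(mid, c))


def ds_non_bt_weights(c):
    # Assumption: one depthwise 3x3 followed by one pointwise 1x1, C -> C.
    return depthwise3x3_weights(c) + pointwise_weights(c, c)


if __name__ == "__main__":
    for c in (64, 256):
        print(c, bt_weights(c), non_bt_weights(c), ds_non_bt_weights(c))
```

Under these assumptions the counts for 64 channels are 4352, 36864 and 4672, and for 256 channels 69632, 589824 and 67840, which round to the Bt, Non-Bt and DS-Non-Bt entries of Table 1 when expressed in thousands.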
Network | Block | Type | In-Res /(pixel×pixel) | In-C | Out-Res /(pixel×pixel) | Out-C |
---|---|---|---|---|---|---|
Encoder | 1 | Down-sampling | 1024×512 | 3 | 512×256 | 16 |
Encoder | 2 | Down-sampling | 512×256 | 16 | 256×128 | 64 |
Encoder | 3-7 | 5×DS-Non-Bt | 256×128 | 64 | 256×128 | 64 |
Encoder | 8 | Down-sampling | 256×128 | 64 | 128×64 | 128 |
Encoder | 9-16 | 2×DS-Non-Bt(rate=2,4,8,16) | 128×64 | 128 | 128×64 | 128 |
Decoder | 17 | Deconvolution | 128×64 | 128 | 256×128 | 64 |
Decoder | 18-19 | 2×DS-Non-Bt | 256×128 | 64 | 256×128 | 64 |
Decoder | 20 | Deconvolution | 256×128 | 64 | 512×256 | 16 |
Decoder | 21-22 | 2×DS-Non-Bt | 512×256 | 16 | 512×256 | 16 |
Decoder | 23 | Deconvolution | 512×256 | 16 | 1024×512 | C |
Table 2. Detailed descriptions of our network
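As a rough aid for reading Table 2, the following sketch traces the feature-map resolution and channel count through the 23 blocks. It assumes that every down-sampling block halves the resolution, every deconvolution doubles it, and the DS-Non-Bt blocks leave it unchanged. The names `STAGES` and `trace_shapes` are hypothetical, and the class count passed in for C is an illustrative choice, not a value stated in the table.

```python
# (block label, kind, output channels); None stands for C, the number of classes.
STAGES = [
    ("1", "down", 16),
    ("2", "down", 64),
    ("3-7", "keep", 64),
    ("8", "down", 128),
    ("9-16", "keep", 128),
    ("17", "up", 64),
    ("18-19", "keep", 64),
    ("20", "up", 16),
    ("21-22", "keep", 16),
    ("23", "up", None),
]


def trace_shapes(width, height, num_classes):
    """Return (label, width, height, channels) after each stage."""
    shapes = []
    for label, kind, out_channels in STAGES:
        if kind == "down":      # assumed stride-2 down-sampling
            width, height = width // 2, height // 2
        elif kind == "up":      # assumed 2x up-sampling deconvolution
            width, height = width * 2, height * 2
        channels = num_classes if out_channels is None else out_channels
        shapes.append((label, width, height, channels))
    return shapes


if __name__ == "__main__":
    # 19 classes is an assumption made only for this example.
    for label, w, h, c in trace_shapes(1024, 512, num_classes=19):
        print(f"block {label}: {w}x{h}, {c} channels")
```

With a 1024×512 input, the encoder output comes out at 128×64 with 128 channels, and the final deconvolution restores the full 1024×512 resolution.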
Module | mIoU /% | Parameters /10⁶ | Time /ms |
---|---|---|---|
Bt | 57.12 | 0.31 | 18 |
Non-Bt | 62.19 | 3.03 | 35 |
DS-Bt | 54.36 | 0.22 | 15 |
DS-Non-Bt | 61.37 | 0.49 | 24 |
Table 3. Accuracy and efficiency of each residual module
Module | Channel | mIoU /% | Parameters /10⁶ | Time /ms |
---|---|---|---|---|
Bt | n | 57.12 | 0.31 | 18 |
Non-Bt | n/4 | 52.38 | 0.20 | 13 |
DS-Non-Bt | n/4 | 53.23 | 0.05 | 11 |
Bt | 4n | 60.81 | 4.71 | 45 |
Non-Bt | n | 62.19 | 3.03 | 35 |
DS-Non-Bt | n | 61.37 | 0.49 | 24 |
Table 4. Accuracy and efficiency of each residual module with different channels
Module | mIoU /% | Parameters /k | Time /ms |
---|---|---|---|
DW-Non-Bt | 67.82 | 491 | 88 |
DW-Bt-1/2 | 64.47 | 321 | 70 |
DW-Bt-1/4 | 60.89 | 236 | 62 |
Table 5. Separable residual module combined with channel reduction
Model | Class | Roa | Sid | Bui | Wal | Fen | Pol | TLi | TSi | Veg | Ter | Sky | Per | Rid | Car | Tru | Bus | Tra | Mot | Bic |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SegNet | 56.95 | 96.4 | 73.2 | 84.0 | 28.4 | 29.0 | 35.7 | 39.8 | 45.1 | 87.0 | 63.8 | 91.8 | 62.8 | 42.8 | 89.3 | 38.1 | 43.1 | 44.1 | 35.8 | 51.9 |
SQ | 59.84 | 96.9 | 75.4 | 87.8 | 31.6 | 35.7 | 50.9 | 52.0 | 61.7 | 90.9 | 65.8 | 93.0 | 73.8 | 42.6 | 91.5 | 18.8 | 41.2 | 33.3 | 34.0 | 59.9 |
ENet | 58.28 | 96.3 | 74.2 | 75.0 | 32.2 | 33.2 | 43.4 | 34.1 | 44.0 | 88.6 | 61.4 | 90.6 | 65.5 | 38.4 | 90.6 | 36.9 | 50.5 | 48.1 | 38.8 | 55.4 |
SRNet | 67.86 | 97.1 | 78.6 | 89.6 | 49.3 | 51.2 | 56.9 | 57.5 | 66.3 | 90.4 | 57.0 | 92.2 | 71.8 | 48.6 | 91.7 | 55.7 | 70.2 | 58.3 | 40.3 | 66.0 |
Table 6. Segmentation accuracy of each network (%)
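The "Class" column of Table 6 appears to be the unweighted mean of the 19 per-class scores. The minimal sketch below checks that reading on the SegNet row only. The interpretation is an assumption, and other rows may differ slightly from their tabulated means because the per-class entries are rounded to one decimal. The variable names are hypothetical.

```python
from statistics import mean

# Per-class scores of the SegNet row in Table 6, in column order (Roa ... Bic).
SEGNET_SCORES = [
    96.4, 73.2, 84.0, 28.4, 29.0, 35.7, 39.8, 45.1, 87.0, 63.8,
    91.8, 62.8, 42.8, 89.3, 38.1, 43.1, 44.1, 35.8, 51.9,
]

# Assumption: the class-level score is the plain average over all 19 classes.
segnet_class_mean = mean(SEGNET_SCORES)

if __name__ == "__main__":
    print(f"SegNet class mean: {segnet_class_mean:.2f}")  # prints 56.95
```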
Model | 2048×1024 time /ms | 2048×1024 frame rate /(frame·s⁻¹) | 1024×512 time /ms | 1024×512 frame rate /(frame·s⁻¹) | 512×256 time /ms | 512×256 frame rate /(frame·s⁻¹) | 1920×1080 time /ms | 1920×1080 frame rate /(frame·s⁻¹) | 1280×720 time /ms | 1280×720 frame rate /(frame·s⁻¹) | 640×360 time /ms | 640×360 frame rate /(frame·s⁻¹) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SegNet | 641 | 2 | 169 | 6 | 41 | 24 | 637 | 1 | 289 | 3 | 69 | 14 |
SQ | 59 | 17 | 19 | 53 | 6 | 167 | 58 | 17 | 33 | 30 | 9 | 111 |
ENet | 49 | 20 | 13 | 77 | 7 | 143 | 46 | 21 | 21 | 46 | 7 | 135 |
SRNet | 88 | 12 | 24 | 42 | 6 | 167 | 88 | 12 | 37 | 27 | 9 | 111 |
Table 7. Segmentation efficiency of each network