Indoor RGB-D Image Semantic Segmentation Based on Dual-Stream Weighted Gabor Convolutional Network Fusion

Xuchu Wang; Huihuang Liu; Yanmin Niu

doi:10.3788/AOS202040.1910001

Acta Optica Sinica
Vol. 40, Issue 19, 1910001 (2020)

Xuchu Wang¹,²,*, Huihuang Liu², and Yanmin Niu³

Author Affiliations

¹Key Laboratory of Optoelectronic Technology and Systems of Ministry of Education, Chongqing University, Chongqing 400040, China

²College of Optoelectronic Engineering, Chongqing University, Chongqing 400040, China

³College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China

Xuchu Wang, Huihuang Liu, Yanmin Niu. Indoor RGB-D Image Semantic Segmentation Based on Dual-Stream Weighted Gabor Convolutional Network Fusion[J]. Acta Optica Sinica, 2020, 40(19): 1910001

Fig. 1. RGB-D image semantic segmentation by dual-stream weighted Gabor convolutional network fusion

Fig. 2. Modulation process of WGoFs

Fig. 3. Convolution process of WGoFs

Fig. 4. Wide residual blocks. (a) Original residual block; (b) wide residual block 1; (c) wide residual block 2

Fig. 5. Architecture of WRN-WGCN module

Fig. 6. Pyramid pooling module

Fig. 7. Proposed pyramid pooling feature fusion module

Fig. 8. RGB and depth images and their corresponding semantic labels in dataset. (a) RGB images; (b) depth images; (c) semantic labels

Fig. 9. Loss curves in training process

Fig. 10. Test accuracy versus number of scales and number of directions. (a) Test accuracy for different numbers of scales; (b) test accuracy for different numbers of directions

Fig. 11. Semantic segmentation results obtained by various methods on NYUDv2 dataset. (a) RGB; (b) depth; (c) GT; (d) baseline; (e) WRN-CNN; (f) WGCN; (g) PP-Fusion; (h) FCN; (i) SegNet; (j) ours

Fig. 12. Semantic segmentation results obtained by various methods on SUN-RGBD dataset. (a) RGB; (b) depth; (c) GT; (d) baseline; (e) WRN-CNN; (f) WGCN; (g) PP-Fusion; (h) FCN; (i) SegNet; (j) ours

Group name	Output feature size	Block type
GCConv1	N×N	$[3 \times 3,\ 8]$
GCConv2	N×N	$\left[\begin{array}{ll} 3 \times 3, & 16 \times k \\ 3 \times 3, & 16 \times k \end{array}\right] \times L$
GCConv3	N×N	$\left[\begin{array}{ll} 3 \times 3, & 16 \times k \\ 3 \times 3, & 16 \times k \end{array}\right] \times L$
GCConv4	(N/2)×(N/2)	$\left[\begin{array}{ll} 3 \times 3, & 32 \times k \\ 3 \times 3, & 32 \times k \end{array}\right] \times L$

Table 1. Structural parameter setting of WRN-WGCN

Model name	Filter size	Model size /MB
Model 1	5×5	163
Model 2	5×5	124
Model 3	3×3	148
Model 4	3×3	117

Table 2. Model sizes with different filter sizes

Method	WRN-CNN module	WGCN module	PP-Fusion module	A_cc /%	m_Acc /%	m_IoU /%	F_WIoU /%
Ours	√	√	√	66.3	50.8	40.0	53.1
Variant 1				58.3	41.6	30.1	45.8
Variant 2	√			58.6	42.4	31.9	45.3
Variant 3		√		60.8	48.2	35.8	50.4
Variant 4			√	63.2	45.8	36.4	46.6
FCN^[2]				65.4	45.1	34.3	48.6
SegNet^[3]				56.2	47.6	35.1	50.1

Table 3. Comparison of results for different segmentation algorithms on NYUDv2 dataset
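
The page does not define the four metric columns used in Tables 3 and 4. A minimal sketch follows, assuming they are the usual pixel accuracy (A_cc), mean class accuracy (m_Acc), mean intersection over union (m_IoU), and frequency-weighted IoU (F_WIoU) computed from a confusion matrix. The function names and the toy labels are hypothetical and are not taken from the paper or its datasets.

```python
def confusion_matrix(gt, pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return (A_cc, m_Acc, m_IoU, F_WIoU) as fractions in [0, 1].

    Assumes the standard definitions; the paper may differ in details
    such as how classes absent from the ground truth are handled.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = [cm[i][i] for i in range(n)]                       # correctly labeled pixels per class
    gt_count = [sum(cm[i]) for i in range(n)]               # ground-truth pixels per class
    pred_count = [sum(cm[j][i] for j in range(n)) for i in range(n)]  # predicted pixels per class
    present = [i for i in range(n) if gt_count[i] > 0]      # skip classes absent from ground truth

    acc = sum(tp) / total
    m_acc = sum(tp[i] / gt_count[i] for i in present) / len(present)
    iou = {i: tp[i] / (gt_count[i] + pred_count[i] - tp[i]) for i in present}
    m_iou = sum(iou.values()) / len(present)
    fw_iou = sum(gt_count[i] * iou[i] for i in present) / total
    return acc, m_acc, m_iou, fw_iou


# Toy example: 8 flattened pixels, 3 classes (illustrative values only).
gt = [0, 0, 0, 0, 1, 1, 1, 2]
pred = [0, 0, 1, 0, 1, 1, 2, 2]
acc, m_acc, m_iou, fw_iou = segmentation_metrics(confusion_matrix(gt, pred, 3))
print(acc, m_acc, m_iou, fw_iou)
```

In this sketch the tables' percentages would correspond to these fractions multiplied by 100. F_WIoU weights each class IoU by its share of ground-truth pixels, so large classes such as walls and floors dominate it, whereas m_IoU treats every class equally.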

Method	WRN-CNN module	WGCN module	PP-Fusion module	A_cc /%	m_Acc /%	m_IoU /%	F_WIoU /%
Ours	√	√	√	58.2	38.5	28.2	42.0
Variant 1				45.2	33.7	21.8	37.4
Variant 2	√			44.8	34.5	23.1	38.6
Variant 3		√		54.6	35.1	27.3	37.7
Variant 4			√	56.1	34.6	26.0	36.3
FCN^[2]				49.5	36.5	23.7	35.8
SegNet^[3]				47.8	34.6	26.2	38.2

Table 4. Comparison of results for different segmentation algorithms on SUN-RGBD dataset

Method	WRN-CNN module	WGCN module	PP-Fusion module	Model size /MB	Inference time /ms
Ours	√	√	√	117	42
Variant 1				381	76
Variant 2	√			115	35
Variant 3		√		187	48
Variant 4			√	245	51
FCN^[2]				549	43
SegNet^[3]				126	58

Table 5. Comparison of inference time and space complexity for different algorithms
