Cross-scale and cross-dimensional adaptive transformer network for colorectal polyp segmentation

Liming LIANG; Anjun HE; Renjie LI; Jian WU

doi:10.37188/OPE.20233118.2700

Optics and Precision Engineering
Vol. 31, Issue 18, 2700 (2023)

Liming LIANG, Anjun HE, Renjie LI, and Jian WU^*

Author Affiliations

School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China


Liming LIANG, Anjun HE, Renjie LI, Jian WU. Cross-scale and cross-dimensional adaptive transformer network for colorectal polyp segmentation[J]. Optics and Precision Engineering, 2023, 31(18): 2700

Fig. 1. Core Transformer encoding block

Fig. 2. Cross-scale and cross-dimensional adaptive transformer network

Fig. 3. Spatial attention bridge block

Fig. 4. Channel attention bridge block

Fig. 5. Multi-scale dense parallel decoding block

Fig. 6. Multi-scale prediction block

Fig. 7. Segmentation results of different networks on Kvasir and CVC-ClinicDB datasets

Fig. 8. Segmentation results of different networks on CVC-ColonDB and ETIS datasets

Algorithm 1: Spatial attention bridge block
Inputs: The feature maps $C_{i}$ produced by the four channel attention bridge blocks, i=1,2,3,4
Outputs: $S_{i}$, i=1,2,3,4
1: $\chi_{mean}^{i}$ = AvgPool($C_{i}$) /average pooling/
2: $\chi_{max}^{i}$ = MaxPool($C_{i}$) /max pooling/
3: $\chi_{s}^{i}$ = Concat($\chi_{mean}^{i}$, $\chi_{max}^{i}$) /concatenate the two pooled feature maps/
4: $\alpha$ = $Conv_{7 \times 7}(\chi_{s}^{i})$ /7×7 convolution/
5: $\varepsilon$ = $\sigma(\alpha)$ /sigmoid activation; the feature map becomes $C \times H \times 1$/
6: $S_{i}$ = $\varepsilon$ * $C_{i}$ + $C_{i}$ /multiply the sigmoid output by the original feature map, then add the original feature map/
End
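
The steps of Algorithm 1 can be illustrated with a small NumPy sketch. It is not the authors' implementation. Several details are assumptions: pooling is taken along the channel axis (the usual choice for spatial attention, although the listing does not state the axis), the 7×7 convolution is a naive zero-padded cross-correlation, the kernel is random rather than learned, and one kernel is reused for all four stages purely for brevity. The names `conv2d_same` and `spatial_attention_bridge` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, kernel):
    """Naive zero-padded cross-correlation: x (C_in, H, W), kernel (C_in, k, k) -> (H, W)."""
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    for r in range(h):
        for c in range(w):
            out[r, c] = np.sum(xp[:, r:r + k, c:c + k] * kernel)
    return out

def spatial_attention_bridge(c_i, kernel):
    """Steps 1-6 of Algorithm 1 for one feature map c_i of shape (C, H, W)."""
    chi_mean = c_i.mean(axis=0, keepdims=True)            # step 1: average pooling (assumed over channels)
    chi_max = c_i.max(axis=0, keepdims=True)              # step 2: max pooling (assumed over channels)
    chi_s = np.concatenate([chi_mean, chi_max], axis=0)   # step 3: concatenate -> (2, H, W)
    alpha = conv2d_same(chi_s, kernel)                    # step 4: 7x7 convolution -> (H, W)
    eps = sigmoid(alpha)                                  # step 5: attention weights in (0, 1)
    return eps[None] * c_i + c_i                          # step 6: reweight, then residual add

rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.1, size=(2, 7, 7))            # stand-in for learned weights
stages = [rng.normal(size=(c, s, s)) for c, s in [(4, 16), (8, 8), (16, 4), (32, 2)]]
outputs = [spatial_attention_bridge(c_i, kernel) for c_i in stages]
```

Because the sigmoid output lies strictly between 0 and 1, step 6 scales every activation by a factor between 1 and 2, so the block can emphasise spatial positions without suppressing the original features.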

Dataset	Method	Dice	MIoU	SE	PC	F2	MAE
Kvasir	U-Net	0.818	0.746	0.856	0.857	0.827	0.055
	EUNet	0.908	0.854	0.934	0.911	0.919	0.028
	PraNet	0.898	0.840	0.911	0.916	0.901	0.032
	CaraNet	0.918	0.867	0.912	0.938	0.914	0.023
	PolypPVT	0.917	0.864	0.913	0.947	0.914	0.023
	SSFormer-L	0.918	0.865	0.897	0.957	0.904	0.022
	MSRAFormer	0.923	0.873	0.915	0.952	0.917	0.024
	Ours	0.932	0.883	0.933	0.944	0.931	0.021
CVC-ClinicDB	U-Net	0.823	0.755	0.834	0.839	0.827	0.019
	EUNet	0.902	0.846	0.959	0.880	0.926	0.011
	PraNet	0.899	0.849	0.910	0.907	0.905	0.009
	CaraNet	0.936	0.887	0.955	0.928	0.948	0.007
	PolypPVT	0.937	0.889	0.949	0.936	0.945	0.006
	SSFormer-L	0.906	0.855	0.897	0.931	0.898	0.008
	MSRAFormer	0.924	0.874	0.945	0.920	0.932	0.008
	Ours	0.942	0.896	0.964	0.927	0.954	0.006

Table 1. Segmentation results of different networks on Kvasir and CVC-ClinicDB datasets

Dataset	Method	Dice	MIoU	SE	PC	F2	MAE
CVC-ColonDB	U-Net	0.512	0.444	0.523	0.621	0.510	0.061
	EUNet	0.756	0.681	0.849	0.758	0.788	0.044
	PraNet	0.712	0.640	0.739	0.755	0.717	0.043
	CaraNet	0.773	0.689	0.857	0.753	0.796	0.042
	PolypPVT	0.808	0.727	0.821	0.849	0.809	0.031
	SSFormer-L	0.802	0.721	0.791	0.864	0.787	0.031
	MSRAFormer	0.782	0.707	0.803	0.874	0.787	0.028
	Ours	0.811	0.731	0.823	0.844	0.813	0.027
ETIS	U-Net	0.398	0.335	0.482	0.439	0.429	0.036
	EUNet	0.687	0.609	0.871	0.635	0.749	0.066
	PraNet	0.628	0.567	0.686	0.628	0.649	0.031
	CaraNet	0.747	0.672	0.811	0.731	0.777	0.017
	PolypPVT	0.787	0.706	0.867	0.774	0.820	0.013
	SSFormer-L	0.796	0.720	0.830	0.794	0.807	0.014
	MSRAFormer	0.750	0.679	0.811	0.745	0.777	0.013
	Ours	0.805	0.729	0.887	0.770	0.842	0.012

Table 2. Segmentation results of different networks on CVC-ColonDB and ETIS datasets
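
The tables above report Dice, MIoU, SE, PC, F2 and MAE without defining them on this page. The sketch below computes these quantities for a single binary prediction using their commonly used definitions. This is an assumption about how the paper evaluates, not a statement of its protocol. In particular, MIoU is assumed to be the per-image IoU averaged over the test set, and MAE is computed here on binary masks rather than on probability maps. The function name `segmentation_metrics` is hypothetical.

```python
import numpy as np

def segmentation_metrics(pred, gt, eps=1e-8):
    """Common overlap metrics for one binary prediction mask and its ground truth."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.sum(pred & gt)     # polyp pixels correctly predicted
    fp = np.sum(pred & ~gt)    # background predicted as polyp
    fn = np.sum(~pred & gt)    # polyp pixels missed
    se = tp / (tp + fn + eps)  # sensitivity (recall)
    pc = tp / (tp + fp + eps)  # precision
    return {
        "Dice": 2 * tp / (2 * tp + fp + fn + eps),
        "IoU": tp / (tp + fp + fn + eps),
        "SE": se,
        "PC": pc,
        "F2": 5 * pc * se / (4 * pc + se + eps),  # F-beta with beta = 2
        "MAE": np.mean(pred != gt),               # mean absolute error for binary masks
    }

pred = [[1, 1, 0, 0],
        [0, 1, 0, 0]]
gt = [[1, 1, 1, 0],
      [0, 0, 0, 0]]
m = segmentation_metrics(pred, gt)
```

F2 weights sensitivity more heavily than precision, which matches the clinical priority of not missing polyp pixels.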

Algorithm 2: Channel attention bridge block
Inputs: The feature maps of the four stages $E_{i}$, i=1,2,3,4
Outputs: $C_{i}$, i=1,2,3,4
1: $h_{mean}^{i}$ = AvgPool($E_{i}$) /average pooling/
2: $h_{c}$ = Concat($h_{mean}^{1}$, $h_{mean}^{2}$, $h_{mean}^{3}$, $h_{mean}^{4}$) /concatenate the average-pooled feature maps/
3: $\beta$ = $Conv_{3 \times 3}(h_{c})$ /3×3 convolution/
4: $\gamma$ = $\sigma(\beta)$ /sigmoid activation; the feature map becomes $C \times H \times 1$/
5: $C_{i}$ = $\gamma$ * $E_{i}$ + $E_{i}$ /multiply the sigmoid output by the original feature map, then add the original feature map/
End
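
A rough NumPy sketch of Algorithm 2 follows. It is an illustration under stated assumptions, not the authors' code. The average pooling is taken as global average pooling over the spatial axes. The convolution in step 3 is modelled as a size-3 one-dimensional convolution along the concatenated channel vector, since the listing does not say how a 3×3 kernel is applied to pooled features. The attention vector is split back per stage by channel count. The weights are placeholders rather than learned values, and the name `channel_attention_bridge` is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_bridge(stages, kernel):
    """stages: list of E_i arrays shaped (C_i, H_i, W_i); kernel: three weights."""
    kernel = np.asarray(kernel, dtype=float)
    # Step 1: global average pooling, one value per channel (assumed spatial pooling).
    h_means = [e.mean(axis=(1, 2)) for e in stages]
    # Step 2: concatenate the pooled vectors of all four stages.
    h_c = np.concatenate(h_means)
    # Step 3: size-3 convolution along the channel axis (assumption);
    # reversing the kernel turns np.convolve into a cross-correlation.
    beta = np.convolve(h_c, kernel[::-1], mode="same")
    # Step 4: sigmoid gives one weight in (0, 1) per channel.
    gamma = sigmoid(beta)
    # Split the weights back into one segment per stage.
    split_points = np.cumsum([e.shape[0] for e in stages])[:-1]
    gammas = np.split(gamma, split_points)
    # Step 5: reweight each stage channel-wise, then add the residual.
    return [g[:, None, None] * e + e for g, e in zip(gammas, stages)]

rng = np.random.default_rng(0)
stages = [rng.normal(size=(c, s, s)) for c, s in [(4, 16), (8, 8), (16, 4), (32, 2)]]
outputs = channel_attention_bridge(stages, kernel=[0.2, 0.5, 0.2])
```

Concatenating the pooled vectors before the convolution is what lets channel statistics from one encoder stage influence the weights applied to another, which is the cross-scale aspect of the bridge.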

Method	Parameters/M	GFLOPs	Train/(round·s^-1)
U-Net	34.53	65.52	309
EU-Net	31.36	12.31	284
PraNet	30.50	6.96	90
CaraNet	44.54	11.45	256
Polyp-PVT	25.12	5.30	233
SSFormer-L	65.96	17.29	220
MSRAformer	68.03	21.29	199
Ours	24.99	10.01	127

Table 3. Performance comparison of different networks (CVC-ClinicDB)

Dataset	Method	Dice	MIoU	SE	PC	F2
Kvasir	M1	0.906	0.851	0.900	0.931	0.901
	M2	0.921	0.871	0.930	0.931	0.926
	M3	0.928	0.877	0.934	0.936	0.928
	M4	0.932	0.883	0.933	0.944	0.931
CVC-ColonDB	M1	0.786	0.705	0.7918	0.835	0.785
	M2	0.789	0.706	0.8337	0.803	0.802
	M3	0.810	0.730	0.841	0.797	0.806
	M4	0.811	0.731	0.823	0.844	0.813

Table 4. Ablation results of each module on the Kvasir and CVC-ColonDB datasets
