• Optics and Precision Engineering
  • Vol. 31, Issue 18, 2700 (2023)
Liming LIANG, Anjun HE, Renjie LI, and Jian WU*
Author Affiliations
  • School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China
    DOI: 10.37188/OPE.20233118.2700
    Liming LIANG, Anjun HE, Renjie LI, Jian WU. Cross-scale and cross-dimensional adaptive transformer network for colorectal polyp segmentation[J]. Optics and Precision Engineering, 2023, 31(18): 2700
    Fig. 1. Core Transformer encoding block
    Fig. 2. Cross-scale and cross-dimensional adaptive transformer network
    Fig. 3. Spatial attention bridge block
    Fig. 4. Channel attention bridge block
    Fig. 5. Multi-scale dense parallel decoding block
    Fig. 6. Multi-scale prediction block
    Fig. 7. Segmentation results of different networks on Kvasir and CVC-ClinicDB datasets
    Fig. 8. Segmentation results of different networks on CVC-ColonDB and ETIS datasets

    Algorithm 1: Spatial attention bridge block

    Inputs: the feature maps from the four channel attention bridge blocks, C_i, i = 1, 2, 3, 4
    Outputs: S_i, i = 1, 2, 3, 4

     1: χ_mean^i = AvgPool(C_i)  /* average pooling */
     2: χ_max^i = MaxPool(C_i)  /* max pooling */
     3: χ_s^i = Concat(χ_mean^i, χ_max^i)  /* concatenate the two pooled feature maps */
     4: α = Conv7×7(χ_s^i)  /* 7×7 convolution */
     5: ε = σ(α)  /* after the sigmoid, the feature map becomes C×H×1 */
     6: S_i = ε * C_i + C_i  /* weight the original features by the sigmoid output, then add the original features */

    End
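
The steps of Algorithm 1 can be traced with a small, dependency-free Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the pooling is assumed to run along the channel axis (so the gate holds one weight per pixel), the convolution is assumed to be zero-padded, and the function names and kernel weights are hypothetical. A real model would use a deep-learning framework and learned weights.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def conv2d_same(planes, kernel, bias=0.0):
    """Zero-padded 2-D convolution: planes [K][H][W], kernel [K][k][k] -> [H][W]."""
    h, w = len(planes[0]), len(planes[0][0])
    k = len(kernel[0])
    pad = k // 2
    out = [[bias] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for plane, weights in zip(planes, kernel):
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < h and 0 <= xx < w:
                            out[y][x] += weights[dy][dx] * plane[yy][xx]
    return out


def spatial_attention_bridge(feat, kernel, bias=0.0):
    """Apply steps 1-6 to one feature map feat of shape [C][H][W]."""
    n_ch, h, w = len(feat), len(feat[0]), len(feat[0][0])
    # Steps 1-2: average and max pooling (assumption: along the channel axis).
    mean_map = [[sum(feat[c][y][x] for c in range(n_ch)) / n_ch
                 for x in range(w)] for y in range(h)]
    max_map = [[max(feat[c][y][x] for c in range(n_ch))
                for x in range(w)] for y in range(h)]
    # Steps 3-4: concatenate the two maps and run the 7x7 convolution.
    alpha = conv2d_same([mean_map, max_map], kernel, bias)
    # Step 5: sigmoid turns the response into a gate in (0, 1).
    eps = [[sigmoid(v) for v in row] for row in alpha]
    # Step 6: gate the input and add the residual, S = eps * C + C.
    return [[[eps[y][x] * feat[c][y][x] + feat[c][y][x]
              for x in range(w)] for y in range(h)] for c in range(n_ch)]


# Toy input: 2 channels, 4x4 pixels. A zero kernel makes every gate
# sigmoid(0) = 0.5, so each output value is 1.5 times the input value.
feat = [[[float(c + y + x) for x in range(4)] for y in range(4)] for c in range(2)]
zero_kernel = [[[0.0] * 7 for _ in range(7)] for _ in range(2)]
out = spatial_attention_bridge(feat, zero_kernel)
```

Because the gate lies in (0, 1) and the residual term is added back, each output value stays between one and two times the corresponding input value, so the block can emphasize regions without suppressing any feature entirely.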

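| Dataset | Method | Dice | mIoU | SE | PC | F2 | MAE |
|---|---|---|---|---|---|---|---|
| Kvasir | U-Net | 0.818 | 0.746 | 0.856 | 0.857 | 0.827 | 0.055 |
| Kvasir | EU-Net | 0.908 | 0.854 | 0.934 | 0.911 | 0.919 | 0.028 |
| Kvasir | PraNet | 0.898 | 0.840 | 0.911 | 0.916 | 0.901 | 0.032 |
| Kvasir | CaraNet | 0.918 | 0.867 | 0.912 | 0.938 | 0.914 | 0.023 |
| Kvasir | Polyp-PVT | 0.917 | 0.864 | 0.913 | 0.947 | 0.914 | 0.023 |
| Kvasir | SSFormer-L | 0.918 | 0.865 | 0.897 | 0.957 | 0.904 | 0.022 |
| Kvasir | MSRAformer | 0.923 | 0.873 | 0.915 | 0.952 | 0.917 | 0.024 |
| Kvasir | Ours | 0.932 | 0.883 | 0.933 | 0.944 | 0.931 | 0.021 |
| CVC-ClinicDB | U-Net | 0.823 | 0.755 | 0.834 | 0.839 | 0.827 | 0.019 |
| CVC-ClinicDB | EU-Net | 0.902 | 0.846 | 0.959 | 0.880 | 0.926 | 0.011 |
| CVC-ClinicDB | PraNet | 0.899 | 0.849 | 0.910 | 0.907 | 0.905 | 0.009 |
| CVC-ClinicDB | CaraNet | 0.936 | 0.887 | 0.955 | 0.928 | 0.948 | 0.007 |
| CVC-ClinicDB | Polyp-PVT | 0.937 | 0.889 | 0.949 | 0.936 | 0.945 | 0.006 |
| CVC-ClinicDB | SSFormer-L | 0.906 | 0.855 | 0.897 | 0.931 | 0.898 | 0.008 |
| CVC-ClinicDB | MSRAformer | 0.924 | 0.874 | 0.945 | 0.920 | 0.932 | 0.008 |
| CVC-ClinicDB | Ours | 0.942 | 0.896 | 0.964 | 0.927 | 0.954 | 0.006 |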
    Table 1. Segmentation results of different networks on Kvasir and CVC-ClinicDB datasets
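| Dataset | Method | Dice | mIoU | SE | PC | F2 | MAE |
|---|---|---|---|---|---|---|---|
| CVC-ColonDB | U-Net | 0.512 | 0.444 | 0.523 | 0.621 | 0.510 | 0.061 |
| CVC-ColonDB | EU-Net | 0.756 | 0.681 | 0.849 | 0.758 | 0.788 | 0.044 |
| CVC-ColonDB | PraNet | 0.712 | 0.640 | 0.739 | 0.755 | 0.717 | 0.043 |
| CVC-ColonDB | CaraNet | 0.773 | 0.689 | 0.857 | 0.753 | 0.796 | 0.042 |
| CVC-ColonDB | Polyp-PVT | 0.808 | 0.727 | 0.821 | 0.849 | 0.809 | 0.031 |
| CVC-ColonDB | SSFormer-L | 0.802 | 0.721 | 0.791 | 0.864 | 0.787 | 0.031 |
| CVC-ColonDB | MSRAformer | 0.782 | 0.707 | 0.803 | 0.874 | 0.787 | 0.028 |
| CVC-ColonDB | Ours | 0.811 | 0.731 | 0.823 | 0.844 | 0.813 | 0.027 |
| ETIS | U-Net | 0.398 | 0.335 | 0.482 | 0.439 | 0.429 | 0.036 |
| ETIS | EU-Net | 0.687 | 0.609 | 0.871 | 0.635 | 0.749 | 0.066 |
| ETIS | PraNet | 0.628 | 0.567 | 0.686 | 0.628 | 0.649 | 0.031 |
| ETIS | CaraNet | 0.747 | 0.672 | 0.811 | 0.731 | 0.777 | 0.017 |
| ETIS | Polyp-PVT | 0.787 | 0.706 | 0.867 | 0.774 | 0.820 | 0.013 |
| ETIS | SSFormer-L | 0.796 | 0.720 | 0.830 | 0.794 | 0.807 | 0.014 |
| ETIS | MSRAformer | 0.750 | 0.679 | 0.811 | 0.745 | 0.777 | 0.013 |
| ETIS | Ours | 0.805 | 0.729 | 0.887 | 0.770 | 0.842 | 0.012 |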
    Table 2. Segmentation results of different networks on CVC-ColonDB and ETIS datasets
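
The segmentation tables report Dice, mIoU, SE, PC, F2 and MAE without defining them in this listing. The sketch below computes all six for a single image under the usual definitions, which is an assumption about the paper's protocol: SE is read as sensitivity (recall), PC as precision, F2 as the F-beta score with beta = 2, and MAE as the mean absolute difference between the unthresholded prediction and the mask. The function name and the 0.5 threshold are hypothetical, and dataset-level scores are typically averages over all test images.

```python
def segmentation_metrics(pred, gt, threshold=0.5):
    """Per-image metrics for a probability map pred and a binary mask gt (2-D lists)."""
    tp = fp = fn = 0
    abs_err, n = 0.0, 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            b = 1 if p >= threshold else 0   # binarized prediction
            if b and g:
                tp += 1
            elif b:
                fp += 1
            elif g:
                fn += 1
            abs_err += abs(p - g)            # MAE uses the raw probability
            n += 1

    def ratio(num, den):
        # Return 0.0 for an empty denominator instead of raising.
        return num / den if den else 0.0

    se = ratio(tp, tp + fn)                  # sensitivity (recall)
    pc = ratio(tp, tp + fp)                  # precision
    return {
        "Dice": ratio(2 * tp, 2 * tp + fp + fn),
        "IoU": ratio(tp, tp + fp + fn),
        "SE": se,
        "PC": pc,
        "F2": ratio(5 * pc * se, 4 * pc + se),
        "MAE": ratio(abs_err, n),
    }


# Toy example: 2 true positives, 1 false positive, 1 false negative.
pred = [[0.9, 0.8, 0.1],
        [0.2, 0.7, 0.0]]
gt = [[1, 1, 0],
      [1, 0, 0]]
scores = segmentation_metrics(pred, gt)
```

F2 weights sensitivity more heavily than precision, which suits polyp segmentation when a missed lesion is considered costlier than a false alarm.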

    Algorithm 2: Channel attention bridge block

    Inputs: the feature maps of the four stages, E_i, i = 1, 2, 3, 4
    Outputs: C_i, i = 1, 2, 3, 4

     1: h_mean^i = AvgPool(E_i)  /* average pooling */
     2: h_c = Concat(h_mean^1, h_mean^2, h_mean^3, h_mean^4)  /* concatenate the average-pooled feature maps */
     3: β = Conv3×3(h_c)  /* 3×3 convolution */
     4: γ = σ(β)  /* after the sigmoid, the feature map becomes C×H×1 */
     5: C_i = γ * E_i + E_i  /* weight the original features by the sigmoid output, then add the original features */

    End
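
Algorithm 2 can be sketched in the same dependency-free Python style. Several details are assumptions because the listing does not specify them: the average pooling is taken as global pooling over each channel's spatial grid, the convolution is replaced by a zero-padded 1-D convolution of width 3 along the concatenated channel vector (the listing writes Conv3×3), and the gate γ is split back to each stage by slicing. All function names and the kernel values are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def global_avg_pool(feat):
    """Average each channel over its spatial grid: [C][H][W] -> [C]."""
    return [sum(map(sum, plane)) / (len(plane) * len(plane[0])) for plane in feat]


def conv1d_same(vec, kernel, bias=0.0):
    """Zero-padded 1-D convolution that keeps the input length."""
    pad, n = len(kernel) // 2, len(vec)
    return [bias + sum(kernel[j] * vec[i + j - pad]
                       for j in range(len(kernel)) if 0 <= i + j - pad < n)
            for i in range(n)]


def channel_attention_bridge(stages, kernel, bias=0.0):
    """stages: list of feature maps E_i, each [C_i][H_i][W_i]; returns the C_i."""
    pooled = [global_avg_pool(e) for e in stages]    # step 1
    h_c = [v for vec in pooled for v in vec]         # step 2: concatenate
    beta = conv1d_same(h_c, kernel, bias)            # step 3 (1-D stand-in)
    gamma = [sigmoid(b) for b in beta]               # step 4
    outputs, offset = [], 0
    for e in stages:                                 # step 5: C_i = gamma * E_i + E_i
        g = gamma[offset:offset + len(e)]            # assumed per-stage slice of gamma
        offset += len(e)
        outputs.append([[[g[c] * v + v for v in row] for row in plane]
                        for c, plane in enumerate(e)])
    return outputs


def constant_map(channels, size, value):
    """Helper for the demo: a [channels][size][size] map filled with value."""
    return [[[value] * size for _ in range(size)] for _ in range(channels)]


# Four toy stages with different channel counts and resolutions. A zero
# kernel gives gamma = 0.5 everywhere, so every value is scaled by 1.5.
stages = [constant_map(2, 8, 1.0), constant_map(3, 4, 2.0),
          constant_map(4, 2, 3.0), constant_map(5, 1, 4.0)]
bridged = channel_attention_bridge(stages, kernel=[0.0, 0.0, 0.0])
```

Concatenating the pooled descriptors of all four stages before the convolution is what lets the weights of one stage depend on the others, while the per-stage slicing keeps each output the same shape as its input.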

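| Method | Parameters/M | GFLOPs | Train/(round·s⁻¹) |
|---|---|---|---|
| U-Net | 34.53 | 65.52 | 309 |
| EU-Net | 31.36 | 12.31 | 284 |
| PraNet | 30.50 | 6.96 | 90 |
| CaraNet | 44.54 | 11.45 | 256 |
| Polyp-PVT | 25.12 | 5.30 | 233 |
| SSFormer-L | 65.96 | 17.29 | 220 |
| MSRAformer | 68.03 | 21.29 | 199 |
| Ours | 24.99 | 10.01 | 127 |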
    Table 3. Performance comparison of different networks(CVC-ClinicDB)
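| Dataset | Method | Dice | mIoU | SE | PC | F2 |
|---|---|---|---|---|---|---|
| Kvasir | M1 | 0.906 | 0.851 | 0.900 | 0.931 | 0.901 |
| Kvasir | M2 | 0.921 | 0.871 | 0.930 | 0.931 | 0.926 |
| Kvasir | M3 | 0.928 | 0.877 | 0.934 | 0.936 | 0.928 |
| Kvasir | M4 | 0.932 | 0.883 | 0.933 | 0.944 | 0.931 |
| CVC-ColonDB | M1 | 0.786 | 0.705 | 0.7918 | 0.835 | 0.785 |
| CVC-ColonDB | M2 | 0.789 | 0.706 | 0.8337 | 0.803 | 0.802 |
| CVC-ColonDB | M3 | 0.810 | 0.730 | 0.841 | 0.797 | 0.806 |
| CVC-ColonDB | M4 | 0.811 | 0.731 | 0.823 | 0.844 | 0.813 |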
    Table 4. Ablation results of each module on the Kvasir and CVC-ColonDB datasets