• Opto-Electronic Engineering
  • Vol. 52, Issue 1, 240234 (2025)
Yanqiu Li1,2, Shengzhao Li1, Guangling Sun1,2,*, and Pu Yan1,2
Author Affiliations
  • 1School of Electronic and Information Engineering, Anhui Jianzhu University, Hefei, Anhui 230601, China
  • 2Anhui International Joint Research Center for Ancient Architecture Intellisencing and Multi-Dimensional Modeling, Hefei, Anhui 230601, China
    DOI: 10.12086/oee.2025.240234
    Citation: Yanqiu Li, Shengzhao Li, Guangling Sun, Pu Yan. Lightweight Swin Transformer combined with multi-scale feature fusion for face expression recognition[J]. Opto-Electronic Engineering, 2025, 52(1): 240234
    Fig. 1. Swin Transformer network structure diagram
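    As a concrete reference point for the backbone in Fig. 1, a standard Swin Transformer can be instantiated with the timm library; the variant name and the 7-class expression head below are illustrative assumptions, not the authors' released code.

```python
import timm
import torch

# Hypothetical baseline: a Swin-T backbone with a 7-class expression head.
# The variant name and class count are assumptions for illustration only.
model = timm.create_model("swin_tiny_patch4_window7_224",
                          pretrained=False, num_classes=7)

x = torch.randn(1, 3, 224, 224)   # one 224x224 RGB face crop
logits = model(x)                 # shape: (1, 7)
```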
    Fig. 2. Swin Transformer block module structure diagram
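    To make the block layout of Fig. 2 concrete, here is a minimal PyTorch sketch of the pre-norm, two-residual structure; `nn.MultiheadAttention` stands in for W-MSA/SW-MSA, which additionally use window partitioning, cyclic shifts, and relative position bias (see Fig. 3).

```python
import torch
import torch.nn as nn

class SwinBlockSketch(nn.Module):
    """Minimal sketch of the layout in Fig. 2:
    LN -> windowed self-attention -> residual, then LN -> MLP -> residual."""
    def __init__(self, dim, num_heads, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, x):                # x: (B, N_tokens, dim)
        h = self.norm1(x)
        h, _ = self.attn(h, h, h)
        x = x + h                        # first residual connection
        x = x + self.mlp(self.norm2(x))  # second residual connection
        return x
```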
    Fig. 3. Self-attention computation regions. (a) MSA; (b) W-MSA; (c) SW-MSA
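    The schemes in Fig. 3(b) and 3(c) restrict attention from global (MSA) to local windows; the sketch below shows window partitioning and the half-window cyclic shift that lets information cross window borders. The stage-1 Swin-T feature size (56×56×96) is assumed for illustration.

```python
import torch

def window_partition(x, ws):
    """Split a (B, H, W, C) feature map into non-overlapping ws x ws windows,
    so self-attention runs inside each window (W-MSA, Fig. 3b)."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

x = torch.randn(1, 56, 56, 96)        # assumed stage-1 feature map
windows = window_partition(x, 7)      # (64, 49, 96): 8x8 windows, 49 tokens

# SW-MSA (Fig. 3c): cyclically shift by half a window before partitioning,
# so alternating blocks mix features across window boundaries.
shifted = torch.roll(x, shifts=(-3, -3), dims=(1, 2))
shifted_windows = window_partition(shifted, 7)
```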
    Fig. 4. Improved model structure diagram
    Fig. 5. SPST module structure diagram
    Fig. 6. Visualization of the BN, LN, and BCN normalization techniques
    Fig. 7. EMA module structure diagram
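    For Fig. 7, the sketch below follows the publicly released reference design of the EMA (efficient multi-scale attention) module: channels are split into groups, pooled along H and W in a 1×1 branch, passed through a parallel 3×3 branch, and recombined by cross-spatial softmax weighting. Treat it as an approximation of the module in Fig. 7, not the authors' exact code.

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Sketch of efficient multi-scale attention (Fig. 7), after the
    published reference design; details may differ from the paper's code."""
    def __init__(self, channels, factor=8):
        super().__init__()
        self.groups = factor
        self.softmax = nn.Softmax(dim=-1)
        self.agp = nn.AdaptiveAvgPool2d((1, 1))        # global pooling
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool out the W axis
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool out the H axis
        c = channels // self.groups
        self.gn = nn.GroupNorm(c, c)
        self.conv1x1 = nn.Conv2d(c, c, kernel_size=1)
        self.conv3x3 = nn.Conv2d(c, c, kernel_size=3, padding=1)

    def forward(self, x):
        b, c, h, w = x.size()
        g = x.reshape(b * self.groups, -1, h, w)        # split channel groups
        x_h = self.pool_h(g)                            # (bg, c/g, h, 1)
        x_w = self.pool_w(g).permute(0, 1, 3, 2)        # (bg, c/g, w, 1)
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        # 1x1 branch: directional gates, then group normalization
        x1 = self.gn(g * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())
        x2 = self.conv3x3(g)                            # 3x3 local branch
        # cross-spatial learning: each branch weights the other's map
        x11 = self.softmax(self.agp(x1).reshape(b * self.groups, -1, 1)
                           .permute(0, 2, 1))
        x12 = x2.reshape(b * self.groups, c // self.groups, -1)
        x21 = self.softmax(self.agp(x2).reshape(b * self.groups, -1, 1)
                           .permute(0, 2, 1))
        x22 = x1.reshape(b * self.groups, c // self.groups, -1)
        w_map = torch.matmul(x11, x12) + torch.matmul(x21, x22)
        w_map = w_map.reshape(b * self.groups, 1, h, w)
        return (g * w_map.sigmoid()).reshape(b, c, h, w)
```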
    Fig. 8. Activation maps of the model before and after adding the EMA module
    Fig. 9. Sample images from the datasets
    Fig. 10. Confusion matrix validation results on JAFFE. (a) Original Swin Transformer model; (b) Improved Swin Transformer model
    Fig. 11. Confusion matrix validation results on RAF-DB. (a) Original Swin Transformer model; (b) Improved Swin Transformer model
    Fig. 12. Confusion matrix validation results on FERPLUS. (a) Original Swin Transformer model; (b) Improved Swin Transformer model
    Fig. 13. Confusion matrix validation results on FANE. (a) Original Swin Transformer model; (b) Improved Swin Transformer model
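    Confusion matrices like those in Figs. 10–13 are straightforward to reproduce with scikit-learn once per-image predictions are collected; the seven expression labels below are illustrative, and the random arrays simply stand in for real validation outputs.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

labels = ["anger", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

# y_true / y_pred would come from running the trained model over the
# validation split; random data here keeps the snippet self-contained.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 7, size=500)
y_pred = np.where(rng.random(500) < 0.8, y_true,
                  rng.integers(0, 7, size=500))

cm = confusion_matrix(y_true, y_pred, normalize="true")
ConfusionMatrixDisplay(cm, display_labels=labels).plot(xticks_rotation=45)
plt.tight_layout()
plt.show()
```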
    | Model | EMA module | SPST module | Parameters |
    | --- | --- | --- | --- |
    | Original Swin Transformer | × | × | 27,524,737 |
    | Improved Swin Transformer | √ | × | 27,526,225 |
    | Improved Swin Transformer | × | √ | 23,185,251 |
    | Improved Swin Transformer | √ | √ | 23,186,739 |

    Table 1. Comparison of parameters before and after the model is improved
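    Parameter counts like those in Table 1 come directly from summing tensor sizes; a minimal helper:

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of parameters, as reported in Table 1."""
    return sum(p.numel() for p in model.parameters())

# Toy check: a single linear layer has in*out + out parameters.
print(count_parameters(nn.Linear(128, 64)))   # 128*64 + 64 = 8256
```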
    | Position | Swin Transformer block | SPST block | Parameters | RACC/% | GFLOPs | FPS |
    | --- | --- | --- | --- | --- | --- | --- |
    | Stage 1 | × | √ | 34,331,981 | 72.33 | 19.06 | 86 |
    | Stage 2 | × | √ | 29,625,428 | 75.27 | 12.44 | 152 |
    | Stage 3 | × | √ | 24,190,413 | 82.17 | 5.84 | 281 |
    | Stage 4 | × | √ | 23,185,251 | 86.86 | 4.12 | 335 |
    | Stage 4 | √ | × | 27,524,737 | 85.69 | 4.51 | 301 |

    Table 2. Comparison of replacing the Swin Transformer block with the SPST module at different stages
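    The FPS column in Table 2 can be approximated with a simple timed loop; GFLOPs are usually obtained with a profiler such as fvcore's FlopCountAnalysis. Exact numbers depend on the hardware in Table 4 and on batching, so the helper below (input size assumed 224×224) is a sketch rather than the authors' measurement script.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, input_size=(1, 3, 224, 224),
                n_warmup=20, n_runs=100):
    """Rough single-image throughput of the style reported in Table 2."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    for _ in range(n_warmup):          # warm up kernels and caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()       # wait for queued GPU work
    start = time.perf_counter()
    for _ in range(n_runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return n_runs / (time.perf_counter() - start)
```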
    | Model | Anger | Disgust | Fear | Happy | Sad | Surprise |
    | --- | --- | --- | --- | --- | --- | --- |
    | Original Swin Transformer | 10.5974 | 10.5325 | 10.4282 | 10.6150 | 10.5980 | 10.6626 |
    | Improved Swin Transformer | 8.2437 | 9.4190 | 9.2204 | 8.1102 | 8.9906 | 8.9113 |

    Table 3. Entropy comparison of activation maps
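    The entropy values in Table 3 quantify how concentrated the activation maps in Fig. 8 are: lower entropy means the attention mass focuses on fewer regions. A plausible computation (the paper's exact normalization is not shown here) treats the map as a probability distribution:

```python
import torch

def activation_entropy(act_map: torch.Tensor, eps: float = 1e-12) -> float:
    """Shannon entropy of a non-negative activation map, in bits.
    The map is normalized to sum to 1 and treated as a distribution;
    a map focused on few pixels yields lower entropy (cf. Table 3)."""
    p = act_map.clamp_min(0).flatten()
    p = p / (p.sum() + eps)
    return float(-(p * (p + eps).log2()).sum())

uniform = torch.ones(224, 224)          # evenly spread activation
peaked = torch.zeros(224, 224)
peaked[100:110, 100:110] = 1.0          # activation on 100 pixels only
print(activation_entropy(uniform))      # log2(224*224) ≈ 15.6 bits
print(activation_entropy(peaked))       # log2(100) ≈ 6.6 bits
```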
    | Item | Configuration |
    | --- | --- |
    | CPU | Intel(R) Core(TM) i5-12400F @ 2.50 GHz |
    | GPU | NVIDIA GeForce RTX 3060 (12 GB) |
    | Memory | 16 GB |
    | Python | 3.9.19 |
    | CUDA | 11.8 |
    | PyTorch | 2.0.0 |

    Table 4. Configuration of the experimental environment
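    A quick sanity check that a local setup matches Table 4 (the printed values will of course reflect your own machine):

```python
import platform
import torch

# Versions corresponding to the rows of Table 4.
print("Python :", platform.python_version())   # 3.9.19 in the paper
print("Torch  :", torch.__version__)           # 2.0.0
print("CUDA   :", torch.version.cuda)          # 11.8
if torch.cuda.is_available():
    print("GPU    :", torch.cuda.get_device_name(0))  # RTX 3060 (12 GB)
```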
    | Position | JAFFE | FERPLUS | RAF-DB | FANE | Parameters |
    | --- | --- | --- | --- | --- | --- |
    | After stage 1 | 95.57 | 85.53 | 86.80 | 68.84 | 23,185,635 |
    | After stage 2 | 97.56 | 86.46 | 87.29 | 70.11 | 23,186,739 |
    | After stage 3 | 96.80 | 85.56 | 86.99 | 68.60 | 23,191,107 |
    | After stage 4 | 95.87 | 85.76 | 86.67 | 69.37 | 23,187,875 |

    Table 5. Accuracy (RACC/%) of embedding the EMA module after different stages
    | SPST module | EMA module | FERPLUS | RAF-DB | FANE | Parameters | GFLOPs | FPS |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | × | × | 85.43 | 85.69 | 68.47 | 27,524,737 | 4.51 | 301 |
    | × | √ | 85.73 | 86.99 | 69.67 | 27,526,225 | 4.52 | 297 |
    | √ | × | 85.87 | 86.86 | 69.72 | 23,185,251 | 4.12 | 335 |
    | √ | √ | 86.46 | 87.29 | 70.11 | 23,186,739 | 4.13 | 330 |

    Table 6. Results of ablation experiments (RACC/%) on FERPLUS, RAF-DB, and FANE
    | Model | JAFFE | FERPLUS | RAF-DB |
    | --- | --- | --- | --- |
    | ARBEx[9] | 96.67 | — | — |
    | LBP+HOG[7] | 96.05 | — | — |
    | SCN[4] | 86.33 | 85.97 | 87.03 |
    | RAN[8] | 88.67 | 83.63 | 86.90 |
    | EfficientNetB0[25] | — | 85.01 | 84.21 |
    | MobileNetV2[26] | — | 84.03 | 83.54 |
    | MobileNetV3[27] | — | 84.97 | 84.88 |
    | Ad-Corre[28] | — | — | 86.96 |
    | POSTER[19] | — | — | 86.03 |
    | R3HO-Net[29] | — | — | 85.52 |
    | Ada-CM[30] | — | — | 84.13 |
    | Swin Transformer (base) | 95.12 | 85.43 | 85.69 |
    | Ours | 97.56 | 86.46 | 87.29 |

    Table 7. Accuracy comparison (ACC/%) of different networks on JAFFE, FERPLUS, and RAF-DB