• Opto-Electronic Engineering
  • Vol. 52, Issue 1, 240234 (2025)
Yanqiu Li1,2, Shengzhao Li1, Guangling Sun1,2,*, and Pu Yan1,2
Author Affiliations
  • 1School of Electronic and Information Engineering, Anhui Jianzhu University, Hefei, Anhui 230601, China
  • 2Anhui International Joint Research Center for Ancient Architecture Intellisencing and Multi-Dimensional Modeling, Hefei, Anhui 230601, China
    DOI: 10.12086/oee.2025.240234
    Citation: Yanqiu Li, Shengzhao Li, Guangling Sun, Pu Yan. Lightweight Swin Transformer combined with multi-scale feature fusion for face expression recognition[J]. Opto-Electronic Engineering, 2025, 52(1): 240234
    Fig. 1. Swin Transformer network structure diagram
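    As a concrete reference point for the backbone in Fig. 1, a standard Swin Transformer can be instantiated with the timm library; the variant name and the 7-class expression head below are illustrative assumptions, not the authors' released code.

```python
import timm
import torch

# Hypothetical baseline: a Swin-T backbone with a 7-class expression head.
# The variant name and class count are assumptions for illustration only.
model = timm.create_model("swin_tiny_patch4_window7_224",
                          pretrained=False, num_classes=7)

x = torch.randn(1, 3, 224, 224)   # one 224x224 RGB face crop
logits = model(x)                 # shape: (1, 7)
```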
    Fig. 2. Swin Transformer block module structure diagram
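    To make the block layout of Fig. 2 concrete, here is a minimal PyTorch sketch of the pre-norm, two-residual structure; `nn.MultiheadAttention` stands in for W-MSA/SW-MSA, which additionally use window partitioning, cyclic shifts, and relative position bias (see Fig. 3).

```python
import torch
import torch.nn as nn

class SwinBlockSketch(nn.Module):
    """Minimal sketch of the layout in Fig. 2:
    LN -> windowed self-attention -> residual, then LN -> MLP -> residual."""
    def __init__(self, dim, num_heads, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, x):                # x: (B, N_tokens, dim)
        h = self.norm1(x)
        h, _ = self.attn(h, h, h)
        x = x + h                        # first residual connection
        x = x + self.mlp(self.norm2(x))  # second residual connection
        return x
```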
    Fig. 3. Self-attention computation regions. (a) MSA; (b) W-MSA; (c) SW-MSA
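    The schemes in Fig. 3(b) and 3(c) restrict attention from global (MSA) to local windows; the sketch below shows window partitioning and the half-window cyclic shift that lets information cross window borders. The stage-1 Swin-T feature size (56×56×96) is assumed for illustration.

```python
import torch

def window_partition(x, ws):
    """Split a (B, H, W, C) feature map into non-overlapping ws x ws windows,
    so self-attention runs inside each window (W-MSA, Fig. 3b)."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

x = torch.randn(1, 56, 56, 96)        # assumed stage-1 feature map
windows = window_partition(x, 7)      # (64, 49, 96): 8x8 windows, 49 tokens

# SW-MSA (Fig. 3c): cyclically shift by half a window before partitioning,
# so alternating blocks mix features across window boundaries.
shifted = torch.roll(x, shifts=(-3, -3), dims=(1, 2))
shifted_windows = window_partition(shifted, 7)
```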
    Fig. 4. Improved model structure diagram
    Fig. 5. SPST module structure diagram
    Fig. 6. Visualization of the BN, LN, and BCN normalization techniques
    Fig. 7. EMA module structure diagram
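    For Fig. 7, the sketch below follows the publicly released reference design of the EMA (efficient multi-scale attention) module: channels are split into groups, pooled along H and W in a 1×1 branch, passed through a parallel 3×3 branch, and recombined by cross-spatial softmax weighting. Treat it as an approximation of the module in Fig. 7, not the authors' exact code.

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Sketch of efficient multi-scale attention (Fig. 7), after the
    published reference design; details may differ from the paper's code."""
    def __init__(self, channels, factor=8):
        super().__init__()
        self.groups = factor
        self.softmax = nn.Softmax(dim=-1)
        self.agp = nn.AdaptiveAvgPool2d((1, 1))        # global pooling
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool out the W axis
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool out the H axis
        c = channels // self.groups
        self.gn = nn.GroupNorm(c, c)
        self.conv1x1 = nn.Conv2d(c, c, kernel_size=1)
        self.conv3x3 = nn.Conv2d(c, c, kernel_size=3, padding=1)

    def forward(self, x):
        b, c, h, w = x.size()
        g = x.reshape(b * self.groups, -1, h, w)        # split channel groups
        x_h = self.pool_h(g)                            # (bg, c/g, h, 1)
        x_w = self.pool_w(g).permute(0, 1, 3, 2)        # (bg, c/g, w, 1)
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        # 1x1 branch: directional gates, then group normalization
        x1 = self.gn(g * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())
        x2 = self.conv3x3(g)                            # 3x3 local branch
        # cross-spatial learning: each branch weights the other's map
        x11 = self.softmax(self.agp(x1).reshape(b * self.groups, -1, 1)
                           .permute(0, 2, 1))
        x12 = x2.reshape(b * self.groups, c // self.groups, -1)
        x21 = self.softmax(self.agp(x2).reshape(b * self.groups, -1, 1)
                           .permute(0, 2, 1))
        x22 = x1.reshape(b * self.groups, c // self.groups, -1)
        w_map = torch.matmul(x11, x12) + torch.matmul(x21, x22)
        w_map = w_map.reshape(b * self.groups, 1, h, w)
        return (g * w_map.sigmoid()).reshape(b, c, h, w)
```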
    Fig. 8. Activation maps of the model before and after adding the EMA module
    Fig. 9. Sample images from the datasets
    Fig. 10. Confusion matrix validation results on JAFFE. (a) Original Swin Transformer model; (b) Improved Swin Transformer model
    Fig. 11. Confusion matrix validation results on RAF-DB. (a) Original Swin Transformer model; (b) Improved Swin Transformer model
    Fig. 12. Confusion matrix validation results on FERPLUS. (a) Original Swin Transformer model; (b) Improved Swin Transformer model
    Fig. 13. Confusion matrix validation results on FANE. (a) Original Swin Transformer model; (b) Improved Swin Transformer model
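    Confusion matrices like those in Figs. 10–13 are straightforward to reproduce with scikit-learn once per-image predictions are collected; the seven expression labels below are illustrative, and the random arrays simply stand in for real validation outputs.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

labels = ["anger", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

# y_true / y_pred would come from running the trained model over the
# validation split; random data here keeps the snippet self-contained.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 7, size=500)
y_pred = np.where(rng.random(500) < 0.8, y_true,
                  rng.integers(0, 7, size=500))

cm = confusion_matrix(y_true, y_pred, normalize="true")
ConfusionMatrixDisplay(cm, display_labels=labels).plot(xticks_rotation=45)
plt.tight_layout()
plt.show()
```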
    | Model | EMA module | SPST module | Parameters |
    | --- | --- | --- | --- |
    | Original Swin Transformer | × | × | 27,524,737 |
    | Improved Swin Transformer | √ | × | 27,526,225 |
    | Improved Swin Transformer | × | √ | 23,185,251 |
    | Improved Swin Transformer | √ | √ | 23,186,739 |

    Table 1. Comparison of parameters before and after the model is improved
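    Parameter counts like those in Table 1 come directly from summing tensor sizes; a minimal helper:

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of parameters, as reported in Table 1."""
    return sum(p.numel() for p in model.parameters())

# Toy check: a single linear layer has in*out + out parameters.
print(count_parameters(nn.Linear(128, 64)))   # 128*64 + 64 = 8256
```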
    | Position | Swin Transformer block | SPST block | Parameters | RACC/% | GFLOPs | FPS |
    | --- | --- | --- | --- | --- | --- | --- |
    | Stage 1 | × | √ | 34,331,981 | 72.33 | 19.06 | 86 |
    | Stage 2 | × | √ | 29,625,428 | 75.27 | 12.44 | 152 |
    | Stage 3 | × | √ | 24,190,413 | 82.17 | 5.84 | 281 |
    | Stage 4 | × | √ | 23,185,251 | 86.86 | 4.12 | 335 |
    | Stage 4 | √ | × | 27,524,737 | 85.69 | 4.51 | 301 |

    Table 2. Comparison of replacing the Swin Transformer block with the SPST module at different stages
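    The FPS column in Table 2 can be approximated with a simple timed loop; GFLOPs are usually obtained with a profiler such as fvcore's FlopCountAnalysis. Exact numbers depend on the hardware in Table 4 and on batching, so the helper below (input size assumed 224×224) is a sketch rather than the authors' measurement script.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, input_size=(1, 3, 224, 224),
                n_warmup=20, n_runs=100):
    """Rough single-image throughput of the style reported in Table 2."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    for _ in range(n_warmup):          # warm up kernels and caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()       # wait for queued GPU work
    start = time.perf_counter()
    for _ in range(n_runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return n_runs / (time.perf_counter() - start)
```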
    | Model | Anger | Disgust | Fear | Happy | Sad | Surprise |
    | --- | --- | --- | --- | --- | --- | --- |
    | Original Swin Transformer | 10.5974 | 10.5325 | 10.4282 | 10.6150 | 10.5980 | 10.6626 |
    | Improved Swin Transformer | 8.2437 | 9.4190 | 9.2204 | 8.1102 | 8.9906 | 8.9113 |

    Table 3. Entropy comparison of activation maps
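    The entropy values in Table 3 quantify how concentrated the activation maps in Fig. 8 are: lower entropy means the attention mass focuses on fewer regions. A plausible computation (the paper's exact normalization is not shown here) treats the map as a probability distribution:

```python
import torch

def activation_entropy(act_map: torch.Tensor, eps: float = 1e-12) -> float:
    """Shannon entropy of a non-negative activation map, in bits.
    The map is normalized to sum to 1 and treated as a distribution;
    a map focused on few pixels yields lower entropy (cf. Table 3)."""
    p = act_map.clamp_min(0).flatten()
    p = p / (p.sum() + eps)
    return float(-(p * (p + eps).log2()).sum())

uniform = torch.ones(224, 224)          # evenly spread activation
peaked = torch.zeros(224, 224)
peaked[100:110, 100:110] = 1.0          # activation on 100 pixels only
print(activation_entropy(uniform))      # log2(224*224) ≈ 15.6 bits
print(activation_entropy(peaked))       # log2(100) ≈ 6.6 bits
```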
    | Item | Configuration |
    | --- | --- |
    | CPU | Intel(R) Core(TM) i5-12400F @ 2.50 GHz |
    | GPU | NVIDIA GeForce RTX 3060 (12 GB) |
    | Memory | 16 GB |
    | Python | 3.9.19 |
    | CUDA | 11.8 |
    | PyTorch | 2.0.0 |

    Table 4. Configuration of the experimental environment
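    A quick sanity check that a local setup matches Table 4 (the printed values will of course reflect your own machine):

```python
import platform
import torch

# Versions corresponding to the rows of Table 4.
print("Python :", platform.python_version())   # 3.9.19 in the paper
print("Torch  :", torch.__version__)           # 2.0.0
print("CUDA   :", torch.version.cuda)          # 11.8
if torch.cuda.is_available():
    print("GPU    :", torch.cuda.get_device_name(0))  # RTX 3060 (12 GB)
```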
    | Position | JAFFE | FERPLUS | RAF-DB | FANE | Parameters |
    | --- | --- | --- | --- | --- | --- |
    | After stage 1 | 95.57 | 85.53 | 86.80 | 68.84 | 23,185,635 |
    | After stage 2 | 97.56 | 86.46 | 87.29 | 70.11 | 23,186,739 |
    | After stage 3 | 96.80 | 85.56 | 86.99 | 68.60 | 23,191,107 |
    | After stage 4 | 95.87 | 85.76 | 86.67 | 69.37 | 23,187,875 |

    Table 5. Accuracy (RACC/%) of embedding the EMA module after different stages
    | SPST module | EMA module | FERPLUS | RAF-DB | FANE | Parameters | GFLOPs | FPS |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | × | × | 85.43 | 85.69 | 68.47 | 27,524,737 | 4.51 | 301 |
    | × | √ | 85.73 | 86.99 | 69.67 | 27,526,225 | 4.52 | 297 |
    | √ | × | 85.87 | 86.86 | 69.72 | 23,185,251 | 4.12 | 335 |
    | √ | √ | 86.46 | 87.29 | 70.11 | 23,186,739 | 4.13 | 330 |

    Table 6. Results of ablation experiments (RACC/%) on FERPLUS, RAF-DB, and FANE
    | Model | JAFFE | FERPLUS | RAF-DB |
    | --- | --- | --- | --- |
    | ARBEx[9] | 96.67 | — | — |
    | LBP+HOG[7] | 96.05 | — | — |
    | SCN[4] | 86.33 | 85.97 | 87.03 |
    | RAN[8] | 88.67 | 83.63 | 86.90 |
    | EfficientNetB0[25] | — | 85.01 | 84.21 |
    | MobileNetV2[26] | — | 84.03 | 83.54 |
    | MobileNetV3[27] | — | 84.97 | 84.88 |
    | Ad-Corre[28] | — | — | 86.96 |
    | POSTER[19] | — | — | 86.03 |
    | R3HO-Net[29] | — | — | 85.52 |
    | Ada-CM[30] | — | — | 84.13 |
    | Swin Transformer (base) | 95.12 | 85.43 | 85.69 |
    | Ours | 97.56 | 86.46 | 87.29 |

    Table 7. Accuracy comparison (ACC/%) of different networks on JAFFE, FERPLUS, and RAF-DB