Author Affiliations
International Joint Laboratory for Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
Fig. 1. RGB images and corresponding skeleton images
Fig. 2. Overall network
Fig. 3. Spatial-temporal feature extraction network with self-attention
Fig. 4. Adaptive weight computing network
Fig. 5. Feature fusion and classification network
Fig. 6. Accuracy of different weight combinations
Fig. 7. Recognition results using skeleton features only versus fusion features
Fig. 8. Visualization of self-attention on skeleton and RGB images of Golf
Fig. 9. Visualization of self-attention on skeleton and RGB images of Baseball swing
Fig. 10. Visualization of adaptive weights for Golf, Baseball swing, Walk, and Run
| Parameter | Value |
|---|---|
| Loss function | Categorical cross-entropy |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 32 |
| Number of epochs | 150 |
Table 1. Experimental parameters
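The training setup in Table 1 can be summarized as a small, framework-agnostic sketch. The paper does not specify the model or data pipeline here, so the snippet below only collects the listed hyperparameters and shows the categorical cross-entropy loss they refer to; all names (`HPARAMS`, `categorical_cross_entropy`) are illustrative, not from the original.

```python
import math

# Training hyperparameters as listed in Table 1.
HPARAMS = {
    "loss": "categorical_crossentropy",
    "optimizer": "adam",
    "learning_rate": 1e-4,
    "batch_size": 32,
    "epochs": 150,
}

def categorical_cross_entropy(probs, one_hot, eps=1e-12):
    """Mean categorical cross-entropy over a batch.

    probs:   list of softmax output rows, each summing to 1
    one_hot: list of one-hot target rows of the same shape
    eps guards against log(0) for numerically zero probabilities.
    """
    total = 0.0
    for p_row, y_row in zip(probs, one_hot):
        total -= sum(y * math.log(p + eps) for p, y in zip(p_row, y_row))
    return total / len(probs)

# For a one-hot target, the loss reduces to -log(p_true_class):
# categorical_cross_entropy([[0.7, 0.2, 0.1]], [[1, 0, 0]]) == -ln(0.7)
```

In practice these values would be passed to the framework's compile/fit step (e.g. an Adam optimizer with learning rate 0.0001, mini-batches of 32, trained for 150 epochs).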
| Attention | RGB | Skeleton | Fusion |
|---|---|---|---|
| Without attention | 90.3 | 83.8 | 92.8 |
| With attention | 92.1 | 85.2 | 94.3 |

Table 2. Accuracy with and without self-attention on the Penn Action dataset (unit: %)
| Attention | RGB | Skeleton | Fusion |
|---|---|---|---|
| Without attention | 69.2 | 61.9 | 72.9 |
| With attention | 71.3 | 63.7 | 74.8 |

Table 3. Accuracy with and without self-attention on the JHMDB dataset (unit: %)
| Algorithm | Accuracy |
|---|---|
| AOG-Fine[16] | 73.4 |
| STIP-HoG+HoG[17] | 82.8 |
| AOG-All[16] | 85.5 |
| C3D[18] | 86.0 |
| JDD[19] | 87.4 |
| MMTSN-RGB+Pose[20] | 91.67 |
| IDT-FV[19] | 92.0 |
| IDT-FV+Pose[19] | 92.9 |
| TSN[21] | 93.8 |
| DPI+att-DTI[22] | 93.9 |
| DPI+att-DTIs[22] | 95.8 |
| AWCN (Ours) | 92.8 |
| AWCN+self-attention (Ours) | 94.3 |

Table 4. Comparison of AWCN and other algorithms on the Penn Action dataset (unit: %)
| Algorithm | Accuracy |
|---|---|
| P-CNN[7] | 61.1 |
| FAT[23] | 62.5 |
| MMTSN-RGB+Pose[20] | 62.86 |
| STAR-Net[24] | 64.3 |
| IDT-FV[19] | 65.9 |
| TS R-CNN[23] | 70.5 |
| MR-TS R-CNN[23] | 71.1 |
| GoogLeNet+iTF[25] | 74.5 |
| AWCN (Ours) | 72.9 |
| AWCN+self-attention (Ours) | 74.8 |

Table 5. Comparison of AWCN and other algorithms on the JHMDB dataset (unit: %)
| Algorithm | CS (cross-subject) | CV (cross-view) |
|---|---|---|
| STA-LSTM[26] | 73.4 | 81.2 |
| VA-LSTM[27] | 79.4 | 87.6 |
| ST-GCN[28] | 81.5 | 88.3 |
| Two-Stream CNN[29] | 83.2 | 89.3 |
| CSTA-CNN[30] | 84.9 | 89.9 |
| HCN[31] | 86.5 | 91.9 |
| SR-TSL[32] | 84.8 | 92.4 |
| AWCN (Ours) | 85.6 | 88.9 |
| AWCN+self-attention (Ours) | 87.3 | 90.1 |

Table 6. Comparison of AWCN and other algorithms on the NTU RGB+D dataset (unit: %)