Author Affiliations
International Joint Laboratory for Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
Fig. 1. RGB images and corresponding skeleton images
Fig. 2. Overall network
Fig. 3. Spatial-temporal feature extraction network with self-attention
Fig. 4. Adaptive weight computing network
Fig. 5. Feature fusion and classification network
Fig. 6. Accuracy of different weight combinations
Fig. 7. Recognition results using skeleton features only versus fusion features
Fig. 8. Visualization of self-attention on skeleton and RGB images of Golf
Fig. 9. Visualization of self-attention on skeleton and RGB images of Baseball swing
Fig. 10. Visualization of adaptive weights for Golf, Baseball swing, Walk, and Run
| Parameter | Value |
|---|---|
| Loss function | Categorical cross-entropy |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 32 |
| Number of epochs | 150 |
Table 1. Experimental parameters
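The training setup in Table 1 can be summarized as a small, framework-agnostic sketch. The paper does not specify the model or data pipeline here, so the snippet below only collects the listed hyperparameters and shows the categorical cross-entropy loss they refer to; all names (`HPARAMS`, `categorical_cross_entropy`) are illustrative, not from the original.

```python
import math

# Training hyperparameters as listed in Table 1.
HPARAMS = {
    "loss": "categorical_crossentropy",
    "optimizer": "adam",
    "learning_rate": 1e-4,
    "batch_size": 32,
    "epochs": 150,
}

def categorical_cross_entropy(probs, one_hot, eps=1e-12):
    """Mean categorical cross-entropy over a batch.

    probs:   list of softmax output rows, each summing to 1
    one_hot: list of one-hot target rows of the same shape
    eps guards against log(0) for numerically zero probabilities.
    """
    total = 0.0
    for p_row, y_row in zip(probs, one_hot):
        total -= sum(y * math.log(p + eps) for p, y in zip(p_row, y_row))
    return total / len(probs)

# For a one-hot target, the loss reduces to -log(p_true_class):
# categorical_cross_entropy([[0.7, 0.2, 0.1]], [[1, 0, 0]]) == -ln(0.7)
```

In practice these values would be passed to the framework's compile/fit step (e.g. an Adam optimizer with learning rate 0.0001, mini-batches of 32, trained for 150 epochs).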
| Attention | RGB | Skeleton | Fusion |
|---|---|---|---|
| Without attention | 90.3 | 83.8 | 92.8 |
| With attention | 92.1 | 85.2 | 94.3 |

Table 2. Accuracy with and without self-attention on the Penn Action dataset (unit: %)
| Attention | RGB | Skeleton | Fusion |
|---|---|---|---|
| Without attention | 69.2 | 61.9 | 72.9 |
| With attention | 71.3 | 63.7 | 74.8 |

Table 3. Accuracy with and without self-attention on the JHMDB dataset (unit: %)
| Algorithm | Accuracy |
|---|---|
| AOG-Fine[16] | 73.4 |
| STIP-HoG+HoG[17] | 82.8 |
| AOG-All[16] | 85.5 |
| C3D[18] | 86.0 |
| JDD[19] | 87.4 |
| MMTSN-RGB+Pose[20] | 91.67 |
| IDT-FV[19] | 92.0 |
| IDT-FV+Pose[19] | 92.9 |
| TSN[21] | 93.8 |
| DPI+att-DTI[22] | 93.9 |
| DPI+att-DTIs[22] | 95.8 |
| AWCN (Ours) | 92.8 |
| AWCN+self-attention (Ours) | 94.3 |

Table 4. Comparison of AWCN and other algorithms on the Penn Action dataset (unit: %)
| Algorithm | Accuracy |
|---|---|
| P-CNN[7] | 61.1 |
| FAT[23] | 62.5 |
| MMTSN-RGB+Pose[20] | 62.86 |
| STAR-Net[24] | 64.3 |
| IDT-FV[19] | 65.9 |
| TS R-CNN[23] | 70.5 |
| MR-TS R-CNN[23] | 71.1 |
| GoogLeNet+iTF[25] | 74.5 |
| AWCN (Ours) | 72.9 |
| AWCN+self-attention (Ours) | 74.8 |

Table 5. Comparison of AWCN and other algorithms on the JHMDB dataset (unit: %)
| Algorithm | CS (cross-subject) | CV (cross-view) |
|---|---|---|
| STA-LSTM[26] | 73.4 | 81.2 |
| VA-LSTM[27] | 79.4 | 87.6 |
| ST-GCN[28] | 81.5 | 88.3 |
| Two-Stream CNN[29] | 83.2 | 89.3 |
| CSTA-CNN[30] | 84.9 | 89.9 |
| HCN[31] | 86.5 | 91.9 |
| SR-TSL[32] | 84.8 | 92.4 |
| AWCN (Ours) | 85.6 | 88.9 |
| AWCN+self-attention (Ours) | 87.3 | 90.1 |

Table 6. Comparison of AWCN and other algorithms on the NTU RGB+D dataset (unit: %)