• Laser & Optoelectronics Progress
  • Vol. 57, Issue 20, 201506 (2020)
Fuzheng Guo, Jun Kong*, and Min Jiang
Author Affiliations
  • International Joint Laboratory for Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
    DOI: 10.3788/LOP57.201506
    Fuzheng Guo, Jun Kong, Min Jiang. Action Recognition Based on Adaptive Fusion of RGB and Skeleton Features[J]. Laser & Optoelectronics Progress, 2020, 57(20): 201506
    Fig. 1. RGB images and corresponding skeleton images
    Fig. 2. Overall network
    Fig. 3. Spatial-temporal feature extracting network with self-attention
    Fig. 4. Adaptive weight computing network
    Fig. 5. Feature fusion and classification
    Fig. 6. Accuracy of different weight combinations
    Fig. 7. Recognition results of using skeleton features only and fusion features
    Fig. 8. Visualization of self-attention on skeleton and RGB images of Golf
    Fig. 9. Visualization of self-attention on skeleton and RGB images of Baseball swing
    Fig. 10. Visualization of adaptive weight of Golf, Baseball swing, Walk and Run
    Parameter         Value
    Loss function     Categorical cross entropy
    Optimizer         Adam
    Learning rate     0.0001
    Batch size        32
    Number of epochs  150
    Table 1. Experimental parameters
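The settings in Table 1 can be collected into a single configuration object. The following is a minimal sketch, not the authors' code: the class name `TrainConfig`, its field names, the two helper functions, and the sample count used in the usage lines are all hypothetical; only the five values come from Table 1.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from Table 1 (class and field names are illustrative)."""
    loss: str = "categorical_crossentropy"  # categorical cross entropy
    optimizer: str = "adam"                 # Adam
    learning_rate: float = 1e-4             # 0.0001
    batch_size: int = 32
    epochs: int = 150


def steps_per_epoch(num_samples: int, cfg: TrainConfig) -> int:
    """Mini-batches needed to visit every training sample once."""
    return math.ceil(num_samples / cfg.batch_size)


def total_steps(num_samples: int, cfg: TrainConfig) -> int:
    """Optimizer updates over the whole training run."""
    return steps_per_epoch(num_samples, cfg) * cfg.epochs


cfg = TrainConfig()
# 1000 is a hypothetical training-set size, chosen only for illustration.
print(steps_per_epoch(1000, cfg), total_steps(1000, cfg))
```

With the hypothetical 1000 training samples, a batch size of 32 gives 32 steps per epoch, so 150 epochs correspond to 4800 optimizer updates.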
    Attention          RGB   Skeleton  Fusion
    Without attention  90.3  83.8      92.8
    With attention     92.1  85.2      94.3
    Table 2. Accuracy with and without self-attention on the Penn Action dataset (unit: %)
    Attention          RGB   Skeleton  Fusion
    Without attention  69.2  61.9      72.9
    With attention     71.3  63.7      74.8
    Table 3. Accuracy with and without self-attention on the JHMDB dataset (unit: %)
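The differences implied by Tables 2 and 3 can be worked out directly. The short sketch below copies the reported accuracies and computes, in percentage points, the gain from self-attention per stream and the gain of the fused result over the better single stream. The dictionary layout and function names are assumptions made for illustration; they are not part of the paper.

```python
# Accuracies (%) copied from Tables 2 and 3 as (without attention, with attention).
ACCURACY = {
    "Penn Action": {"RGB": (90.3, 92.1), "Skeleton": (83.8, 85.2), "Fusion": (92.8, 94.3)},
    "JHMDB": {"RGB": (69.2, 71.3), "Skeleton": (61.9, 63.7), "Fusion": (72.9, 74.8)},
}


def attention_gain(dataset: str, stream: str) -> float:
    """Percentage-point gain from adding self-attention to one stream."""
    without, with_att = ACCURACY[dataset][stream]
    return round(with_att - without, 1)


def fusion_gain(dataset: str, with_attention: bool = True) -> float:
    """Percentage-point gain of fusion over the better single stream."""
    idx = 1 if with_attention else 0
    streams = ACCURACY[dataset]
    best_single = max(streams["RGB"][idx], streams["Skeleton"][idx])
    return round(streams["Fusion"][idx] - best_single, 1)


for name in ACCURACY:
    gains = {s: attention_gain(name, s) for s in ("RGB", "Skeleton", "Fusion")}
    print(name, gains, "fusion over best single stream:", fusion_gain(name))
```

On these figures, self-attention adds between 1.4 and 2.1 percentage points in every row, and with attention the fused result exceeds the better single stream (RGB in both datasets) by 2.2 points on Penn Action and 3.5 points on JHMDB.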
    Algorithm                    Accuracy
    AOG-Fine [16]                73.4
    STIP-HoG+HoG [17]            82.8
    AOG-All [16]                 85.5
    C3D [18]                     86.0
    JDD [19]                     87.4
    MMTSN-RGB+Pose [20]          91.67
    IDT-FV [19]                  92.0
    IDT-FV+Pose [19]             92.9
    TSN [21]                     93.8
    DPI+att-DTI [22]             93.9
    DPI+att-DTIs [22]            95.8
    AWCN (Ours)                  92.8
    AWCN+self-attention (Ours)   94.3
    Table 4. Comparison of AWCN and other algorithms on the Penn Action dataset (unit: %)
    Algorithm                    Accuracy
    P-CNN [7]                    61.1
    FAT [23]                     62.5
    MMTSN-RGB+Pose [20]          62.86
    STAR-Net [24]                64.3
    IDT-FV [19]                  65.9
    TS R-CNN [23]                70.5
    MR-TS R-CNN [23]             71.1
    GoogLeNet+iTF [25]           74.5
    AWCN (Ours)                  72.9
    AWCN+self-attention (Ours)   74.8
    Table 5. Comparison of AWCN and other algorithms on the JHMDB dataset (unit: %)
    Algorithm                    CS    CV
    STA-LSTM [26]                73.4  81.2
    VA-LSTM [27]                 79.4  87.6
    ST-GCN [28]                  81.5  88.3
    Two-Stream CNN [29]          83.2  89.3
    CSTA-CNN [30]                84.9  89.9
    HCN [31]                     86.5  91.9
    SR-TSL [32]                  84.8  92.4
    AWCN (Ours)                  85.6  88.9
    AWCN+self-attention (Ours)   87.3  90.1
    Table 6. Comparison of AWCN and other algorithms on the NTU RGB-D dataset (unit: %)