Human Action Recognition Combining Sequential Dynamic Images and Two-Stream Convolutional Network

Wenqiang Zhang; Zengqiang Wang; Liang Zhang

doi:10.3788/LOP202158.0210007


Laser & Optoelectronics Progress
Vol. 58, Issue 2, 0210007 (2021)

Wenqiang Zhang, Zengqiang Wang, and Liang Zhang^*

Author Affiliations

Tianjin Key Laboratory of Advanced Signal and Image Processing, Civil Aviation University of China, Tianjin 300300, China



Wenqiang Zhang, Zengqiang Wang, Liang Zhang. Human Action Recognition Combining Sequential Dynamic Images and Two-Stream Convolutional Network[J]. Laser & Optoelectronics Progress, 2021, 58(2): 0210007


Fig. 1. Overall flow diagram of action representation


Fig. 2. Static video frames and the corresponding sequential dynamic images. (a) Static images; (b) sequential dynamic images; (c) optical flow images


Fig. 3. TS-CNN network framework


Fig. 4. Recognition results of different subsequence lengths


Method	Split1	Split2	Split3	Mean accuracy
SI	84.6	84.9	85.0	84.8
SOF	87.3	89.9	91.0	89.4
FSDI	83.9	83.8	83.1	83.6
BSDI	84.1	83.3	84.3	83.9
SDI	85.7	86.2	85.5	85.8
ESDI	87.2	86.8	87.6	87.2
SI+SOF	93.2	94.0	94.2	93.8
ESDI+SOF	94.8	94.6	95.3	94.9

Table 1. Recognition accuracy on the UCF101 dataset with different input modes (unit: %)
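
The last column of Table 1 appears to be the arithmetic mean of the three split accuracies, rounded to one decimal place. The following is a minimal sketch that checks this reading against the values copied from the table. The names `table1`, `mean_accuracy`, and `max_deviation` are illustrative and do not come from the paper.

```python
# Per-split accuracies (%) and the reported last-column value, copied from Table 1.
# Assumption: the last column is the mean over Split1..Split3.
table1 = {
    "SI": ([84.6, 84.9, 85.0], 84.8),
    "SOF": ([87.3, 89.9, 91.0], 89.4),
    "FSDI": ([83.9, 83.8, 83.1], 83.6),
    "BSDI": ([84.1, 83.3, 84.3], 83.9),
    "SDI": ([85.7, 86.2, 85.5], 85.8),
    "ESDI": ([87.2, 86.8, 87.6], 87.2),
    "SI+SOF": ([93.2, 94.0, 94.2], 93.8),
    "ESDI+SOF": ([94.8, 94.6, 95.3], 94.9),
}


def mean_accuracy(splits):
    """Arithmetic mean of the per-split accuracies."""
    return sum(splits) / len(splits)


def max_deviation(table):
    """Largest absolute gap between the computed mean and the reported value."""
    return max(abs(mean_accuracy(s) - reported) for s, reported in table.values())


for method, (splits, reported) in table1.items():
    print(f"{method:9s} mean={mean_accuracy(splits):.2f} reported={reported}")
```

Every computed mean falls within rounding distance (0.05) of the reported value, which is consistent with the last column being a three-split average.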

Method	Split1	Split2	Split3	Mean accuracy
SI	54.8	50.4	49.6	51.6
SOF	64.2	63.6	62.7	63.5
FSDI	50.7	51.4	53.6	51.9
BSDI	51.6	51.5	54.1	52.4
SDI	54.5	52.9	53.7	53.7
ESDI	53.6	55.5	55.6	54.9
SI+SOF	68.7	67.5	68.4	68.2
ESDI+SOF	69.6	71.2	71.6	70.8

Table 2. Recognition accuracy on the HMDB51 dataset with different input modes (unit: %)

Consensus function	UCF101	HMDB51
Max	93.0	69.1
Average	94.9	70.8
Weighted average	93.8	69.7

Table 3. Recognition accuracy of different fusion methods on the UCF101 and HMDB51 datasets (unit: %)
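
The three consensus functions compared in Table 3 can be illustrated with a small sketch. It assumes that each function aggregates per-class score vectors (for example, from several video segments) into one vector. This page does not give the paper's exact formulation, so the function names, the example scores, and the weights below are all hypothetical.

```python
def consensus_max(scores):
    """Element-wise maximum over the score vectors."""
    return [max(col) for col in zip(*scores)]


def consensus_average(scores):
    """Element-wise arithmetic mean over the score vectors."""
    n = len(scores)
    return [sum(col) / n for col in zip(*scores)]


def consensus_weighted(scores, weights):
    """Element-wise weighted mean; weights are normalized by their sum."""
    total = sum(weights)
    return [sum(w * s for w, s in zip(weights, col)) / total
            for col in zip(*scores)]


# Hypothetical per-class scores for three segments and two classes.
segment_scores = [
    [0.2, 0.8],
    [0.6, 0.4],
    [0.1, 0.9],
]

print(consensus_max(segment_scores))
print(consensus_average(segment_scores))
print(consensus_weighted(segment_scores, weights=[1.0, 2.0, 1.0]))
```

The predicted class is then the index of the largest aggregated score. In Table 3, plain averaging gives the highest accuracy on both datasets.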

Network structure	UCF101	HMDB51
ResNet101	93.6	68.4
BN-Inception	94.2	68.2
InceptionV3	94.9	70.8

Table 4. Recognition accuracy of different network models on the UCF101 and HMDB51 datasets (unit: %)

Network	UCF101	HMDB51
Spatial stream	84.8	51.4
Temporal stream	89.4	63.5
Original two-stream	88.0	59.4
Ref. [19]	94.0	69.4
Appearance and long-sequential stream	87.2	54.9
Short sequential stream	89.9	64.0
TS-CNN	94.9	70.8

Table 5. Recognition accuracy of different human action recognition models (unit: %)

Feature extraction	Method	UCF101	HMDB51
Traditional	Ref. [7]	84.8	57.2
Traditional	Ref. [8]	87.9	61.1
Deep learning	Ref. [17]	88.0	59.4
	Ref. [21]	88.6	--
	Ref. [22]	91.5	65.9
	Ref. [23]	93.1	63.3
	Ref. [24]	93.4	66.4
	Ref. [19]	94.0	69.4
	Proposed	94.9	70.8

Table 6. Recognition accuracy of different algorithms (unit: %)
