Efficient 3D dense residual network and its application in human action recognition

Li Lianghua; Wang Yongxiong

doi:10.12086/oee.2020.190139

Abstract

3D convolutional neural networks (3D-CNNs) are well suited to extracting spatio-temporal features from video, but they require large amounts of computation and memory. To address this problem, this paper designs an efficient 3D convolutional block that replaces the computationally expensive 3×3×3 convolutional layer, and then proposes 3D-efficient dense residual networks (3D-EDRNs), which integrate these blocks, for human action recognition. The efficient 3D convolutional block is composed of 1×3×3 convolutional layers, which extract the spatial features of the video, and 3×1×1 convolutional layers, which extract its temporal features. Efficient 3D convolutional blocks are placed at multiple locations in a dense residual network. This design exploits both the ease of optimization of residual blocks and the feature reuse of densely connected networks. It also shortens training time and improves the efficiency and performance of the network's spatio-temporal feature extraction. Experiments on the classical UCF101 and HMDB51 datasets and on the dynamic multi-view complicated 3D database of human activity (DMV action3D) verify that 3D-EDRNs built with the efficient 3D convolutional block significantly reduce model complexity and effectively improve classification performance. They also demand fewer computational resources, have fewer parameters, and train in less time.
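To make the cost argument concrete, the following is a minimal sketch that counts weights and multiply-accumulate operations (MACs) for a single 3×3×3 convolution versus a 1×3×3 spatial convolution followed by a 3×1×1 temporal convolution. Several details are assumptions, not taken from the paper. The layers are treated as bias-free with stride 1 and "same" padding. Normalization and activation layers are ignored. The intermediate channel width `c_mid` defaults to the output width. The names `Conv3dSpec`, `full_block`, and `efficient_block` are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conv3dSpec:
    """A bias-free 3D convolution described by its kernel size (t, h, w)."""
    in_ch: int
    out_ch: int
    kernel: tuple  # (temporal, height, width)

    def params(self) -> int:
        kt, kh, kw = self.kernel
        return self.in_ch * self.out_ch * kt * kh * kw

    def macs(self, t: int, h: int, w: int) -> int:
        # Assumes stride 1 and "same" padding: every output voxel costs one
        # multiply-accumulate per weight, so MACs = params * output voxels.
        return self.params() * t * h * w


def full_block(c_in: int, c_out: int) -> list:
    """Baseline: one standard 3x3x3 convolution."""
    return [Conv3dSpec(c_in, c_out, (3, 3, 3))]


def efficient_block(c_in: int, c_out: int, c_mid: int = None) -> list:
    """Factorized block: spatial 1x3x3 conv, then temporal 3x1x1 conv.

    Hypothetical choice: c_mid defaults to c_out, because the abstract
    does not state the intermediate channel width.
    """
    c_mid = c_out if c_mid is None else c_mid
    return [
        Conv3dSpec(c_in, c_mid, (1, 3, 3)),   # spatial features per frame
        Conv3dSpec(c_mid, c_out, (3, 1, 1)),  # temporal features per pixel
    ]


def total_params(block: list) -> int:
    return sum(layer.params() for layer in block)


def total_macs(block: list, t: int, h: int, w: int) -> int:
    return sum(layer.macs(t, h, w) for layer in block)


if __name__ == "__main__":
    c = 64                   # illustrative channel count
    t, h, w = 16, 56, 56     # illustrative clip size (frames, height, width)
    full = full_block(c, c)
    eff = efficient_block(c, c)
    print("3x3x3 params:        ", total_params(full))
    print("1x3x3 + 3x1x1 params:", total_params(eff))
    print("param ratio:", total_params(eff) / total_params(full))
    print("MAC ratio:  ", total_macs(eff, t, h, w) / total_macs(full, t, h, w))
```

When the input, intermediate, and output widths are equal, the factorized pair uses (9 + 3) / 27 = 4/9 of the weights and MACs of the 3×3×3 layer. With 64 channels, for example, the counts are 49,152 versus 110,592 weights. The actual savings in 3D-EDRNs depend on the channel widths the authors chose, and the abstract does not report them.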
