Lightweight video super-resolution based on hybrid spatio-temporal convolution

Zhenping XIA; Hao CHEN; Yuning ZHANG; Cheng CHENG; Fuyuan HU

doi:10.37188/OPE.20243216.2564


Optics and Precision Engineering
Vol. 32, Issue 16, 2564 (2024)


Zhenping XIA¹,³,*, Hao CHEN¹, Yuning ZHANG²,⁴, Cheng CHENG¹,³, and Fuyuan HU¹,³

Author Affiliations

¹School of Electronic & Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China

²Display R&D Centre, School of Electronic Science & Engineering, Southeast University, Nanjing 210096, China

³Jiangsu Industrial Intelligent and Low-carbon Technology Engineering Center, Suzhou 215009, China

⁴Shi-Cheng Laboratory for Information Display and Visualization, Nanjing 210013, China



Zhenping XIA, Hao CHEN, Yuning ZHANG, Cheng CHENG, Fuyuan HU. Lightweight video super-resolution based on hybrid spatio-temporal convolution[J]. Optics and Precision Engineering, 2024, 32(16): 2564


Fig. 1. Overall network structure


Fig. 2. Motion compensation structure


Fig. 3. Hybrid spatio-temporal convolution

Fig. 4. 2D spatial convolution

Fig. 5. Similarity-based feature selection


Fig. 6. Visual results of our network and its variants


Fig. 7. Visual comparison of ×4 SR reconstructions by state-of-the-art algorithms and the proposed network on three datasets

Fig. 8. [in Chinese]


Module	Function	Kernel size
Motion compensation	C_f(·)	3×3×128
	C_g(·)	3×3×128
	DConv	3×3×128
Spatio-temporal feature extraction	C_a(·)	3×3×128
	C_SC(·)	3×3×128
	C_TC(·)	3×3×3×128
	C_fuse(·)	3×3×128
Selective feature fusion	θ(·)	3×3×128
	ϕ(·)	1×1×128
	C_e(·)	1×1×128
	Upsampling	3×3×48

Table 1. Network architecture
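To make the kernel sizes in Table 1 more concrete, the weight count of a single layer can be estimated as follows. This is a rough sketch rather than the authors' own accounting: it assumes that every layer has 128 input channels (Table 1 lists only the kernel extent and what appears to be the output width), that each output channel has one bias term, and the helper name `conv_params` is hypothetical.

```python
from math import prod

def conv_params(kernel, in_ch, out_ch, bias=True):
    """Learnable parameters in one dense (non-grouped) convolution layer."""
    weights = prod(kernel) * in_ch * out_ch
    return weights + (out_ch if bias else 0)

# Assumption: 128 input channels for every layer (not stated in Table 1).
spatial_2d = conv_params((3, 3), 128, 128)      # a 3x3x128 layer such as C_SC
temporal_3d = conv_params((3, 3, 3), 128, 128)  # the 3x3x3x128 layer C_TC
pointwise = conv_params((1, 1), 128, 128)       # a 1x1x128 layer such as C_e

print(spatial_2d, temporal_3d, pointwise)
```

Under these assumptions a 3×3×3 layer carries roughly three times the weights of a 3×3 layer, which is one plausible reason a lightweight design would mix 2D and 3D convolutions instead of using 3D convolutions throughout.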


Model	3D convolution	2D spatial convolution	Feature selection module	PSNR/dB	SSIM
TC-VSR	√			29.47	0.8699
Deep-TC-VSR	√			29.79	0.8748
S-TC-VSR	√		√	29.72	0.8734
S-SC-VSR		√	√	29.61	0.8713
HTSC-VSR	√	√		29.59	0.8735
S-HTSC-VSR (Ours)	√	√	√	30.51	0.8809

Table 2. Quantitative comparison of the network and its variants on the SPMCS-11 dataset


Depth	Width	Parameters/M	PSNR/dB	SSIM
8	64	3.9	30.21	0.8701
8	128	5.2	30.27	0.8744
10	64	6.8	30.43	0.8752
10	128	9.7	30.51	0.8809

Table 3. Network performance at different widths and depths


Metric	MSE loss	L1 loss	Charbonnier loss
PSNR/dB	27.28	27.36	27.43

Table 4. Average PSNR over all video frames for different loss functions on the Vid4 dataset
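For reference, the three losses compared in Table 4 can be written in a few lines. This is a minimal sketch using the standard textbook definitions on flat lists of pixel values; the smoothing constant `eps=1e-3` is an assumed, commonly used value, since the setting used in the paper is not given here.

```python
import math

def mse_loss(pred, target):
    """Mean squared error: penalises large errors quadratically."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def l1_loss(pred, target):
    """Mean absolute error: linear penalty, less sensitive to outliers."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def charbonnier_loss(pred, target, eps=1e-3):
    """Smooth approximation of L1: sqrt(diff^2 + eps^2), averaged.

    eps is an assumed value, not taken from the paper.
    """
    return sum(math.sqrt((p - t) ** 2 + eps ** 2)
               for p, t in zip(pred, target)) / len(pred)

# Toy example with three pixel intensities in [0, 1].
pred = [0.5, 0.2, 0.9]
target = [0.5, 0.4, 0.8]
print(mse_loss(pred, target), l1_loss(pred, target), charbonnier_loss(pred, target))
```

The Charbonnier loss stays within `eps` of the L1 loss but remains differentiable at zero error, which is the usual argument for preferring it in super-resolution training.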


Clip	Bicubic	RCAN [25]	DUF [14]	TDAN [21]	VSR-Transformer [26]	BasicVSR++ [27]	Ours
Calendar	20.39/0.5720	22.31/0.7248	24.04/0.8110	23.20/0.7689	24.14/0.8157	24.23/0.8209	24.20/0.8212
City	25.16/0.6028	26.07/0.6938	28.27/0.8313	27.18/0.7716	27.87/0.8114	28.01/0.8137	28.03/0.8141
Foliage	23.47/0.5666	24.69/0.6628	26.41/0.7709	25.64/0.7284	26.29/0.7613	26.34/0.7654	26.39/0.7665
Walk	26.10/0.7974	28.64/0.8718	30.30/0.9141	29.80/0.8940	30.91/0.9109	31.11/0.9154	31.09/0.9157
Average	23.78/0.6347	25.43/0.7383	27.26/0.8318	26.46/0.7907	27.30/0.8248	27.42/0.8289	27.43/0.8294

Table 5. Quantitative comparisons of different algorithms for scale factor ×4 on the Vid4 dataset (PSNR (dB)/SSIM)
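The PSNR values reported in Tables 5 to 7 follow the usual definition, 10·log10(peak² / MSE). The sketch below shows that computation on flat lists of pixel values. It assumes 8-bit images (peak value 255) and does not reproduce the paper's evaluation protocol, such as the colour channel used or any border cropping, which is not described here.

```python
import math

def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    peak=255.0 assumes 8-bit images; this is an assumption, not a setting
    taken from the paper.
    """
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy example: every pixel is off by exactly one grey level.
reference = [10, 20, 30, 40]
restored = [11, 21, 31, 41]
print(round(psnr(reference, restored), 2))
```

Because the scale is logarithmic, a gap of a few hundredths of a dB between two methods corresponds to a very small difference in mean squared error.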


Clip	Bicubic	RCAN [25]	DUF [14]	TDAN [21]	VSR-Transformer [26]	BasicVSR++ [27]	Ours
Car_05	27.75/0.7825	29.84/0.8483	30.77/0.8705	30.59/0.8653	32.13/0.9032	32.31/0.9054	32.42/0.9063
hdclub_003	19.42/0.4863	20.39/0.6100	22.06/0.7429	21.34/0.6879	22.11/0.7387	22.19/0.7443	22.17/0.7419
hitachi_isee5	19.61/0.5938	23.58/0.8371	25.75/0.8927	24.59/0.8567	26.50/0.9069	26.73/0.9097	26.74/0.9123
hk004_001	28.54/0.8003	31.72/0.8628	32.96/0.8984	32.27/0.8825	33.48/0.9046	33.59/0.9051	33.66/0.9045
HKVTG_004	27.46/0.6831	28.77/0.7650	29.15/0.7855	29.11/0.7788	29.57/0.7983	29.60/0.7987	29.55/0.8013
jvc_009	25.40/0.7558	28.29/0.8722	29.17/0.8959	28.90/0.8832	30.46/0.9195	30.74/0.9211	30.91/0.9216
NYVTG_006	28.45/0.8014	30.99/0.8860	32.32/0.9058	31.90/0.8996	33.32/0.9251	33.56/0.9269	34.11/0.9274
PRVTG_012	25.63/0.7136	26.63/0.7811	27.35/0.8164	27.16/0.8056	27.67/0.8253	27.79/0.8281	27.84/0.8274
RMVTG_011	23.96/0.6573	26.05/0.7574	27.53/0.8115	26.95/0.7924	27.71/0.8197	27.81/0.8234	27.94/0.8252
veni3_011	29.47/0.8979	34.54/0.9625	34.64/0.9676	34.68/0.9645	36.53/0.9745	36.57/0.9748	37.16/0.9752
veni5_015	27.41/0.8483	31.01/0.9262	31.89/0.9367	31.30/0.9275	32.77/0.9449	33.17/0.9473	33.12/0.9466
Average	25.73/0.7391	28.35/0.8281	29.42/0.8659	28.98/0.8495	30.20/0.8782	30.37/0.8804	30.51/0.8809

Table 6. Quantitative comparisons of different algorithms for scale factor ×4 on the SPMCS-11 dataset (PSNR (dB)/SSIM)


Algorithm	Slow motion	Medium motion	Fast motion	Average
Bicubic	29.34/0.8330	31.29/0.8708	34.07/0.9050	31.32/0.8684
RCAN [25]	32.92/0.9028	35.33/0.9265	38.45/0.9453	35.32/0.9245
DUF [14]	33.38/0.9107	36.69/0.9442	38.86/0.9508	36.35/0.9383
TDAN [21]	33.17/0.9065	36.05/0.9369	38.70/0.9491	35.87/0.9325
VSR-Transformer [26]	34.43/0.9232	37.69/0.9517	40.26/0.9613	37.42/0.9473
BasicVSR++ [27]	34.58/0.9256	37.75/0.9527	40.49/0.9624	37.52/0.9486
Ours	34.53/0.9246	37.81/0.9535	40.56/0.9633	37.56/0.9490
Number of clips	1616	4983	1225	7824
Average flow magnitude	0.6	2.5	8.3	3.0

Table 7. Quantitative comparisons of different algorithms for scale factor ×4 on the Vimeo-90K-T dataset (PSNR (dB)/SSIM)
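The Average column of Table 7 appears to be a clip-count-weighted mean of the three motion categories rather than a plain mean of the three columns. The following sketch checks this for the PSNR of the "Ours" row, using only numbers copied from the table; the helper name `weighted_mean` is hypothetical.

```python
# PSNR values of the "Ours" row and the clip counts, copied from Table 7.
psnr_by_motion = {"slow": 34.53, "medium": 37.81, "fast": 40.56}
clips_by_motion = {"slow": 1616, "medium": 4983, "fast": 1225}

def weighted_mean(values, weights):
    """Mean of `values`, weighted by the matching entries of `weights`."""
    total_weight = sum(weights[k] for k in values)
    return sum(values[k] * weights[k] for k in values) / total_weight

total_clips = sum(clips_by_motion.values())
avg_psnr = weighted_mean(psnr_by_motion, clips_by_motion)
plain_mean = sum(psnr_by_motion.values()) / len(psnr_by_motion)

print(total_clips, round(avg_psnr, 2), round(plain_mean, 2))
```

The weighted result is 7824 clips and about 37.56 dB, which agrees with the Average column, whereas the unweighted mean of the three categories is about 37.63 dB. The medium-motion category therefore dominates the reported average because it contains most of the clips.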


Metric	Bicubic	RCAN [25]	TDAN [21]	BasicVSR++ [27]	Ours
NIQE↓	7.58	6.29	6.56	6.11	6.05
SSEQ↓	54.40	46.32	44.26	41.17	40.59

Table 8. Quantitative comparisons on the real-world dataset


Algorithm	PSNR/dB	SSIM	Parameters/M	FLOPs/10⁹	Average running time/s
RCAN [25]	28.35	0.8281	15.6	261.46	1.586
DUF [14]	29.42	0.8659	5.8	92.97	0.573
3DSRNet [13]	28.98	0.8495	15.9	127.49	0.778
VSR-Transformer [26]	30.20	0.8782	43.8	834.01	1.153
BasicVSR++ [27]	30.37	0.8804	6.4	11.07	0.067
Ours	30.51	0.8809	9.7	19.04	0.115

Table 9. Average running time on SPMCS-11 dataset for ×4 SR
