No Reference Video Quality Assessment Based on Spatio-Temporal Features and Attention Mechanism

Ze Zhu; Qingbing Sang; Hao Zhang

doi:10.3788/LOP57.181509

Laser & Optoelectronics Progress
Vol. 57, Issue 18, 181509 (2020)

Ze Zhu¹, Qingbing Sang¹,²,*, and Hao Zhang¹

Author Affiliations

¹School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China

²Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Wuxi, Jiangsu 214122, China

Ze Zhu, Qingbing Sang, Hao Zhang. No Reference Video Quality Assessment Based on Spatio-Temporal Features and Attention Mechanism[J]. Laser & Optoelectronics Progress, 2020, 57(18): 181509

Fig. 1. Network structure

Fig. 2. Schematic of GRU network structure

Fig. 3. Attention model

Fig. 4. First frame of different distorted videos. (a) Riverbed; (b) sunflower; (c) station; (d) tractor

Fig. 5. Flow chart of video data processing

Fig. 6. Scatter plot of prediction results on LIVE video library

Fig. 7. Relationship curves between training-set proportion and evaluation results

Fig. 8. Scatter plot of prediction results on CSIQ video library

Fig. 9. Scatter plot of prediction results on IVP video library

Layer name	Output size	Parameter
Conv1,Conv2	24000×48×64	Size: 3×3; filters: 64
Max pooling1	12000×24×64	Size: 2×2; stride: 2×2
Conv3,Conv4	12000×24×128	Size: 3×3; filters: 128
Max pooling2	6000×12×128	Size: 2×2; stride: 2×2
Conv5,Conv6,Conv7	6000×12×256	Size: 3×3; filters: 256
Max pooling3	3000×6×256	Size: 2×2; stride: 2×2
Conv8,Conv9,Conv10	3000×6×512	Size: 3×3; filters: 512
Max pooling4	1500×3×512	Size: 2×2; stride: 2×2
Conv11,Conv12,Conv13	1500×3×512	Size: 3×3; filters: 512
Max pooling5	749×1×512	Size: 2×2; stride: 3×3
GRU	1×1×512	512
Attention	1×512	/
FC	1×1	1

Table 1. Network parameter setting
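
The output sizes in Table 1 can be checked with a short calculation. The sketch below assumes that each convolution preserves the spatial size (as the convolution rows of the table suggest) and that pooling uses no padding with floor rounding; `pool_out` and `trace_shapes` are illustrative names, not taken from the paper. Only Max pooling1 to Max pooling4 are traced, because the table does not state the padding or rounding convention needed to reproduce the last pooling row.

```python
def pool_out(size, kernel, stride):
    """Output length of a pooling layer with no padding (floor rounding)."""
    return (size - kernel) // stride + 1


def trace_shapes(height, width, pools):
    """Apply a list of (kernel, stride) pooling steps to a height x width map."""
    shapes = []
    for kernel, stride in pools:
        height = pool_out(height, kernel, stride)
        width = pool_out(width, kernel, stride)
        shapes.append((height, width))
    return shapes


# Max pooling1 to Max pooling4 in Table 1: size 2x2, stride 2x2.
# The starting size 24000 x 48 is the Conv1/Conv2 output size from the table.
shapes = trace_shapes(24000, 48, [(2, 2)] * 4)
print(shapes)  # [(12000, 24), (6000, 12), (3000, 6), (1500, 3)]
```

Under these assumptions the traced sizes match the Max pooling1 to Max pooling4 rows of the table.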

Algorithm	SROCC	PLCC
PSNR^[23]	0.5398	0.5645
SSIM^[24]	0.7364	0.7470
ST-MAD^[6]	0.8251	0.8332
STRRED^[25]	0.8007	0.8119
FS-MOVIE^[7]	0.8482	0.8636
V-BLIINDS^[4]	0.8377	0.8471
Ours without attention	0.8557	0.8633
Ours with attention	0.8798	0.8910

Table 2. Performance comparison of different algorithms on LIVE video library
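
SROCC (Spearman rank-order correlation coefficient) and PLCC (Pearson linear correlation coefficient) measure how well predicted scores agree with subjective scores. The following is a minimal standard-library sketch of the two coefficients; the score lists are made-up illustration data, the function names are hypothetical, and whether the paper applies a nonlinear mapping before computing PLCC is not stated here, so the plain coefficient is used.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up scores for illustration only (not data from the paper).
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
subjective = [1.0, 4.0, 9.0, 16.0, 25.0]

srocc = spearman(predicted, subjective)
plcc = pearson(predicted, subjective)
print(f"SROCC = {srocc:.4f}, PLCC = {plcc:.4f}")
```

Because the made-up relation is monotonic but not linear, SROCC is 1 while PLCC is slightly lower, which illustrates why the tables report both measures.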

Algorithm	Wireless	IP	H.264	MPEG-2
PSNR^[23]	0.6574	0.4167	0.4585	0.3862
SSIM^[24]	0.7289	0.6534	0.7313	0.6684
ST-MAD^[6]	0.8099	0.7758	0.9021	0.8461
STRRED^[25]	0.7857	0.7722	0.8193	0.7193
FS-MOVIE^[7]	0.8139	0.7722	0.8490	0.8609
V-BLIINDS^[4]	0.8455	0.7898	0.8587	0.8377
Ours without attention	0.8487	0.8316	0.8468	0.8331
Ours with attention	0.8617	0.8458	0.8585	0.8547

Table 3. Comparison of SROCC values of different algorithms for single distortion type

Algorithm	Wireless	IP	H.264	MPEG-2
PSNR^[23]	0.7058	0.4767	0.5746	0.3986
SSIM^[24]	0.7184	0.7764	0.7420	0.6222
ST-MAD^[6]	0.8591	0.8065	0.8796	0.8560
STRRED^[25]	0.8053	0.8527	0.8141	0.7570
FS-MOVIE^[7]	0.8599	0.8009	0.8765	0.8721
V-BLIINDS^[4]	0.9134	0.9020	0.9038	0.8699
Ours without attention	0.9069	0.9099	0.8766	0.8745
Ours with attention	0.9203	0.9177	0.8962	0.8858

Table 4. Comparison of PLCC values of different algorithms for single distortion type

Algorithm	LiveData1 SROCC	LiveData1 PLCC	LiveData2 SROCC	LiveData2 PLCC	LiveData3 SROCC	LiveData3 PLCC	LiveData4 SROCC	LiveData4 PLCC	Average SROCC	Average PLCC
Ours without attention	0.8478	0.8752	0.8482	0.8672	0.8544	0.8461	0.8724	0.8648	0.8557	0.8633
Ours with attention	0.8693	0.8908	0.8910	0.9004	0.8852	0.8758	0.8735	0.8969	0.8798	0.8910

Table 5. Comparison of final evaluation results on LIVE video library
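
The Average columns of Table 5 appear to be the arithmetic mean over the four LiveData splits. The short check below recomputes them from the per-split values in the table; the dictionary layout and the `mean` helper are illustrative choices, not part of the paper.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Per-split values copied from Table 5 (LiveData1 to LiveData4).
table5 = {
    "Ours without attention": {
        "SROCC": [0.8478, 0.8482, 0.8544, 0.8724],
        "PLCC": [0.8752, 0.8672, 0.8461, 0.8648],
    },
    "Ours with attention": {
        "SROCC": [0.8693, 0.8910, 0.8852, 0.8735],
        "PLCC": [0.8908, 0.9004, 0.8758, 0.8969],
    },
}

averages = {
    name: {metric: mean(vals) for metric, vals in metrics.items()}
    for name, metrics in table5.items()
}

for name, metrics in averages.items():
    print(name, {metric: f"{value:.6f}" for metric, value in metrics.items()})
```

The recomputed means agree with the reported Average values (0.8557 and 0.8633 without attention, 0.8798 and 0.8910 with attention) to within rounding at the fourth decimal place.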

Algorithm	Time /s
PSNR^[23]	3.09
SSIM^[24]	11.34
ST-MAD^[6]	335.90
STRRED^[25]	54.94
FS-MOVIE^[7]	4444.20
Ours with attention	1291.20

Table 6. Comparison of running time of different methods on “Tractor” video

Algorithm	SROCC	PLCC
PSNR^[23]	0.7253	0.7932
SSIM^[24]	0.8661	0.8517
ST-MAD^[6]	0.8174	0.8266
STRRED^[25]	0.8822	0.8734
FS-MOVIE^[7]	0.8067	0.8053
V-BLIINDS^[4]	0.8351	0.8449
Ours with attention	0.8909	0.8991

Table 7. Performance comparison of different algorithms on CSIQ video library

Algorithm	SROCC	PLCC
PSNR^[23]	0.7064	0.7299
SSIM^[24]	0.7694	0.7667
ST-MAD^[6]	0.8235	0.8284
STRRED^[25]	0.8761	0.8853
FS-MOVIE^[7]	0.8177	0.8359
V-BLIINDS^[4]	0.8552	0.8441
Ours with attention	0.9064	0.9135

Table 8. Performance comparison of different algorithms on IVP video library
