Optical Music Recognition Method Combining Multi-Scale Residual Convolutional Neural Network and Bi-Directional Simple Recurrent Units

Qiong Wu; Qiang Li; Xin Guan

doi:10.3788/LOP57.081006

Laser & Optoelectronics Progress
Vol. 57, Issue 8, 081006 (2020)

Qiong Wu, Qiang Li, and Xin Guan^*

Author Affiliations

School of Microelectronics, Tianjin University, Tianjin 300072, China

Qiong Wu, Qiang Li, Xin Guan. Optical Music Recognition Method Combining Multi-Scale Residual Convolutional Neural Network and Bi-Directional Simple Recurrent Units[J]. Laser & Optoelectronics Progress, 2020, 57(8): 081006

Fig. 1. Schematic diagram of MF-RC-BiSRU

Fig. 2. Schematic diagram of residual structure

Fig. 3. Schematic diagram of multi-scale feature fusion

Fig. 4. Structure of SRU

Fig. 5. Schematic diagram of BiSRU

Fig. 6. Difficulties of note recognition in music score

Fig. 7. Three data processing methods used to simulate degraded music score images. (a) Original incipit; (b) incipit with added white Gaussian noise; (c) incipit with added Perlin noise; (d) incipit with elastic transformations applied

Fig. 8. Comparison of training loss and accuracy for C-BiLSTM and RC-BiLSTM networks. (a) Comparison of training loss; (b) comparison of symbol error rate

Fig. 9. Comparison of features in different convolution layers. (a) Original incipit; (b) shallow feature map C₁; (c) deeper feature map C₃; (d) deepest feature map C₅; (e) multi-scale feature fusion map F₄

Fig. 10. Comparison of the symbol error rates in the different networks

Fig. 11. Comparison of MF-RC-BiSRU and MF-RC-BiLSTM. (a) Comparison of training loss; (b) comparison of symbol error rates

Fig. 12. Test results of the same incipit in four different networks. (a) Original incipit; (b) C-BiLSTM; (c) RC-BiLSTM; (d) MF-RC-BiLSTM; (e) MF-RC-BiSRU

Fig. 13. Comparison of loss in different methods

Input (128×width×1)
Part	Layer	Parameters
Feature extraction	Residual_Conv_1	(3,3,32)
	Max_Pool	(2,2,32)
	Residual_Conv_2	(3,3,64)
	Max_Pool	(2,2,64)
	Residual_Conv_3	(3,3,128)
	Max_Pool	(2,2,128)
	Residual_Conv_4	(3,3,256)
	Max_Pool	(2,2,256)
	Residual_Conv_5	(3,3,256)
	Max_Pool	(2,2,256)
Note recognition and classification	BiSRU	512
	BiSRU	512
	CTC	1780
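
The layer table above can be read as a sequence of shape changes. The following is a minimal sketch, not the authors' implementation, that traces the feature map shape through the five convolution and pooling stages. It rests on several assumptions the table does not state: the second input dimension is the image width, the 3×3 convolutions preserve spatial size ("same" padding), and each 2×2 max pooling halves both height and width with stride 2. The names `STAGES`, `trace_shapes`, and `sequence_length` are hypothetical and introduced only for illustration.

```python
# Output channels per residual convolution stage, transcribed from the table.
STAGES = [
    ("Residual_Conv_1", 32),
    ("Residual_Conv_2", 64),
    ("Residual_Conv_3", 128),
    ("Residual_Conv_4", 256),
    ("Residual_Conv_5", 256),
]


def trace_shapes(height, width, channels=1, pool=2):
    """Return a list of (layer name, (height, width, channels)) pairs."""
    shape = (height, width, channels)
    trace = [("Input", shape)]
    for name, out_channels in STAGES:
        # Assumption: "same" padding, so the convolution changes only channels.
        shape = (shape[0], shape[1], out_channels)
        trace.append((name, shape))
        # Assumption: 2x2 max pooling with stride 2, flooring odd sizes.
        shape = (shape[0] // pool, shape[1] // pool, shape[2])
        trace.append(("Max_Pool", shape))
    return trace


def sequence_length(width, pool=2):
    """Number of horizontal positions left after all pooling stages."""
    return width // pool ** len(STAGES)


if __name__ == "__main__":
    for layer, shape in trace_shapes(128, 1024):
        print(f"{layer:<16} {shape}")
```

Under these assumptions, a 128×1024×1 input ends as a 4×32×256 feature map, so the recurrent layers would see 32 horizontal positions. If the actual network pools width less aggressively than height, the sequence passed to the BiSRU layers would be longer than this sketch suggests.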