Laser & Optoelectronics Progress, Vol. 57, Issue 8, 081006 (2020)
Qiong Wu, Qiang Li, and Xin Guan*
Author Affiliations
  • School of Microelectronics, Tianjin University, Tianjin 300072, China
DOI: 10.3788/LOP57.081006
Qiong Wu, Qiang Li, Xin Guan. Optical Music Recognition Method Combining Multi-Scale Residual Convolutional Neural Network and Bi-Directional Simple Recurrent Units[J]. Laser & Optoelectronics Progress, 2020, 57(8): 081006
Fig. 1. Schematic diagram of MF-RC-BiSRU
Fig. 2. Schematic diagram of residual structure
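The residual structure of Fig. 2 routes the block input around the stacked convolutions, so the layers learn a residual F(x) and the block outputs F(x) + x. A minimal PyTorch sketch (the framework and the exact layout, two 3×3 convolutions with batch normalization, are assumptions of this sketch; the paper does not publish code):

```python
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    """3x3 conv block with an identity (or 1x1-projected) shortcut,
    mirroring the residual structure sketched in Fig. 2."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Project the shortcut when channel counts differ.
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, kernel_size=1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = self.shortcut(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)  # F(x) + x
```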
Fig. 3. Schematic diagram of multi-scale feature fusion
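Fig. 3 fuses feature maps drawn from different depths of the convolutional stack (the C1/C3/C5 maps of Fig. 9) into a single map such as F4. One common way to realize this, and an assumption here since this excerpt does not give the exact fusion operator, is to project each map to a shared channel width with 1×1 convolutions, resize all maps to a common resolution, and sum:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Fuse feature maps from different depths (e.g. C1, C3, C5 in Fig. 9)
    into one map. The 1x1 projections and resize-then-sum scheme are
    assumptions of this sketch, not the paper's stated operator."""
    def __init__(self, channels: list[int], fused_ch: int = 256):
        super().__init__()
        self.projs = nn.ModuleList(
            nn.Conv2d(c, fused_ch, kernel_size=1) for c in channels)

    def forward(self, feats):
        # Resize every map to the spatial size of the deepest (smallest) one.
        target = feats[-1].shape[-2:]
        fused = 0
        for proj, f in zip(self.projs, feats):
            fused = fused + F.interpolate(proj(f), size=target,
                                          mode="bilinear", align_corners=False)
        return fused
```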
Fig. 4. Structure of SRU
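The SRU of Fig. 4 replaces the LSTM's matrix-valued recurrence with elementwise state updates: every matrix multiplication depends only on the current input x_t and can be computed for all time steps in parallel, which is the source of the training speedup seen later in Fig. 11 and Table 3. A sketch of one step in the basic form of Lei et al.'s SRU (the fused three-way projection and the projected highway path are implementation choices of this sketch):

```python
import torch
import torch.nn as nn

class SRUCell(nn.Module):
    """One Simple Recurrent Unit step. Only the elementwise c_t update
    is sequential; the projections of x_t can be batched over time."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # One fused projection for x~, the forget gate, and the reset gate.
        self.w = nn.Linear(input_size, 3 * hidden_size)
        self.residual = nn.Linear(input_size, hidden_size, bias=False)

    def forward(self, x_t, c_prev):
        x_tilde, f_in, r_in = self.w(x_t).chunk(3, dim=-1)
        f_t = torch.sigmoid(f_in)                 # forget gate
        r_t = torch.sigmoid(r_in)                 # reset gate
        c_t = f_t * c_prev + (1 - f_t) * x_tilde  # light, elementwise recurrence
        h_t = r_t * torch.tanh(c_t) + (1 - r_t) * self.residual(x_t)  # highway
        return h_t, c_t
```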
Fig. 5. Schematic diagram of BiSRU
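The BiSRU of Fig. 5 runs one SRU left-to-right and another right-to-left over the frame sequence and concatenates their hidden states, so each output frame sees musical context on both sides of a symbol. A loop-level sketch built on the SRUCell above (production SRU implementations fuse the elementwise recurrence into a single kernel instead of a Python loop):

```python
import torch

def bisru(x, fwd_cell, bwd_cell):
    """Run SRU cells over a (T, B, D) sequence in both directions and
    concatenate the hidden states, as in Fig. 5."""
    T, B, _ = x.shape
    h_dim = fwd_cell.residual.out_features
    c_f = x.new_zeros(B, h_dim)
    c_b = x.new_zeros(B, h_dim)
    hs_f, hs_b = [], [None] * T
    for t in range(T):                 # forward direction
        h, c_f = fwd_cell(x[t], c_f)
        hs_f.append(h)
    for t in reversed(range(T)):       # backward direction
        h, c_b = bwd_cell(x[t], c_b)
        hs_b[t] = h
    return torch.cat([torch.stack(hs_f), torch.stack(hs_b)], dim=-1)  # (T, B, 2H)
```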
Fig. 6. Difficulties of note recognition in music scores
Fig. 7. Three data-processing methods for simulating unsatisfactory music images. (a) Original incipit; (b) incipit with white Gaussian noise added; (c) incipit with Perlin noise added; (d) incipit with elastic transformations added
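The degradations of Fig. 7 can be reproduced with a few lines of NumPy/SciPy; the parameter values below are illustrative, not the paper's. Perlin noise (Fig. 7(c)) is usually generated with a dedicated library such as the `noise` package and added to the image in the same way as the Gaussian field:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def add_gaussian_noise(img, sigma=0.1):
    """White Gaussian noise, as in Fig. 7(b). img is float in [0, 1]."""
    return np.clip(img + np.random.normal(0, sigma, img.shape), 0, 1)

def elastic_transform(img, alpha=30.0, sigma=4.0):
    """Elastic deformation, as in Fig. 7(d): smooth a random displacement
    field with a Gaussian filter, then resample the image along it."""
    dx = gaussian_filter(np.random.uniform(-1, 1, img.shape), sigma) * alpha
    dy = gaussian_filter(np.random.uniform(-1, 1, img.shape), sigma) * alpha
    y, x = np.meshgrid(np.arange(img.shape[0]), np.arange(img.shape[1]),
                       indexing="ij")
    return map_coordinates(img, [y + dy, x + dx], order=1, mode="reflect")
```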
Fig. 8. Comparison of training loss and accuracy for C-BiLSTM and RC-BiLSTM networks. (a) Comparison of training loss; (b) comparison of symbol error rate
Fig. 9. Comparison of features in different convolution layers. (a) Original incipit; (b) shallow feature map C1; (c) deeper feature map C3; (d) deepest feature map C5; (e) multi-scale feature fusion map F4
Fig. 10. Comparison of the symbol error rates in the different networks
Fig. 11. Comparison of MF-RC-BiSRU and MF-RC-BiLSTM. (a) Comparison of training loss; (b) comparison of symbol error rates
Fig. 12. Test results of the same incipit in four different networks. (a) Original incipit; (b) C-BiLSTM; (c) RC-BiLSTM; (d) MF-RC-BiLSTM; (e) MF-RC-BiSRU
Fig. 13. Comparison of loss in different methods
Part                                  Layer             Parameters
Input                                 -                 128 × width × 1
Feature extraction                    Residual_Conv_1   (3, 3, 32)
                                      Max_Pool          (2, 2, 32)
                                      Residual_Conv_2   (3, 3, 64)
                                      Max_Pool          (2, 2, 64)
                                      Residual_Conv_3   (3, 3, 128)
                                      Max_Pool          (2, 2, 128)
                                      Residual_Conv_4   (3, 3, 256)
                                      Max_Pool          (2, 2, 256)
                                      Residual_Conv_5   (3, 3, 256)
                                      Max_Pool          (2, 2, 256)
Note recognition and classification   BiSRU             512
                                      BiSRU             512
                                      CTC               1780
Table 1. Structure parameters of the improved network
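Read as a recipe, Table 1 is five residual convolution blocks, each followed by 2×2 max-pooling, then two bidirectional recurrent layers of width 512 and a 1780-way output trained with CTC. A hedged end-to-end sketch reusing the ResidualConvBlock above (nn.LSTM stands in for the BiSRU layers to keep the sketch short; the column-to-sequence reshape is an assumption):

```python
import torch
import torch.nn as nn

class MFRCBiSRU(nn.Module):
    """Hedged reconstruction of Table 1, not the authors' code."""
    def __init__(self, n_classes: int = 1780):
        super().__init__()
        chans = [1, 32, 64, 128, 256, 256]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [ResidualConvBlock(c_in, c_out), nn.MaxPool2d(2)]
        self.features = nn.Sequential(*layers)
        # A 128-pixel-high input pooled five times has height 4; 4 * 256 = 1024.
        self.rnn = nn.LSTM(1024, 512, num_layers=2,
                           bidirectional=True, batch_first=False)
        self.classifier = nn.Linear(2 * 512, n_classes)

    def forward(self, x):                       # x: (B, 1, 128, W)
        f = self.features(x)                    # (B, 256, 4, W/32)
        f = f.permute(3, 0, 1, 2).flatten(2)    # (T, B, 1024), one frame per column
        out, _ = self.rnn(f)
        return self.classifier(out).log_softmax(-1)  # CTC-ready log-probs
```

Training would pair these log-probabilities with torch.nn.CTCLoss; with this pooling scheme each of the W/32 output frames covers a 32-pixel-wide slice of the staff.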
Network        Symbol error rate /%   Sequence error rate /%
C-BiLSTM       3.2480                 14.3498
RC-BiLSTM      1.8440                 8.1071
MF-RC-BiLSTM   0.3312                 1.4637
MF-RC-BiSRU    0.3234                 1.4571
Table 2. Comparison of accuracy in different networks
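The two error columns of Tables 2 and 3 are not defined in this excerpt; the usual reading, assumed here, is that symbol error rate is the total edit distance between predicted and ground-truth symbol sequences divided by the total number of ground-truth symbols, and sequence error rate is the fraction of incipits with at least one error:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two symbol sequences (one-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def error_rates(refs, hyps):
    """Assumed definitions matching the /% columns of Tables 2 and 3."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    ser = 100.0 * edits / sum(len(r) for r in refs)
    seq_er = 100.0 * sum(r != h for r, h in zip(refs, hyps)) / len(refs)
    return ser, seq_er
```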
Method        Symbol error rate /%   Sequence error rate /%   Time /s
CNN-STN[15]   5.0208                 16.8056                  0.98
DWD[17]       8.7811                 18.5609                  1.21
MF-RC-BiSRU   0.3234                 1.4571                   0.56
Table 3. Performance comparison of different methods