• Laser & Optoelectronics Progress
  • Vol. 58, Issue 20, 2010020 (2021)
Ce Zhang1,2 and Weilan Wang1,*
Author Affiliations
  • 1Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou, Gansu 730030, China
  • 2School of Mathematics and Information Engineering, Chongqing University of Education, Chongqing 400065, China
    DOI: 10.3788/LOP202158.2010020
    Ce Zhang, Weilan Wang. Character Segmentation for Historical Uchen Tibetan Document Based on Structure Attributes[J]. Laser & Optoelectronics Progress, 2021, 58(20): 2010020
    Fig. 1. Syllable of the Tibetan. (a) Structure of the Tibetan syllable; (b) example of the Tibetan syllable; (c) examples of Tibetan transliteration of Sanskrit
    Fig. 2. Original image of the historical Tibetan document
    Fig. 3. Tibetan document after pre-processing
    Fig. 4. Vertical segmentation process by projection of the historical Tibetan document. (a) Document line and its vertical projection; (b) character blocks in rectangular area
    Fig. 5. Part of character block dataset
    Fig. 6. Examples of the character segmentation challenges. (a) Segmentation challenges above the baseline; (b) segmentation challenges below the baseline
    Fig. 7. Flow chart of the character segmentation for historical Tibetan document
    Fig. 8. Flow chart of the local baseline detection
    Fig. 9. Touching type above the baseline
    Fig. 10. Schematic diagram of coordinate system and segmentation direction
    Fig. 11. Examples of multipath segmentation. (a) Combination example; (b) marked skeleton diagram; (c) segmentation path
    Fig. 12. Stroke types and geometric characteristics above the baseline
    Fig. 13. Broken strokes type below the baseline. (a) Cross left and right; (b) cross up and down; (c) separate up and down; (d) contain
    Fig. 14. Examples of strokes attribution classification. (a) With no stroke above the baseline and with no broken stroke below the baseline; (b) with strokes above the baseline and with no broken stroke below the baseline; (c) with no stroke above the baseline and with broken strokes below the baseline; (d) with strokes above the baseline and with broken strokes below the baseline
    Fig. 15. Process of local baseline detection and horizontal segmentation of character block. (a) Character blocks with syllable points; (b) character blocks with no syllable point and with no stroke above the baseline; (c) character blocks with no syllable point and with strokes above the baseline
    Fig. 16. Touching stroke and type detection above the baseline
    Fig. 17. Character segmentation with a touching stroke. (a) Character direction is D1; (b) character direction is D2
    Fig. 18. Character segmentation with multiple touching strokes
    Fig. 19. Character segmentation with multiple crossing strokes
    Fig. 20. Statistical results of broken strokes below the baseline. (a) Cross left and right; (b) cross up and down; (c) separate up and down; (d) contain
    Fig. 21. Attribution based on the horizontal distance of the centroid. (a) Character block; (b) centroid of strokes after attribution
    Fig. 22. Attribution analysis of the stroke. (a) No. 1; (b) No. 9
    Fig. 23. Attribution analysis of “ ” stroke
    Fig. 24. Attribution analysis of both “ ” and “ ” strokes
    Fig. 25. Results of character segmentation. (a) Character block; (b) block after character segmentation
    Fig. 26. Wrong character segmentation caused by strokes attribution. (a) Character block; (b) local baseline and horizontal segmentation; (c) broken stroke mark; (d) result of character segmentation
    Fig. 27. Wrong character segmentation caused by the baseline detection. (a) Character block; (b) horizontal projection; (c) Hough straight line detection; (d) local baseline; (e) result of character segmentation
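The projection-based splitting named in Fig. 4 can be illustrated with a minimal sketch. It assumes a binarized text line stored as rows of 0/1 pixels and cuts wherever the column sum drops to zero. The function names `vertical_projection` and `split_on_gaps` and the toy image are hypothetical, and the paper's actual procedure may differ.

```python
def vertical_projection(binary_line):
    """Count foreground (1) pixels in each column of a binary text-line image."""
    return [sum(column) for column in zip(*binary_line)]


def split_on_gaps(projection):
    """Return (start, end) column ranges, end exclusive, where the projection is non-zero."""
    blocks, start = [], None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                      # a block of ink begins
        elif count == 0 and start is not None:
            blocks.append((start, x))      # an empty column closes the block
            start = None
    if start is not None:                  # block that runs to the right edge
        blocks.append((start, len(projection)))
    return blocks


# Toy 3x7 line image (hypothetical data): two ink regions separated by blank columns.
line = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 0, 1],
]
profile = vertical_projection(line)
blocks = split_on_gaps(profile)
```

Here `profile` is the per-column ink count and `blocks` lists the column ranges of the two regions. Characters that touch or overlap leave no empty column between them, which is why the blocks in Fig. 4(b) still need the later segmentation steps.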
    Label | Description
    C1 | overlapping strokes above the baseline
    C2 | crossing strokes above the baseline
    C3 | touching strokes above the baseline
    C4 | broken strokes above the baseline
    C5 | overlapping strokes below the baseline
    C6 | touching strokes below the baseline
    C7 | broken strokes below the baseline
    Table 1. Classification of the character segmentation challenges
    N_CSC | N_TSC | N_Recall/%
    109354 | 109603 | 99.77
    Table 2. Correct segmentation data of character block dataset
    N_CSC | N_TC | N_TSC | N_Recall/% | N_Precision/% | N_F-Measure/%
    176802 | 183987 | 180379 | 96.09 | 98.02 | 97.05
    Table 3. Data of the correct segmentation in the character segmentation stage
    Character segmentation steps | N_WSC | N_Proportion/% | N_Error/%
    Build character block dataset | 249 | 3.46 | 0.14
    Detect the local baseline and horizontal segmentation | 962 | 13.39 | 0.52
    Detection of touching strokes type | 267 | 3.72 | 0.15
    Segmentation of touching strokes | 25 | 0.35 | 0.01
    Strokes attribution | 5682 | 79.08 | 3.09
    Table 4. NError for each step during character segmentation
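The percentage columns of Table 4 can be cross-checked with a short sketch. It assumes, since the page does not state the definitions, that N_Proportion is each step's share of all wrongly segmented characters and that N_Error divides the same count by the N_TC value from Table 3. Under those assumptions the wrong counts sum to 7185, which equals N_TC minus N_CSC in Table 3. The computed error rates match the reported column, and the proportions agree to within 0.01 percentage points.

```python
# Wrongly segmented character counts (N_WSC) per step, copied from Table 4.
n_wsc = {
    "Build character block dataset": 249,
    "Detect the local baseline and horizontal segmentation": 962,
    "Detection of touching strokes type": 267,
    "Segmentation of touching strokes": 25,
    "Strokes attribution": 5682,
}
N_TC = 183987   # from Table 3; assumed here to be the denominator of N_Error
N_CSC = 176802  # from Table 3

total_wrong = sum(n_wsc.values())


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Assumed definitions: share of all errors, and error rate over all characters.
proportion = {step: percent(n, total_wrong) for step, n in n_wsc.items()}
error = {step: percent(n, N_TC) for step, n in n_wsc.items()}
```

Strokes attribution accounts for roughly four fifths of all errors, so that step dominates the remaining error budget.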
    N_CSC | N_TCC | N_TSC | N_Recall/% | N_Precision/% | N_F-Measure/%
    199220 | 206405 | 202797 | 96.52 | 98.24 | 97.37
    Table 5. Correctly segmented data
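The recall, precision, and F-measure figures in Tables 3 and 5 can be reproduced from the three counts in each row. The sketch below assumes the usual definitions, which the page itself does not spell out: recall is N_CSC over the second count (N_TC in Table 3, N_TCC in Table 5), precision is N_CSC over N_TSC, and the F-measure is their harmonic mean. The function name `segmentation_metrics` is hypothetical.

```python
def segmentation_metrics(n_csc, n_total, n_tsc):
    """Return (recall, precision, F-measure) in percent, rounded to two decimals.

    Assumed definitions (not stated on the page):
      recall    = n_csc / n_total
      precision = n_csc / n_tsc
      F-measure = harmonic mean of precision and recall
    """
    recall = n_csc / n_total
    precision = n_csc / n_tsc
    f_measure = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * value, 2) for value in (recall, precision, f_measure))


table3 = segmentation_metrics(176802, 183987, 180379)  # counts from Table 3
table5 = segmentation_metrics(199220, 206405, 202797)  # counts from Table 5
```

Under these assumptions both rows come out equal to the reported percentages, which supports reading the flattened digit runs as three separate counts.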