Fig. 1. Tibetan syllables. (a) Structure of a Tibetan syllable; (b) example of a Tibetan syllable; (c) examples of Tibetan transliteration of Sanskrit
Fig. 2. Original image of the historical Tibetan document
Fig. 3. Tibetan document after pre-processing
Fig. 4. Vertical segmentation process by projection of the historical Tibetan document. (a) Document line and its vertical projection; (b) character blocks in rectangular area
Fig. 5. Part of the character block dataset
Fig. 6. Examples of the character segmentation challenges. (a) Segmentation challenges above the baseline; (b) segmentation challenges below the baseline
Fig. 7. Flow chart of the character segmentation for historical Tibetan document
Fig. 8. Flow chart of the local baseline detection
Fig. 9. Touching type above the baseline
Fig. 10. Schematic diagram of coordinate system and segmentation direction
Fig. 11. Examples of multipath segmentation. (a) Combination example; (b) marked skeleton diagram; (c) segmentation path
Fig. 12. Stroke types and geometric characteristics above the baseline
Fig. 13. Types of broken strokes below the baseline. (a) Cross left and right; (b) cross up and down; (c) separate up and down; (d) contain
Fig. 14. Examples of stroke attribution classification. (a) With no stroke above the baseline and with no broken stroke below the baseline; (b) with strokes above the baseline and with no broken stroke below the baseline; (c) with no stroke above the baseline and with broken strokes below the baseline; (d) with strokes above the baseline and with broken strokes below the baseline
Fig. 15. Process of local baseline detection and horizontal segmentation of character block. (a) Character blocks with syllable points; (b) character blocks with no syllable point and with no stroke above the baseline; (c) character blocks with no syllable point and with strokes above the baseline
Fig. 16. Touching stroke and type detection above the baseline
Fig. 17. Character segmentation with a touching stroke. (a) Character direction is D1; (b) character direction is D2
Fig. 18. Character segmentation with multiple touching strokes
Fig. 19. Character segmentation with multiple crossing strokes
Fig. 20. Statistical results of broken strokes below the baseline. (a) Cross left and right; (b) cross up and down; (c) separate up and down; (d) contain
Fig. 21. Attribution based on the horizontal distance of the centroid. (a) Character block; (b) centroid of strokes after attribution
Fig. 22. Attribution analysis of the stroke. (a) No. 1; (b) No. 9
Fig. 23. Attribution analysis of “ ” stroke
Fig. 24. Attribution analysis of both “ ” and “ ” strokes
Fig. 25. Results of character segmentation. (a) Character block; (b) block after character segmentation
Fig. 26. Wrong character segmentation caused by stroke attribution. (a) Character block; (b) local baseline and horizontal segmentation; (c) broken stroke mark; (d) result of character segmentation
Fig. 27. Wrong character segmentation caused by the baseline detection. (a) Character block; (b) horizontal projection; (c) Hough straight line detection; (d) local baseline; (e) result of character segmentation
| Label | Description |
|---|---|
| C1 | overlapping strokes above the baseline |
| C2 | crossing strokes above the baseline |
| C3 | touching strokes above the baseline |
| C4 | broken strokes above the baseline |
| C5 | overlapping strokes below the baseline |
| C6 | touching strokes below the baseline |
| C7 | broken strokes below the baseline |

Table 1. Classification of the character segmentation challenges
| NCSC | NTSC | NRecall/% |
|---|---|---|
| 109354 | 109603 | 99.77 |

Table 2. Correct segmentation data of character block dataset
| NCSC | NTC | NTSC | NRecall/% | NPrecision/% | NF-Measure/% |
|---|---|---|---|---|---|
| 176802 | 183987 | 180379 | 96.09 | 98.02 | 97.05 |

Table 3. Data of the correct segmentation in the character segmentation stage
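
The percentages in Tables 3 and 5 can be reproduced from the raw counts. The sketch below is illustrative only and rests on assumptions that this section does not state: NCSC is read as the number of correctly segmented characters, NTC (NTCC in Table 5) as the number of ground-truth characters, and NTSC as the number of characters produced by segmentation, with recall = NCSC/NTC, precision = NCSC/NTSC, and the F-measure as their harmonic mean. The function name `segmentation_metrics` is hypothetical.

```python
def segmentation_metrics(n_correct, n_truth, n_segmented):
    """Return (recall, precision, F-measure) as percentages.

    Assumed definitions (not stated in this section):
      recall    = n_correct / n_truth
      precision = n_correct / n_segmented
      F-measure = harmonic mean of precision and recall
    """
    recall = n_correct / n_truth
    precision = n_correct / n_segmented
    f_measure = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * v, 2) for v in (recall, precision, f_measure))


# Counts copied from Table 3 and Table 5.
table3 = segmentation_metrics(176802, 183987, 180379)
table5 = segmentation_metrics(199220, 206405, 202797)
print("Table 3:", table3)
print("Table 5:", table5)
```

Under these assumed definitions, the computed values agree with the percentages reported in both tables, which suggests the reading of the column labels is plausible.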
| Character segmentation steps | NWSC | NProportion/% | NError/% |
|---|---|---|---|
| Build character block dataset | 249 | 3.46 | 0.14 |
| Detect the local baseline and horizontal segmentation | 962 | 13.39 | 0.52 |
| Detection of touching strokes type | 267 | 3.72 | 0.15 |
| Segmentation of touching strokes | 25 | 0.35 | 0.01 |
| Strokes attribution | 5682 | 79.08 | 3.09 |

Table 4. NError for each step during character segmentation
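
The per-step figures in Table 4 can be checked with simple arithmetic. The sketch below assumes, without confirmation from this section, that NWSC counts wrongly segmented characters per step, that NProportion is each step's share of all wrongly segmented characters, and that NError divides NWSC by the NTC value from Table 3 (183987). The variable names are hypothetical.

```python
# NWSC per step, copied from Table 4.
wrong_by_step = {
    "Build character block dataset": 249,
    "Detect the local baseline and horizontal segmentation": 962,
    "Detection of touching strokes type": 267,
    "Segmentation of touching strokes": 25,
    "Strokes attribution": 5682,
}

# Assumption: the denominator for NError is NTC from Table 3.
N_TOTAL_CHARACTERS = 183987

total_wrong = sum(wrong_by_step.values())

# Map each step to (share of all errors in %, error rate in %).
breakdown = {
    step: (100 * n / total_wrong, 100 * n / N_TOTAL_CHARACTERS)
    for step, n in wrong_by_step.items()
}

for step, (proportion, error) in breakdown.items():
    print(f"{step}: proportion {proportion:.2f}%, error {error:.2f}%")
print("Total wrongly segmented characters:", total_wrong)
```

Under these assumptions the computed values agree with the table to within 0.01 percentage points, and the total of wrongly segmented characters equals NTC minus NCSC in Table 3 (183987 − 176802). Stroke attribution accounts for roughly four fifths of all errors.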
| NCSC | NTCC | NTSC | NRecall/% | NPrecision/% | NF-Measure/% |
|---|---|---|---|---|---|
| 199220 | 206405 | 202797 | 96.52 | 98.24 | 97.37 |

Table 5. Correctly segmented data