• Laser & Optoelectronics Progress
  • Vol. 58, Issue 14, 1410006 (2021)
Yuanyuan Chen1, Weilan Wang2,*, Huaming Liu3, Zhengqi Cai1, and Penghai Zhao2
Author Affiliations
  • 1College of Mathematics and Computer Science, Northwest Minzu University, Lanzhou, Gansu 730030, China
  • 2Key Laboratory of China’s Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou, Gansu 730030, China
  • 3College of Computer and Information Engineering, Fuyang Normal University, Fuyang, Anhui 236041, China
    DOI: 10.3788/LOP202158.1410006
    Yuanyuan Chen, Weilan Wang, Huaming Liu, Zhengqi Cai, Penghai Zhao. Layout Segmentation and Description of Tibetan Document Images Based on Adaptive Run Length Smoothing Algorithm[J]. Laser & Optoelectronics Progress, 2021, 58(14): 1410006

    Abstract

    Layout segmentation is a fundamental step in document image analysis and recognition. To find a suitable method for layout segmentation and description of Tibetan document images, a method based on an adaptive run length smoothing algorithm is proposed. First, according to the layout structure of the Tibetan document image, K-means clustering is used to obtain a run length threshold suited to that layout; the runs are then smoothed with this threshold and the connected components are extracted, which yields the layout segmentation. Next, text and non-text regions are distinguished in a simple way according to the external contour characteristics of each layout element. Finally, the text regions are recognized by a Tibetan text recognizer, and Extensible Markup Language (XML) is used to record the layout information, which yields the layout description. Experiments on Tibetan primary and secondary school teaching materials and stereotyped Tibetan document images show that the method achieves good layout analysis results.
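
The segmentation stage described in the abstract (cluster the run lengths, smooth with the resulting threshold, extract connected components) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the threshold is derived from the clusters, so taking the midpoint between two cluster centers is an assumption, as are the horizontal-only smoothing, the 4-connectivity, and all function names (`background_runs`, `kmeans_1d`, `adaptive_threshold`, `rlsa_horizontal`, `bounding_boxes`). The image is assumed to be a binary list of rows with 1 for ink and 0 for background.

```python
from collections import deque

def background_runs(row):
    """Lengths of background (0) runs enclosed by foreground (1) pixels in one row."""
    runs, start, seen_fg = [], None, False
    for x, v in enumerate(row):
        if v:
            if seen_fg and start is not None:
                runs.append(x - start)
            seen_fg, start = True, None
        elif seen_fg and start is None:
            start = x
    return runs

def kmeans_1d(values, k=2, iters=50):
    """Plain 1-D k-means (Lloyd's algorithm, k >= 2); returns sorted centers."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)

def adaptive_threshold(image):
    """Assumed rule: midpoint between the 'small gap' and 'large gap' centers."""
    runs = [r for row in image for r in background_runs(row)]
    if not runs:
        return 0
    small, large = kmeans_1d(runs, k=2)
    return (small + large) / 2

def rlsa_horizontal(image, threshold):
    """Fill background runs no longer than threshold that lie between ink pixels."""
    out = [list(row) for row in image]
    for row in out:
        last_fg = None
        for x, v in enumerate(row):
            if v:
                if last_fg is not None and 0 < x - last_fg - 1 <= threshold:
                    for i in range(last_fg + 1, x):
                        row[i] = 1
                last_fg = x
    return out

def bounding_boxes(image):
    """4-connected components as (top, left, bottom, right) boxes, in scan order."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not image[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and image[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Calling `bounding_boxes(rlsa_horizontal(image, adaptive_threshold(image)))` on a binarized page would then give one box per merged block. A fuller pipeline would typically also smooth vertically and combine both passes before labeling components.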