• Journal of Atmospheric and Environmental Optics
  • Vol. 12, Issue 3, 230 (2017)
Lulu HU1,*, Xiaoqin LIU1, and Kai SUN2
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]
    DOI: 10.3969/j.issn.1673-6141.2017.03.009
    HU Lulu, LIU Xiaoqin, SUN Kai. Method of Web Page Text Extraction Based on Text Feature and Page Structure[J]. Journal of Atmospheric and Environmental Optics, 2017, 12(3): 230
    References

    [1] Liu L, Pu C. XWRAP: an XML-enabled wrapper construction system for the Web information source [C]//Proceedings of the 16th IEEE International Conference on Data Engineering, 2000: 611-620.

    [2] Ma Ling, Goharian N, Chowdhury A, et al. Extracting unstructured data from template generated Web documents [C]//Proceedings of the 12th International Conference on Information and Knowledge Management, 2003: 512-515.

    [3] Mei Xue, Cheng Xueqi, Guo Yan, et al. Fully automatic Wrapper generation for web information extraction [J]. Journal of Chinese Information Processing, 2008, 22(1): 22-29 (in Chinese).

    [4] Sun Chengjie, Guan Yi. A statistical approach for content extraction from web page [J]. Journal of Chinese Information Processing, 2004, 18(5): 17-22 (in Chinese).

    [5] Sun Hao, Dong Shoubin. Adaptive approach for content extraction based on tag density [J]. Journal of Zhengzhou University, 2009, 41(1): 44-47 (in Chinese).

    [6] An Zengwen, Wang Chao, Xu Jiefeng. An approach based on machine learning for information extraction method [J]. Microcomputer & Its Applications, 2010(12): 4-6 (in Chinese).

    [7] You Guirong, Lu Yuchang. Extraction of topical information from Chinese web page based on the statistic and machine learning [J]. Journal of Fujian Commercial College, 2009, 4(2): 68-72 (in Chinese).
