• Journal of Terahertz Science and Electronic Information Technology
  • Vol. 21, Issue 3, 378 (2023)
CHEN Yangjian1,* and WEN Qiuhua2
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]
    DOI: 10.11805/tkyda2020457
    CHEN Yangjian, WEN Qiuhua. Micro-blog hot topic detection method based on improved K-means[J]. Journal of Terahertz Science and Electronic Information Technology, 2023, 21(3): 378

    Abstract

    Micro-blog text data is high-dimensional and exhibits pronounced synonymy and polysemy. The traditional topic detection method, which combines the Vector Space Model (VSM) with K-means, suffers from low accuracy, high computational cost, and difficulty in determining the cluster centers. This paper proposes a text vectorization method in which the VSM is optimized by a Relevance Vector Machine (RVM). First, the adaptive feature selection ability of the RVM is used to reduce the dimension of the VSM feature vectors automatically. Principal Component Analysis (PCA) is then applied to determine the cluster centers for the K-means algorithm, and K-means is run to obtain the clustering results. Finally, a heat index is computed for each topic from the numbers of micro-blog forwards and comments, and the topic with the largest heat index is taken as the current hot topic. The results show that, compared with two traditional methods, the proposed method improves accuracy by 7.3% and 1.1% and real-time performance by 45% and 53%, respectively.
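
The clustering and ranking stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions, because the abstract does not give them: the seeding rule (sort documents along the first principal axis, split them into k equal groups, and use the group means as starting centers), the heat index (forwards plus comments with equal weights), and all function names. The RVM feature selection step is omitted entirely, and the input rows stand in for already-reduced VSM vectors.

```python
import math


def first_principal_axis(rows, iters=100):
    """Estimate the first principal axis by power iteration (assumed approach)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        # w = X^T (X v), which is proportional to the covariance matrix times v.
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centred]
        w = [sum(proj[i] * centred[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate data or an unlucky start vector; keep current v
        v = [x / norm for x in w]
    return mean, v


def pca_seeds(rows, k):
    """Assumed seeding rule: equal-size groups along the first principal axis.

    Requires k <= len(rows) so that every group is non-empty.
    """
    mean, axis = first_principal_axis(rows)
    d = len(axis)

    def score(r):
        return sum((r[j] - mean[j]) * axis[j] for j in range(d))

    order = sorted(range(len(rows)), key=lambda i: score(rows[i]))
    seeds = []
    for g in range(k):
        chunk = order[g * len(order) // k:(g + 1) * len(order) // k]
        seeds.append([sum(rows[i][j] for i in chunk) / len(chunk) for j in range(d)])
    return seeds


def kmeans(rows, centres, iters=50):
    """Plain Lloyd iterations starting from the supplied centres."""
    labels = []
    for _ in range(iters):
        labels = [
            min(range(len(centres)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centres[c])))
            for r in rows
        ]
        new_centres = []
        for c in range(len(centres)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[c])  # keep the centre of an empty cluster
        if new_centres == centres:
            break
        centres = new_centres
    return labels


def hottest_topic(labels, forwards, comments):
    """Assumed heat index: forwards + comments, summed per cluster."""
    heat = {}
    for lab, f, c in zip(labels, forwards, comments):
        heat[lab] = heat.get(lab, 0) + f + c
    return max(heat, key=heat.get), heat


# Toy data: six 2-D "document vectors" forming two obvious groups.
docs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = kmeans(docs, pca_seeds(docs, k=2))
hot, heat = hottest_topic(labels, forwards=[1, 2, 0, 50, 30, 20],
                          comments=[0, 1, 1, 10, 5, 5])
```

Seeding from the data's main direction of variance makes the starting centers deterministic, which is one plausible reading of how PCA could address the difficulty of choosing cluster centers; the paper's actual rule may differ.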