Multi-Source Text Topic Model Based on DMA and Feature Division

Bibliographic Details
Published in: Jisuanji gongcheng (Computer Engineering)
Main Authors: XU Weijia, QIN Yongbin, HUANG Ruizhang, CHEN Yanping
Format: Article
Language: English
Published: Editorial Office of Computer Engineering, 2021-07-01
Online Access: https://www.ecice06.com/fileup/1000-3428/PDF/20210708.pdf
Description
Summary: Existing topic models perform poorly when mining information from multi-source text data sets. To address this, a multi-source text topic model based on Dirichlet Multinomial Allocation (DMA) and feature division is designed. The model removes the requirement that the number of topics be specified in advance. It assigns a source-specific topic distribution parameter to each data source and automatically estimates the number of topics for each source using Gibbs sampling. In addition, the model assigns a source-specific noise-word distribution parameter and topic-word distribution parameter to each data source. A feature categorization method separates the feature words of each data source from its noise words. The word features of each source can then be learned without the noise-word set affecting model clustering. Experimental results show that, compared with existing topic models, the proposed model preserves the unique word features of each data source, achieves better topic discovery performance, and is more robust.
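The summary does not give the model's sampling equations. The following is a minimal Python sketch, under stated assumptions, of the two ideas it names. First, a DMA prior (a symmetric Dirichlet(alpha/k_max) over at most k_max topics) lets the number of non-empty topics be estimated rather than fixed in advance. Second, a separate noise-word distribution absorbs words that are not topic features. The sketch assumes a GSDMM-style collapsed Gibbs sampler in which each document has exactly one topic. It also assumes a per-token feature/noise switch with a Beta(gamma, gamma) prior; the paper's feature categorization may instead operate on word types. It covers a single data source only, because the summary does not describe how sources are coupled. All names and hyperparameters (`dma_noise_gibbs`, `k_max`, `alpha`, `beta`, `beta0`, `gamma`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def log_sample(log_weights, rng):
    """Draw an index with probability proportional to exp(log_weights)."""
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    return rng.choices(range(len(weights)), weights=weights)[0]


def dma_noise_gibbs(docs, vocab_size, k_max=20, alpha=1.0, beta=0.1,
                    beta0=0.1, gamma=1.0, iters=50, seed=0):
    """Hypothetical collapsed Gibbs sampler for ONE data source.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (topic per document, feature/noise switch per token,
    number of non-empty topics out of k_max).
    """
    rng = random.Random(seed)
    z = [rng.randrange(k_max) for _ in docs]            # one topic per document
    x = [[1] * len(d) for d in docs]                    # 1 = feature word, 0 = noise word
    m = [0] * k_max                                     # documents per topic
    n_k = [0] * k_max                                   # feature tokens per topic
    n_kw = [defaultdict(int) for _ in range(k_max)]     # feature-word counts per topic
    n0_w = defaultdict(int)                             # noise-word counts
    n_switch = [0, 0]                                   # [noise tokens, feature tokens]

    def update(d, sign):
        """Add (sign=+1) or remove (sign=-1) document d from all counts."""
        k = z[d]
        m[k] += sign
        for w, s in zip(docs[d], x[d]):
            n_switch[s] += sign
            if s:
                n_k[k] += sign
                n_kw[k][w] += sign
            else:
                n0_w[w] += sign

    for d in range(len(docs)):
        update(d, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            update(d, -1)
            feats = [w for w, s in zip(doc, x[d]) if s]
            # DMA prior term (m_k + alpha/k_max) times the Dirichlet-multinomial
            # predictive of the document's feature words under topic k.
            log_p = []
            for k in range(k_max):
                lp = math.log(m[k] + alpha / k_max)
                seen = defaultdict(int)
                for i, w in enumerate(feats):
                    lp += math.log(n_kw[k][w] + seen[w] + beta)
                    lp -= math.log(n_k[k] + vocab_size * beta + i)
                    seen[w] += 1
                log_p.append(lp)
            z[d] = log_sample(log_p, rng)
            update(d, +1)
            # Resample each token's feature/noise switch given the new topic.
            k = z[d]
            for t, w in enumerate(doc):
                s = x[d][t]
                n_switch[s] -= 1
                if s:
                    n_k[k] -= 1
                    n_kw[k][w] -= 1
                else:
                    n0_w[w] -= 1
                p_feat = ((n_switch[1] + gamma) * (n_kw[k][w] + beta)
                          / (n_k[k] + vocab_size * beta))
                p_noise = ((n_switch[0] + gamma) * (n0_w[w] + beta0)
                           / (n_switch[0] + vocab_size * beta0))
                s = 1 if rng.random() < p_feat / (p_feat + p_noise) else 0
                x[d][t] = s
                n_switch[s] += 1
                if s:
                    n_k[k] += 1
                    n_kw[k][w] += 1
                else:
                    n0_w[w] += 1

    n_topics = sum(1 for c in m if c > 0)               # estimated number of topics
    return z, x, n_topics
```

Here `k_max` acts only as an upper bound: most components stay empty, and the count of occupied ones serves as the topic-number estimate. In this sketch, a multi-source corpus would need one sampler state per source, mirroring the per-source parameters listed in the summary.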
ISSN: 1000-3428