A word embedding topic model for topic detection and summary in social networks
The aim of topic detection is to automatically identify the events and hot topics in social networks and continuously track known topics. Applying the traditional methods such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis is difficult given the high dimensionality of mass...
Main Authors: | Lei Shi, Gang Cheng, Shang-ru Xie, Gang Xie |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2019-11-01 |
Series: | Measurement + Control |
Online Access: | https://doi.org/10.1177/0020294019865750 |
id |
doaj-3566c940ef5642ada0b127832b87d153 |
---|---|
record_format |
Article |
spelling |
doaj-3566c940ef5642ada0b127832b87d153; 2020-11-25T03:49:38Z; eng; SAGE Publishing; Measurement + Control; 0020-2940; 2019-11-01; 52; 10.1177/0020294019865750; A word embedding topic model for topic detection and summary in social networks; Lei Shi [0], Gang Cheng [1], Shang-ru Xie [2], Gang Xie [3]; School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China; School of Earth Sciences and Engineering, Nanjing University, Nanjing, China; School of Computer Science, North China Institute of Science and Technology, Beijing, China; School of Big Data and Computer Science, Guizhou Normal University, Guiyang, China; The aim of topic detection is to automatically identify events and hot topics in social networks and to continuously track known topics. Applying traditional methods such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis is difficult given the high dimensionality of massive event texts and the short-text sparsity of social networks. The sparse distribution of topics also leads to unclear topics. To address these challenges, we propose a novel word embedding topic model, named Cbow Topic Model (CTM), which combines a topic model with the continuous bag-of-words (Cbow) word embedding method for topic detection and summary in social networks. We cluster similar words in the target social network text dataset using the classic Cbow word vectorization method, which can effectively learn the internal relationships between words and reduce the dimensionality of the input texts. We then use the topic model to model the short texts, which effectively weakens the sparsity problem of social network texts. To detect and summarize topics, we propose a topic detection method that leverages similarity computing for social networks. We collected a Sina microblog dataset to conduct various experiments. The experimental results demonstrate that the CTM method is superior to the existing topic model method; https://doi.org/10.1177/0020294019865750 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Lei Shi Gang Cheng Shang-ru Xie Gang Xie |
spellingShingle |
Lei Shi Gang Cheng Shang-ru Xie Gang Xie A word embedding topic model for topic detection and summary in social networks Measurement + Control |
author_facet |
Lei Shi Gang Cheng Shang-ru Xie Gang Xie |
author_sort |
Lei Shi |
title |
A word embedding topic model for topic detection and summary in social networks |
title_short |
A word embedding topic model for topic detection and summary in social networks |
title_full |
A word embedding topic model for topic detection and summary in social networks |
title_fullStr |
A word embedding topic model for topic detection and summary in social networks |
title_full_unstemmed |
A word embedding topic model for topic detection and summary in social networks |
title_sort |
word embedding topic model for topic detection and summary in social networks |
publisher |
SAGE Publishing |
series |
Measurement + Control |
issn |
0020-2940 |
publishDate |
2019-11-01 |
description |
The aim of topic detection is to automatically identify events and hot topics in social networks and to continuously track known topics. Applying traditional methods such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis is difficult given the high dimensionality of massive event texts and the short-text sparsity of social networks. The sparse distribution of topics also leads to unclear topics. To address these challenges, we propose a novel word embedding topic model, named Cbow Topic Model (CTM), which combines a topic model with the continuous bag-of-words (Cbow) word embedding method for topic detection and summary in social networks. We cluster similar words in the target social network text dataset using the classic Cbow word vectorization method, which can effectively learn the internal relationships between words and reduce the dimensionality of the input texts. We then use the topic model to model the short texts, which effectively weakens the sparsity problem of social network texts. To detect and summarize topics, we propose a topic detection method that leverages similarity computing for social networks. We collected a Sina microblog dataset to conduct various experiments. The experimental results demonstrate that the CTM method is superior to the existing topic model method. |
url |
https://doi.org/10.1177/0020294019865750 |
work_keys_str_mv |
AT leishi awordembeddingtopicmodelfortopicdetectionandsummaryinsocialnetworks AT gangcheng awordembeddingtopicmodelfortopicdetectionandsummaryinsocialnetworks AT shangruxie awordembeddingtopicmodelfortopicdetectionandsummaryinsocialnetworks AT gangxie awordembeddingtopicmodelfortopicdetectionandsummaryinsocialnetworks AT leishi wordembeddingtopicmodelfortopicdetectionandsummaryinsocialnetworks AT gangcheng wordembeddingtopicmodelfortopicdetectionandsummaryinsocialnetworks AT shangruxie wordembeddingtopicmodelfortopicdetectionandsummaryinsocialnetworks AT gangxie wordembeddingtopicmodelfortopicdetectionandsummaryinsocialnetworks |
_version_ |
1724494248031551488 |
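
The abstract describes clustering similar words by their Cbow vectors so that short texts can be represented in fewer dimensions. The following is a minimal Python sketch of that general idea only. This record does not give the paper's actual clustering procedure, so everything here is an assumption for illustration: the hand-written toy vectors stand in for trained Cbow embeddings, and the greedy cosine-similarity threshold, the function names, and the threshold value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_words(vectors, threshold=0.9):
    """Greedy clustering (illustrative, not the paper's method).

    Each word joins the first cluster whose seed vector is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    seeds = []        # one seed vector per cluster
    assignment = {}   # word -> cluster id
    for word, vec in vectors.items():
        for cid, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                assignment[word] = cid
                break
        else:
            assignment[word] = len(seeds)
            seeds.append(vec)
    return assignment


def text_to_cluster_counts(tokens, assignment, n_clusters):
    """Represent a short text as counts over word clusters.

    Tokens without a vector are skipped, so the representation has
    one dimension per cluster rather than one per vocabulary word.
    """
    counts = [0] * n_clusters
    for tok in tokens:
        cid = assignment.get(tok)
        if cid is not None:
            counts[cid] += 1
    return counts


# Toy vectors standing in for trained Cbow embeddings (assumed values).
TOY_VECTORS = {
    "earthquake": [0.90, 0.10, 0.00],
    "quake": [0.88, 0.12, 0.02],
    "concert": [0.05, 0.95, 0.10],
    "gig": [0.07, 0.90, 0.15],
}

assignment = cluster_words(TOY_VECTORS)
n_clusters = max(assignment.values()) + 1
counts = text_to_cluster_counts(
    ["quake", "earthquake", "gig", "unknown"], assignment, n_clusters
)
print(assignment, counts)
```

In this toy setup four vocabulary words collapse into two clusters, so a short post is described by two counts instead of four; a topic model would then be fitted on such reduced representations.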