Chinese Microblog Topic Detection through POS-Based Semantic Expansion

A microblog is a new type of social media for information publishing, acquiring, and spreading. Finding the significant topics of a microblog is necessary for popularity tracing and public opinion following. This paper puts forward a method to detect topics from Chinese microblogs. Since traditional...

Bibliographic Details
Main Authors: Lianhong Ding, Bin Sun, Peng Shi
Format: Article
Language: English
Published: MDPI AG 2018-08-01
Series: Information
Subjects: Chinese microblogs; semantic expansion; short text; topic detection
Online Access: http://www.mdpi.com/2078-2489/9/8/203
id doaj-fc974d2f1df242548c4c56cf133e00a3
record_format Article
spelling doaj-fc974d2f1df242548c4c56cf133e00a3
2020-11-25T00:11:35Z
eng
MDPI AG
Information
2078-2489
2018-08-01
Volume 9, Issue 8, Article 203
10.3390/info9080203
info9080203
Chinese Microblog Topic Detection through POS-Based Semantic Expansion
Lianhong Ding (School of Information, Beijing Wuzi University, Beijing 101149, China)
Bin Sun (School of Information, Beijing Wuzi University, Beijing 101149, China)
Peng Shi (National Center for Materials Service Safety, University of Science and Technology Beijing, Beijing 100083, China)
http://www.mdpi.com/2078-2489/9/8/203
Chinese microblogs
semantic expansion
short text
topic detection
collection DOAJ
language English
format Article
sources DOAJ
author Lianhong Ding
Bin Sun
Peng Shi
title Chinese Microblog Topic Detection through POS-Based Semantic Expansion
publisher MDPI AG
series Information
issn 2078-2489
publishDate 2018-08-01
description A microblog is a new type of social media for information publishing, acquiring, and spreading. Finding the significant topics of a microblog is necessary for popularity tracing and public opinion following. This paper puts forward a method to detect topics from Chinese microblogs. Since traditional methods showed low performance on a short text from a microblog, we put forward a topic detection method based on the semantic description of the microblog post. The semantic expansion of the post supplies more information and clues for topic detection. First, semantic features are extracted from a microblog post. Second, the semantic features are expanded according to a thesaurus. Here TongYiCi CiLin is used as the lexical resource to find words with the same meaning. To overcome the polysemy problem, several semantic expansion strategies based on part-of-speech are introduced and compared. Third, an approach to detect topics based on semantic descriptions and an improved incremental clustering algorithm is introduced. A dataset from Sina Weibo is employed to evaluate our method. Experimental results show that our method can bring about better results both for post clustering and topic detection in Chinese microblogs. We also found that the semantic expansion of nouns is far more efficient than for other parts of speech. The potential mechanism of the phenomenon is also analyzed and discussed.
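The pipeline in the description (extract features, expand them through a thesaurus by part of speech, then cluster incrementally) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy English thesaurus stands in for TongYiCi CiLin, posts are assumed to be already segmented and POS-tagged as (word, tag) pairs, the names `TOY_THESAURUS`, `expand`, `cosine`, and `single_pass_cluster` are hypothetical, and plain single-pass clustering with cosine similarity replaces the authors' improved incremental algorithm, whose details this record does not give.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy thesaurus standing in for TongYiCi CiLin: word -> synonyms.
TOY_THESAURUS = {
    "quake": ["earthquake", "tremor"],
    "earthquake": ["quake", "tremor"],
    "film": ["movie"],
    "movie": ["film"],
}


def expand(tokens, pos_to_expand=frozenset({"n"})):
    """Build a bag of words for one post from (word, pos) pairs.

    Synonyms are added only for tokens whose POS tag is in pos_to_expand
    (by default nouns, tagged "n" in this sketch).
    """
    bag = Counter()
    for word, pos in tokens:
        bag[word] += 1
        if pos in pos_to_expand:
            for synonym in TOY_THESAURUS.get(word, []):
                bag[synonym] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two word-count bags."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def single_pass_cluster(bags, threshold=0.5):
    """Plain single-pass clustering (an assumed stand-in, not the paper's variant).

    Each post joins the most similar existing cluster if the similarity
    reaches the threshold; otherwise it starts a new cluster.
    """
    centroids, labels = [], []
    for bag in bags:
        best, best_sim = None, 0.0
        for index, centroid in enumerate(centroids):
            sim = cosine(bag, centroid)
            if sim > best_sim:
                best, best_sim = index, sim
        if best is not None and best_sim >= threshold:
            centroids[best].update(bag)  # fold the post into the cluster's counts
            labels.append(best)
        else:
            centroids.append(Counter(bag))
            labels.append(len(centroids) - 1)
    return labels


posts = [
    [("quake", "n"), ("hits", "v"), ("city", "n")],
    [("earthquake", "n"), ("shakes", "v"), ("city", "n")],
    [("new", "a"), ("film", "n"), ("released", "v")],
]
labels_with_expansion = single_pass_cluster([expand(p) for p in posts])
labels_without_expansion = single_pass_cluster(
    [expand(p, pos_to_expand=frozenset()) for p in posts]
)
```

In this toy data the first two posts share only one surface word, so without expansion they fall below the similarity threshold; once noun synonyms are added they overlap enough to share a cluster, which is the kind of effect the description attributes to semantic expansion of short texts.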
topic Chinese microblogs
semantic expansion
short text
topic detection
url http://www.mdpi.com/2078-2489/9/8/203