Chinese and Thai Bilingual Topic Detection Online
Bilingual topic detection is a vital application of natural language processing in the Internet Plus era and amid the trend of economic globalization. At present, methods for bilingual topic detection cannot solve the problem of inconsistent bilingual topic distribution. To address this shortcoming, this pape...
Main Authors: | Rang Ziqiang, Zhou Lanjiang, Zhang Jinpeng, Xian Yantuan, Yu Zhengtao |
---|---|
Format: | Article |
Language: | English |
Published: | EDP Sciences, 2017-01-01 |
Series: | MATEC Web of Conferences |
Subjects: | Chinese; Thai; maximal cliques; credible association rule; TextRank; bilingual topics detection |
Online Access: | https://doi.org/10.1051/matecconf/201710002055 |
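The full abstract in this record names TextRank as the first step, used to extract keywords from each Chinese and Thai document. The block below is a minimal sketch of that idea, not the authors' implementation: it assumes the text is already segmented into tokens (word segmentation for Chinese and Thai is out of scope here), and the function name `textrank_keywords` and its parameters `window`, `damping`, `iterations`, and `top_k` are illustrative choices rather than values taken from the paper.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=3, damping=0.85, iterations=50, top_k=5):
    """Rank tokens with TextRank over an undirected co-occurrence graph.

    `tokens` is an already segmented document (an assumption of this sketch).
    Two distinct tokens are linked when they occur within `window` positions
    of each other.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style iteration: each word receives an equal share of the
    # score of every neighbor. The right-hand side reads the previous scores.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical toy document, written in English for readability.
    doc = ["flood", "bangkok", "rain", "flood", "river", "flood", "bangkok"]
    print(textrank_keywords(doc, window=2, top_k=3))
```

In this toy document the most connected token ranks first, which is the behavior the keyword step relies on. A real pipeline would also filter stop words and restrict candidates by part of speech before ranking.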
id |
doaj-c749c6d53e0b415f846498547b53b129 |
---|---|
record_format |
Article |
spelling |
doaj-c749c6d53e0b415f846498547b53b129; 2021-02-02T01:31:05Z; eng; EDP Sciences; MATEC Web of Conferences; 2261-236X; 2017-01-01; 100; 02055; 10.1051/matecconf/201710002055; matecconf_gcmm2017_02055; Chinese and Thai Bilingual Topic Detection Online; Rang Ziqiang; Zhou Lanjiang; Zhang Jinpeng; Xian Yantuan; Yu Zhengtao; Information Management Center, Yunnan University Of Finance And Economics; https://doi.org/10.1051/matecconf/201710002055; Chinese; Thai; maximal cliques; credible association rule; TextRank; bilingual topics detection |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Rang Ziqiang; Zhou Lanjiang; Zhang Jinpeng; Xian Yantuan; Yu Zhengtao |
title |
Chinese and Thai Bilingual Topic Detection Online |
publisher |
EDP Sciences |
series |
MATEC Web of Conferences |
issn |
2261-236X |
publishDate |
2017-01-01 |
description |
Bilingual topic detection is a vital application of natural language processing in the Internet Plus era and amid the trend of economic globalization. At present, methods for bilingual topic detection cannot solve the problem of inconsistent bilingual topic distribution. To address this shortcoming, this paper introduces a maximal-clique-based method that detects bilingual topics from Chinese and Thai feature words. First, keywords carrying the news information of each Chinese and Thai document are extracted with the TextRank algorithm. Next, the keywords are disambiguated using similarity combined with a Chinese-Thai dictionary. Then, credible association rules are used to cluster the Chinese and Thai feature words, which generates maximal cliques of bilingual topics. Finally, similar maximal cliques are clustered to obtain the final collection of topics. The method can recommend bilingual topics of different sizes according to users' needs. Tests on Chinese and Thai news texts from January 2016 gave good results. From the perspective of cross-language word clustering, the algorithm effectively addresses the inconsistency of bilingual topic distribution, does not need an estimate of the number of topics, and has low time complexity, so it is suitable for online bilingual topic discovery. |
topic |
Chinese; Thai; maximal cliques; credible association rule; TextRank; bilingual topics detection |
url |
https://doi.org/10.1051/matecconf/201710002055 |
work_keys_str_mv |
AT rangziqiang chineseandthaibilingualtopicdetectiononline AT zhoulanjiang chineseandthaibilingualtopicdetectiononline AT zhangjinpeng chineseandthaibilingualtopicdetectiononline AT xianyantuan chineseandthaibilingualtopicdetectiononline AT yuzhengtao chineseandthaibilingualtopicdetectiononline |
_version_ |
1724311620774002688 |
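The description above says that credible association rules cluster the Chinese and Thai feature words into maximal cliques. The following is a rough sketch of how such a step could look, under several assumptions that are not taken from the paper: each document is represented as a bag of feature words that already mixes both languages (for example after dictionary lookup), a "credible" link is approximated by a mutual-confidence threshold, and cliques are enumerated with the basic Bron-Kerbosch recursion. The names `credible_edges`, `maximal_cliques`, and `min_confidence`, and the `zh:`/`th:` token prefixes, are hypothetical.

```python
from itertools import combinations


def credible_edges(documents, min_confidence=0.6):
    """Build a word graph from mutual association-rule confidence.

    Two words are linked when both conf(a -> b) and conf(b -> a) reach
    `min_confidence`. This is a stand-in for the paper's credible
    association rules, whose exact definition is not given in this record.
    """
    doc_sets = [set(doc) for doc in documents]
    vocab = sorted(set().union(*doc_sets))
    support = {w: sum(w in d for d in doc_sets) for w in vocab}
    graph = {w: set() for w in vocab}
    for a, b in combinations(vocab, 2):
        both = sum(a in d and b in d for d in doc_sets)
        # The weaker of the two confidences equals both / max(support).
        if both / max(support[a], support[b]) >= min_confidence:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    A word with no links forms a clique of size one.
    """
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in sorted(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques


if __name__ == "__main__":
    # Hypothetical feature-word bags; prefixes mark the source language.
    docs = [
        ["zh:flood", "th:flood", "zh:bangkok"],
        ["zh:flood", "th:flood", "zh:bangkok"],
        ["zh:election", "th:election"],
        ["zh:election", "th:election", "zh:bangkok"],
    ]
    for clique in maximal_cliques(credible_edges(docs)):
        print(sorted(clique))
```

Each resulting clique groups words from both languages that consistently occur together, so the number of topics never has to be fixed in advance. The final step the abstract mentions, merging similar cliques into topics, is omitted here; one plausible option would be to merge cliques whose word overlap exceeds a threshold.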