Chinese and Thai Bilingual Topic Detection Online


Bibliographic Details
Main Authors: Rang Ziqiang, Zhou Lanjiang, Zhang Jinpeng, Xian Yantuan, Yu Zhengtao
Format: Article
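Language: English
Published: EDP Sciences, 2017-01-01
Series: MATEC Web of Conferences
Subjects: Chinese; Thai; maximal cliques; credible association rule; TextRank; bilingual topics detection
Online Access: https://doi.org/10.1051/matecconf/201710002055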
id doaj-c749c6d53e0b415f846498547b53b129
affiliation Information Management Center, Yunnan University of Finance and Economics
citation MATEC Web of Conferences, vol. 100, article 02055 (2017)
collection DOAJ
issn 2261-236X
description Bilingual topic detection is an important application of natural language processing in the "Internet Plus" era and amid economic globalization. Existing bilingual topic detection methods cannot handle the inconsistent distribution of topics across the two languages. To address this shortcoming, this paper proposes a maximal-clique-based method that detects Chinese-Thai bilingual topics from feature words. First, the TextRank algorithm extracts keywords from each Chinese and Thai news document. Next, the keywords are disambiguated using similarity combined with a Chinese-Thai dictionary. Then, credible association rules are used to cluster the Chinese and Thai feature words, producing maximal cliques that represent bilingual topics. Finally, similar topic cliques are clustered to obtain the final topic set. Depending on user needs, the method can recommend bilingual topics of different sizes. Experiments on Chinese and Thai news texts from January 2016 gave good results. Treated as a cross-language word clustering problem, the algorithm effectively addresses the inconsistent distribution of bilingual topics, needs no estimate of the number of topics, and has low time complexity, which makes it suitable for online bilingual topic detection.
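The first step in the description, TextRank keyword extraction, can be sketched as follows. This is a minimal illustration of generic TextRank (a PageRank iteration over a word co-occurrence graph), not the authors' code. It assumes each Chinese or Thai document has already been word-segmented and filtered into a list of candidate words; segmentation is not shown. The function name `textrank_keywords` and the parameters `window`, `iters`, `tol` and `top_k` are hypothetical choices for this sketch; `damping=0.85` is the conventional PageRank value, not one taken from the paper.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=3, damping=0.85, iters=100, tol=1e-6, top_k=5):
    """Rank the words of one document with a TextRank-style PageRank (sketch).

    tokens: the document's candidate words in reading order. Word segmentation
    and stop-word filtering for Chinese or Thai are ASSUMED to be done
    beforehand; they are not performed here.
    """
    # Undirected co-occurrence graph: two words are linked if they appear
    # within `window` consecutive tokens of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []

    # Repeat the PageRank update until scores stop changing or iters runs out.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        new_scores = {
            w: (1 - damping)
            + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
        delta = max(abs(new_scores[w] - scores[w]) for w in neighbors)
        scores = new_scores
        if delta < tol:
            break

    # Highest score first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

For example, `textrank_keywords(tokens, top_k=10)` on one segmented news article returns its ten highest-ranked words, which would then feed the dictionary-based disambiguation step. Words that co-occur with many different words accumulate the highest scores.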
topic Chinese
Thai
maximal cliques
credible association rule
TextRank
bilingual topics detection
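The clustering steps in the description (credible association rules, maximal cliques, merging similar cliques) can be illustrated with the sketch below. This record does not define "credible" or name a clique-finding algorithm, so every concrete choice here is an assumption. A rule is treated as credible when the word pair co-occurs in at least `min_support` documents and has confidence of at least `min_conf` in both directions. Maximal cliques are found with the standard Bron-Kerbosch algorithm with pivoting, swapped in as a generic method. Cliques are merged greedily when their Jaccard overlap reaches `min_overlap`. The functions `rule_graph`, `maximal_cliques` and `merge_topics`, and all three thresholds, are hypothetical. Input documents are sets of feature words in either language after the dictionary step, which is not shown.

```python
from itertools import combinations


def rule_graph(doc_keywords, min_support=2, min_conf=0.6):
    """Link two feature words when the rules a->b and b->a are both 'credible'.

    ASSUMPTION: 'credible' = support >= min_support documents and
    confidence >= min_conf in both directions (not the paper's stated rule).
    """
    count, pair_count = {}, {}
    for words in doc_keywords:
        words = set(words)
        for w in words:
            count[w] = count.get(w, 0) + 1
        for a, b in combinations(sorted(words), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    graph = {w: set() for w in count}
    for (a, b), n_ab in pair_count.items():
        if n_ab >= min_support and n_ab / count[a] >= min_conf and n_ab / count[b] >= min_conf:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Bron-Kerbosch with pivoting; yields every maximal clique as a frozenset."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex covering most candidates to skip redundant branches.
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())


def merge_topics(cliques, min_overlap=0.5):
    """Greedily merge cliques whose Jaccard overlap is >= min_overlap (hypothetical rule)."""
    topics = []
    # Largest cliques first; ties ordered by their sorted words for determinism.
    for clique in sorted(cliques, key=lambda c: (-len(c), sorted(c))):
        for i, topic in enumerate(topics):
            if len(clique & topic) / len(clique | topic) >= min_overlap:
                topics[i] = topic | clique
                break
        else:
            topics.append(set(clique))
    return topics
```

A typical call would be `merge_topics([c for c in maximal_cliques(rule_graph(docs)) if len(c) >= 2])`, which drops single-word cliques before merging. In this sketch, lowering `min_overlap` yields fewer, larger topics and raising it yields more, smaller ones, which is one possible way to offer topics of different sizes; the paper's own mechanism may differ.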