Research on parallelization of trigram N-gram algorithm based on MapReduce

Training on large-scale corpora is essential groundwork for the automatic detection of Chinese texts using the trigram N-gram algorithm. With up to one million pieces of data per day to be processed by the new media platform, there is a computational bottleneck in the construction of a...

Bibliographic Details
Main Authors:	Gong Yonggang, Tian Runlin, Lian Xiaoqin, Xia Tian
Format:	Article
Language:	zho
Published:	National Computer System Engineering Research Institute of China 2019-05-01
Series:	Dianzi Jishu Yingyong
Subjects:	Chinese text; ternary trigram N-gram; MapReduce framework; parallelization; Hadoop clusters
Online Access:	http://www.chinaaet.com/article/3000101571

Similar Items