Research on parallelization of trigram N-gram algorithm based on MapReduce

Training on large-scale corpora is essential groundwork for the automatic detection of Chinese text with the trigram N-gram algorithm. With new media platforms processing up to one million items of data per day, there is a computational bottleneck in the construction of a...
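The abstract is truncated, so the authors' actual job design is not shown here. The sketch below only illustrates the general idea of parallelizing trigram training with MapReduce. It is a minimal single-process simulation, not the paper's implementation, and it rests on several assumptions. Chinese text is split into single characters. Each line is padded with hypothetical boundary markers `<s>` and `</s>`. Probabilities use the standard maximum-likelihood estimate P(w3 | w1, w2) = C(w1 w2 w3) / C(w1 w2). All function names are illustrative. In a real Hadoop job the map and reduce tasks would run on separate nodes, and the framework would perform the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Trigram = Tuple[str, str, str]

def map_trigrams(line: str) -> Iterator[Tuple[Trigram, int]]:
    """Map step (hypothetical): emit (trigram, 1) for each character trigram.

    Assumes character-level tokens and hypothetical <s>/</s> padding.
    """
    tokens = ["<s>", "<s>"] + list(line.strip()) + ["</s>"]
    for i in range(len(tokens) - 2):
        yield (tokens[i], tokens[i + 1], tokens[i + 2]), 1

def shuffle(pairs: Iterable[Tuple[Trigram, int]]) -> Dict[Trigram, List[int]]:
    """Shuffle step: group all emitted values by key, as the framework would."""
    groups: Dict[Trigram, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key: Trigram, values: List[int]) -> Tuple[Trigram, int]:
    """Reduce step: sum the partial counts for one trigram."""
    return key, sum(values)

def run_job(lines: Iterable[str]) -> Dict[Trigram, int]:
    """Run map -> shuffle -> reduce over all input lines in one process."""
    mapped = (pair for line in lines for pair in map_trigrams(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())

def trigram_probs(counts: Dict[Trigram, int]) -> Dict[Trigram, float]:
    """MLE estimate P(w3 | w1, w2) = C(w1 w2 w3) / C(w1 w2).

    The history count C(w1 w2) is derived by summing trigram counts
    that share the same two-character history.
    """
    history: Dict[Tuple[str, str], int] = defaultdict(int)
    for (w1, w2, _), c in counts.items():
        history[(w1, w2)] += c
    return {tri: c / history[tri[:2]] for tri, c in counts.items()}

if __name__ == "__main__":
    counts = run_job(["我爱你", "我爱他"])
    probs = trigram_probs(counts)
    print(counts[("<s>", "<s>", "我")])  # 2
    print(probs[("我", "爱", "你")])     # 0.5
```

Because each mapper emits independent `(trigram, 1)` pairs and summation is associative, the counting phase splits cleanly across machines. This property is what makes the MapReduce model a natural fit for scaling corpus training.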


Bibliographic Details
Main Authors: Gong Yonggang, Tian Runlin, Lian Xiaoqin, Xia Tian
Format: Article
Language: Chinese (zho)
Published: National Computer System Engineering Research Institute of China, 2019-05-01
Series: Dianzi Jishu Yingyong
Subjects:
Online Access: http://www.chinaaet.com/article/3000101571