Research on parallelization of trigram N-gram algorithm based on MapReduce
The training of large-scale corpora is an important basic work for the automatic detection of Chinese texts using the trigram N-gram algorithm. Faced with up to one million pieces of data to be processed by the new media platform per day, there is a computational bottleneck in the construction of a...
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | zho |
Published: |
National Computer System Engineering Research Institute of China
2019-05-01
|
Series: | Dianzi Jishu Yingyong |
Subjects: | |
Online Access: | http://www.chinaaet.com/article/3000101571 |