Misalignment Detection for Web-Scraped Corpora: A Supervised Regression Approach
To build state-of-the-art Neural Machine Translation (NMT) systems, high-quality parallel sentences are needed. Typically, large amounts of data are scraped from multilingual web sites and aligned into datasets for training. Many tools exist for automatic alignment of such datasets. However, the qua...
Main Authors: | , , , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2019-09-01
|
Series: | Informatics |
Subjects: | |
Online Access: | https://www.mdpi.com/2227-9709/6/3/35 |