Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: A Bayesian Non-Parametric Approach
Main Authors: | , , , |
---|---|
Other Authors: | , |
Format: | Article |
Language: | English |
Published: | Association for Computational Linguistics, 2010-10-07T13:12:43Z. |
Subjects: | |
Summary: | We investigate the problem of unsupervised part-of-speech tagging when raw parallel data is available in a large number of languages. Patterns of ambiguity vary greatly across languages and therefore even unannotated multilingual data can serve as a learning signal. We propose a non-parametric Bayesian model that connects related tagging decisions across languages through the use of multilingual latent variables. Our experiments show that performance improves steadily as the number of languages increases. Funding: National Science Foundation (U.S.) (CAREER grant IIS-0448168); National Science Foundation (U.S.) (CAREER grant IIS-0835445). |
---|