Unsupervised multilingual grammar induction

We investigate the task of unsupervised constituency parsing from bilingual parallel corpora. Our goal is to use bilingual cues to learn improved parsing models for each language and to evaluate these models on held-out monolingual test data. We formulate a generative Bayesian model which seeks to e...

Bibliographic Details
Main Authors:	Snyder, Benjamin (Contributor), Naseem, Tahira (Contributor), Barzilay, Regina (Contributor)
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor), Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format:	Article
Language:	English
Published:	Association for Computational Linguistics, 2010-10-14T12:48:54Z.
Subjects:	Article
Online Access:	http://hdl.handle.net/1721.1/59314


LEADER	02436 am a22003493u 4500
001	59314
042			\|a dc
100	1	0	\|a Snyder, Benjamin \|e author
100	1	0	\|a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory \|e contributor
100	1	0	\|a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science \|e contributor
100	1	0	\|a Barzilay, Regina \|e contributor
100	1	0	\|a Snyder, Benjamin \|e contributor
100	1	0	\|a Naseem, Tahira \|e contributor
700	1	0	\|a Naseem, Tahira \|e author
700	1	0	\|a Barzilay, Regina \|e author
245	0	0	\|a Unsupervised multilingual grammar induction
260			\|b Association for Computational Linguistics, \|c 2010-10-14T12:48:54Z.
856			\|z Get fulltext \|u http://hdl.handle.net/1721.1/59314
520			\|a We investigate the task of unsupervised constituency parsing from bilingual parallel corpora. Our goal is to use bilingual cues to learn improved parsing models for each language and to evaluate these models on held-out monolingual test data. We formulate a generative Bayesian model which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters. To this end, we adapt a formalism known as unordered tree alignment to our probabilistic setting. Using this formalism, our model loosely binds parallel trees while allowing language-specific syntactic structure. We perform inference under this model using Markov Chain Monte Carlo and dynamic programming. Applying this model to three parallel corpora (Korean-English, Urdu-English, and Chinese-English), we find substantial performance gains over the CCM model, a strong monolingual baseline. On average, across a variety of testing scenarios, our model achieves an 8.8 absolute gain in F-measure.
520			\|a National Science Foundation (U.S.) (grant IIS-0448168)
520			\|a National Science Foundation (U.S.) (grant IIS-0835445)
520			\|a National Science Foundation (U.S.) (grant IIS-0835652)
546			\|a en_US
690			\|a algorithms
690			\|a design
690			\|a experimentation
690			\|a languages
690			\|a measurement
690			\|a performance
655	7		\|a Article
773			\|t Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP
