Translationese and Swedish-English Statistical Machine Translation

This thesis investigates how well machine-learned classifiers can identify translated text, and the effect translationese may have in Statistical Machine Translation -- all in a Swedish-to-English, and reverse, context. Translationese is a term used to describe the dialect of a target language that...

Bibliographic Details
Main Author: Joelsson, Jakob
Format: Others
Language: English
Published: Uppsala universitet, Institutionen för lingvistik och filologi 2016
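Subjects: Translationese; Statistical Machine Translation; Text Classification; Classification of Translationese; Language Technology (Computational Linguistics); Språkteknologi (språkvetenskaplig databehandling)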
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-305199
id ndltd-UPSALLA1-oai-DiVA.org-uu-305199
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-uu-305199; 2018-01-15T07:13:08Z; Translationese and Swedish-English Statistical Machine Translation; eng; Joelsson, Jakob; Uppsala universitet, Institutionen för lingvistik och filologi; 2016; Translationese; Statistical Machine Translation; Text Classification; Classification of Translationese; Language Technology (Computational Linguistics); Språkteknologi (språkvetenskaplig databehandling); Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-305199; application/pdf; info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Translationese
Statistical Machine Translation
Text Classification
Classification of Translationese
Language Technology (Computational Linguistics)
Språkteknologi (språkvetenskaplig databehandling)
description This thesis investigates how well machine-learned classifiers can identify translated text, and the effect translationese may have in Statistical Machine Translation (SMT) -- all in a Swedish-to-English, and reverse, context. Translationese is a term used to describe the dialect of a target language that is produced when a source text is translated. The systems trained for this thesis are SVM-based classifiers for identifying translationese, as well as translation and language models for SMT. The classifiers successfully distinguished translationese from non-translated text and, to some extent, also identified which source language the texts were translated from. In the SMT experiments, varying the translation model affected the BLEU scores the most. Systems configured with non-translated source text and translationese target text performed better than their reversed counterparts. In the language model experiments, models trained on known translationese and on classified translationese performed better than models trained on known non-translated text, though classified translationese did not perform as well as known translationese. Ultimately, the thesis shows that translationese can be identified by machine-learned classifiers and may affect the results of SMT systems.
author Joelsson, Jakob
title Translationese and Swedish-English Statistical Machine Translation
publisher Uppsala universitet, Institutionen för lingvistik och filologi
publishDate 2016
url http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-305199