Statistical models for unsupervised learning of morphology and POS tagging

This thesis concentrates on two fields in natural language processing. The main contribution of the thesis is in the field of morphology learning. Morphology is the study of how words are formed by combining different language constituents (called morphemes), and morphology learning is the process of an...

Bibliographic Details
Main Author: Can, Burcu
Other Authors: Manandhar, Suresh
Published: University of York 2011
Subjects: 006.35
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.556255
id ndltd-bl.uk-oai-ethos.bl.uk-556255
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-556255 | 2017-10-04T03:18:49Z | Statistical models for unsupervised learning of morphology and POS tagging | Can, Burcu | Manandhar, Suresh | 2011 | 006.35 | University of York | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.556255 | http://etheses.whiterose.ac.uk/2364/ | Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 006.35
description This thesis concentrates on two fields in natural language processing. The main contribution of the thesis is in the field of morphology learning. Morphology is the study of how words are formed by combining different language constituents (called morphemes), and morphology learning is the process of analysing words by splitting them into these constituents. In this thesis, morphology is learned mainly by paradigmatic approaches, in which words are analysed in groups called paradigms. Paradigms are morphological structures capable of generating various word forms. We propose approaches for capturing paradigms to perform morphological segmentation. One of the proposed approaches captures paradigms within a hierarchical tree structure. Using a hierarchical structure covers a wide range of paradigms by spotting morphological similarities. The second focus of the thesis is part-of-speech (POS) tagging. Parts of speech are linguistic categories that group words with similar syntactic features, e.g. noun, adjective and verb. In the thesis, we investigate how to exploit POS tags to learn morphology. We propose a model to capture paradigms through syntactic categories. When syntactic categories are provided, the proposed system can capture paradigms well. We then extend this approach to the case where no syntactic categories are provided. To this end, we propose a joint model in which POS tags and morphology are learned simultaneously. Our results show that a joint model is possible for learning morphology and POS tagging. We also study morpheme labelling, for which we propose a clustering algorithm that groups morphemes showing similar features. The algorithm can capture morphemes having similar meanings.
author2 Manandhar, Suresh
author_facet Manandhar, Suresh
Can, Burcu
author Can, Burcu
author_sort Can, Burcu
title Statistical models for unsupervised learning of morphology and POS tagging
publisher University of York
publishDate 2011
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.556255