Idiomaticity Prediction of Chinese Noun Compounds and Its Applications

Idiomaticity refers to the situation where the meaning of a lexical unit cannot be derived from the usual meanings of its constituents. As a ubiquitous phenomenon in languages, the existence of idioms often causes significant challenges for semantic NLP tasks. While previous research mostly focuses...

Bibliographic Details
Main Authors: Chengyu Wang, Yan Fan, Xiaofeng He, Hongyuan Zha, Aoying Zhou
Format: Article
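Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Representation learning; idiomaticity prediction; noun compound; relational pattern; compositionality analysis
Online Access: https://ieeexplore.ieee.org/document/8853294/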
id doaj-6c032dfa2f714c18acd9f92e745a7112
record_format Article
spelling doaj-6c032dfa2f714c18acd9f92e745a7112
2021-03-29T23:54:36Z
eng
IEEE
IEEE Access
2169-3536
2019-01-01
Volume 7, pages 142866-142878
DOI 10.1109/ACCESS.2019.2944572
8853294
Idiomaticity Prediction of Chinese Noun Compounds and Its Applications
Chengyu Wang (0) https://orcid.org/0000-0003-1010-9678
Yan Fan (1)
Xiaofeng He (2)
Hongyuan Zha (3)
Aoying Zhou (4)
School of Software Engineering, East China Normal University, Shanghai, China
Alibaba Group, Hangzhou, China
School of Computer Science and Technology, East China Normal University, Shanghai, China
School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
School of Data Science and Engineering, East China Normal University, Shanghai, China
Idiomaticity refers to the situation where the meaning of a lexical unit cannot be derived from the usual meanings of its constituents. As a ubiquitous phenomenon in languages, the existence of idioms often causes significant challenges for semantic NLP tasks. While previous research mostly focuses on the idiomatic usage detection of English verb-noun combinations and the semantic analysis of Noun Compounds (NCs), the idiomaticity issues of Chinese NCs have been rarely studied. In this work, we aim at classifying Chinese NCs into four idiomaticity degrees. Each idiomaticity degree refers to a specific paradigm of how the NCs should be interpreted. To address this task, a Relational and Compositional Representation Learning model (RCRL) is proposed, which considers the relational textual patterns and the compositionality levels of Chinese NCs. RCRL learns relational representations of NCs to capture the semantic relations between two nouns within an NC, expressed by textual patterns and their statistical signals in the corpus. It further employs compositional representations to model the compositionality levels of NCs via network embeddings. Both loss functions of idiomaticity degree classification and representation learning are jointly optimized in an integrated neural network. Experiments over two datasets illustrate the effectiveness of RCRL, outperforming state-of-the-art approaches. Three applicational studies are further conducted to show the usefulness of RCRL and the roles of idiomaticity prediction of Chinese NCs in the fields of NLP.
https://ieeexplore.ieee.org/document/8853294/
Representation learning
idiomaticity prediction
noun compound
relational pattern
compositionality analysis
collection DOAJ
language English
format Article
sources DOAJ
author Chengyu Wang
Yan Fan
Xiaofeng He
Hongyuan Zha
Aoying Zhou
title Idiomaticity Prediction of Chinese Noun Compounds and Its Applications
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Idiomaticity refers to the situation where the meaning of a lexical unit cannot be derived from the usual meanings of its constituents. As a ubiquitous phenomenon in languages, the existence of idioms often causes significant challenges for semantic NLP tasks. While previous research mostly focuses on the idiomatic usage detection of English verb-noun combinations and the semantic analysis of Noun Compounds (NCs), the idiomaticity issues of Chinese NCs have been rarely studied. In this work, we aim at classifying Chinese NCs into four idiomaticity degrees. Each idiomaticity degree refers to a specific paradigm of how the NCs should be interpreted. To address this task, a Relational and Compositional Representation Learning model (RCRL) is proposed, which considers the relational textual patterns and the compositionality levels of Chinese NCs. RCRL learns relational representations of NCs to capture the semantic relations between two nouns within an NC, expressed by textual patterns and their statistical signals in the corpus. It further employs compositional representations to model the compositionality levels of NCs via network embeddings. Both loss functions of idiomaticity degree classification and representation learning are jointly optimized in an integrated neural network. Experiments over two datasets illustrate the effectiveness of RCRL, outperforming state-of-the-art approaches. Three applicational studies are further conducted to show the usefulness of RCRL and the roles of idiomaticity prediction of Chinese NCs in the fields of NLP.
topic Representation learning
idiomaticity prediction
noun compound
relational pattern
compositionality analysis
url https://ieeexplore.ieee.org/document/8853294/