Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts

Word embedding has benefited a broad spectrum of text analysis tasks by learning distributed word representations to encode word semantics. Word representations are typically learned by modeling local contexts of words, assuming that words sharing similar surrounding words are semantically close. We argue that local contexts can only partially define word semantics in the unsupervised word embedding learning. Global contexts, referring to the broader semantic units, such as the document or paragraph where the word appears, can capture different aspects of word semantics and complement local contexts. We propose two simple yet effective unsupervised word embedding models that jointly model both local and global contexts to learn word representations. We provide theoretical interpretations of the proposed models to demonstrate how local and global contexts are jointly modeled, assuming a generative relationship between words and contexts. We conduct a thorough evaluation on a wide range of benchmark datasets. Our quantitative analysis and case study show that despite their simplicity, our two proposed models achieve superior performance on word similarity and text classification tasks.


Bibliographic Details
Main Authors: Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang (Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, United States); Chao Zhang (School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta, GA, United States); Jiawei Han (Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, United States)
Format: Article
Language: English
Published: Frontiers Media S.A. 2020-03-01
Series: Frontiers in Big Data
Subjects: word embedding, unsupervised learning, word semantics, local contexts, global contexts
ISSN: 2624-909X
DOI: 10.3389/fdata.2020.00009
Online Access: https://www.frontiersin.org/article/10.3389/fdata.2020.00009/full
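
The abstract describes two models that jointly learn from local contexts (a word's surrounding words) and global contexts (the document or paragraph it appears in). The sketch below is not the authors' models or code; it is a minimal toy illustration of that general idea, combining a skip-gram-style local objective with a word-to-document prediction term. The corpus, window size, embedding dimension, and learning rate are assumptions made purely for the example.

```python
# Toy sketch: jointly train word embeddings on local contexts (surrounding
# words, skip-gram style) and global contexts (the containing document).
# This is an illustration of the general idea, not the paper's models.
import numpy as np

docs = [
    "the cat sat on the mat".split(),
    "the dog lay on the rug".split(),
    "stocks rose as markets rallied".split(),
]

vocab = sorted({w for d in docs for w in d})
w2i = {w: i for i, w in enumerate(vocab)}
V, D, dim, window, lr = len(vocab), len(docs), 16, 2, 0.05

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, dim))   # word embeddings
C = rng.normal(scale=0.1, size=(V, dim))   # local-context (word) embeddings
G = rng.normal(scale=0.1, size=(D, dim))   # global-context (document) embeddings

def sgd_step(center, ctx, label):
    """One logistic-regression step pulling center toward (label=1)
    or pushing it away from (label=0) the context vector."""
    score = 1.0 / (1.0 + np.exp(-center @ ctx))
    grad = score - label
    d_center, d_ctx = grad * ctx, grad * center
    center -= lr * d_center   # in-place update of the underlying row
    ctx -= lr * d_ctx

for epoch in range(50):
    for d_id, doc in enumerate(docs):
        for pos, word in enumerate(doc):
            wi = w2i[word]
            # Local contexts: predict surrounding words within the window.
            for off in range(-window, window + 1):
                if off == 0 or not (0 <= pos + off < len(doc)):
                    continue
                ci = w2i[doc[pos + off]]
                sgd_step(W[wi], C[ci], 1.0)               # observed pair
                sgd_step(W[wi], C[rng.integers(V)], 0.0)  # negative sample
            # Global context: the word should also predict its own document.
            sgd_step(W[wi], G[d_id], 1.0)
            sgd_step(W[wi], G[rng.integers(D)], 0.0)

# Inspect nearest neighbors of "cat" under the learned embeddings.
q = W[w2i["cat"]]
sims = (W @ q) / (np.linalg.norm(W, axis=1) * np.linalg.norm(q) + 1e-9)
print([vocab[i] for i in np.argsort(-sims)[:3]])
```

In this toy setup the document term plays the role the abstract attributes to global contexts: it ties together words that rarely co-occur within a small window but appear in the same broader semantic unit.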