Sentiment Analysis for Patient-Author Text: Using Word2Vec and Symptoms


Bibliographic Details
Main Authors:	Zong-Yao Wu, 吳宗耀
Other Authors:	Shih-Wen Ke
Format:	Others
Language:	zh-TW
Published:	2017
Online Access:	http://ndltd.ncl.edu.tw/handle/q78nt2

id	ndltd-TW-105CYCU5392031
record_format	oai_dc
spelling	ndltd-TW-105CYCU5392031 2019-05-15T23:39:16Z http://ndltd.ncl.edu.tw/handle/q78nt2 Sentiment Analysis for Patient-Author Text: Using Word2Vec and Symptoms 文字情感分析:利用病徵分析病患自撰之日誌 Zong-Yao Wu 吳宗耀 Master's === Chung Yuan Christian University (中原大學) === Graduate Institute of Information and Computer Engineering (資訊工程研究所) === 105 Shih-Wen Ke 柯士文 2017 學位論文 ; thesis 82 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === Chung Yuan Christian University (中原大學) === Graduate Institute of Information and Computer Engineering (資訊工程研究所) === 105 === Sentiment analysis (SA) has recently been gaining popularity. Most previous work applied machine learning techniques to product reviews to predict sentiment polarity, focusing on how to build patterns such as statistical language models or how to extract semantic features from text. In this thesis, we apply SA techniques to patient-authored text in online medical communities. Our datasets consist of patient-authored text (PAT) from a well-known medical website, patientslikeme.com (PLM). On PLM, patients can share mood phrases, severity of symptoms, treatments, and quality of life. A PAT reads more like a diary or journal in which patients reflect on themselves. A further feature unique to the PLM datasets is the discussion of symptoms and diseases, so we also examine the relationship between sentiment polarity and symptoms. Many studies represent document features with a bag-of-words model, but some have shown that bag-of-words loses part of a word's meaning. In our study, we explore the possibility of using "word vectors" to represent documents. Word2Vec is a tool whose central idea is to train vectors that not only help find similar words but also capture multiple levels of meaning. In the first experiment, we used Word2Vec to generate word vectors and used five different methods to generate sentence vectors, including the most commonly used average method, the no-normalization method, the stop-word method, and the sentiment method from the SA domain. We then used two classifiers, support vector machine (SVM) and k-nearest neighbors (k-NN) with cosine similarity, to classify the sentiment polarity of the PATs.
Some previous studies claimed that the corpus used to train the Word2Vec model is very important, so we also examine the effect of corpus composition on the classification results. We prepared two corpora for the second experiment, which examines whether corpus quality or corpus volume is more helpful for classification. From past studies, we observed that "PATs with reference to symptoms" have a large effect on classification. Our observation shows that negative polarity and reference to symptoms are highly correlated. Therefore, we build another training model and evaluate the results based on this observation. The results show that the non-normalization method is the best at identifying positive polarity, while the sentiment method is the best at identifying negative polarity. We also found that the normalization method produced worse classification results than the non-normalization method. In the second experiment, we used two different types of classifiers, i.e. SVM and k-NN. All results showed that the Word2Vec model trained on medical corpora yielded better classification performance than the model trained on the Wikipedia corpus. This outcome indicates that the quality of the training corpus is more important than its volume when training Word2Vec models. In the future, we wish to further explore the usage of explicit and implicit references to symptoms in the PATs.
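The average method and the k-NN classifier with cosine similarity mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the two-dimensional vectors in `TOY_VECTORS`, the example tokens, the labels, and the helper names `sentence_vector`, `cosine_similarity`, and `knn_predict` are all hypothetical stand-ins for a trained Word2Vec model and real PAT data. The other sentence-vector variants (no-normalization, stop-word, sentiment) and the SVM classifier are not shown, because the abstract does not describe them in enough detail.

```python
import math
from collections import Counter

# Hypothetical toy vectors standing in for a trained Word2Vec model.
TOY_VECTORS = {
    "pain":   [0.9, 0.1],
    "tired":  [0.8, 0.2],
    "worse":  [0.7, 0.0],
    "happy":  [0.1, 0.9],
    "better": [0.2, 0.8],
    "walk":   [0.3, 0.7],
}

def sentence_vector(tokens, vectors):
    """Average the vectors of all tokens found in the vocabulary."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None  # no in-vocabulary token, so no sentence vector
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_predict(query, examples, k=3):
    """Majority vote among the k labeled examples most similar to the query."""
    ranked = sorted(examples,
                    key=lambda ex: cosine_similarity(query, ex[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled training sentences (already tokenized).
train = [
    (["pain", "worse"], "negative"),
    (["tired", "pain"], "negative"),
    (["happy", "walk"], "positive"),
    (["better", "happy"], "positive"),
]
examples = [(sentence_vector(toks, TOY_VECTORS), label) for toks, label in train]

# "today" is out of vocabulary and is simply skipped by the averaging step.
query = sentence_vector(["tired", "worse", "today"], TOY_VECTORS)
prediction = knn_predict(query, examples, k=3)
print(prediction)
```

In this toy setup the query sentence lands closest to the two negative examples, so the three-neighbor vote labels it negative. With a real model, the lookup table would be replaced by the trained word vectors, and the choice of training corpus (medical text or Wikipedia) would change those vectors and therefore the neighbors.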
author2	Shih-Wen Ke
author	Zong-Yao Wu 吳宗耀
title	Sentiment Analysis for Patient-Author Text: Using Word2Vec and Symptoms
publishDate	2017
url	http://ndltd.ncl.edu.tw/handle/q78nt2
