Topic-oriented Text Mining on Open-ended Questionnaires using Latent Dirichlet Allocation

Master’s === National Central University === Department of Computer Science and Information Engineering === 107 === As the education system has evolved over the past few years, domestic universities have committed to improving students’ learning outcomes. The most common way of evaluating learning outcomes is through questionnaires, which students fill in at the middle and the end of...


Bibliographic Details
Main Authors: Yu-Ju Chen, 陳昱儒
Other Authors: 蔡孟峰
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/jhgkaq
id ndltd-TW-107NCU05392115
record_format oai_dc
spelling ndltd-TW-107NCU05392115 2019-10-22T05:28:14Z http://ndltd.ncl.edu.tw/handle/jhgkaq Topic-oriented Text Mining on Open-ended Questionnaires using Latent Dirichlet Allocation 基於隱含狄利克雷分布進行開放式問卷之主題導向文字探勘 Yu-Ju Chen 陳昱儒 Master’s National Central University Department of Computer Science and Information Engineering 107 蔡孟峰 2019 thesis 43 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master’s === National Central University === Department of Computer Science and Information Engineering === 107 === As the education system has evolved over the past few years, domestic universities have committed to improving students’ learning outcomes. The most common way of evaluating learning outcomes is through questionnaires, which students fill in at the middle and the end of each semester. To let students give more detailed feedback, these questionnaires usually contain a section for free-text comments. The comment section is designed for students to write any thoughts and opinions; there are no restrictions or rules on how it should be written. This human-generated text is unstructured and often contains writing mistakes and misused words. Because it lacks structure, it is hard to process such text as ordinary data with data mining techniques. Thus, we aim to analyze the text data from course evaluation questionnaires through text mining. Because the content is miscellaneous and there is not enough human-labeled data, it is hard to apply supervised classification methods to this text. Therefore, we use an unsupervised topic analysis technique to find the latent topic distribution of the data. Topic modeling can infer latent topic distributions and cluster similar documents without defining topic labels or training data beforehand. We perform topic modeling by implementing latent Dirichlet allocation (LDA) using Gibbs sampling, and further estimate unseen data with the LDA model. In this thesis, we apply topic analysis to the comment section of the course evaluation questionnaire. We believe that this automatic topic modeling method makes it more efficient for analysts to analyze text data in questionnaires. Moreover, future work on automatic questionnaire analysis can be built on this approach.
author2 蔡孟峰
author_facet 蔡孟峰
Yu-Ju Chen
陳昱儒
author Yu-Ju Chen
陳昱儒
spellingShingle Yu-Ju Chen
陳昱儒
Topic-oriented Text Mining on Open-ended Questionnaires using Latent Dirichlet Allocation
author_sort Yu-Ju Chen
title Topic-oriented Text Mining on Open-ended Questionnaires using Latent Dirichlet Allocation
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/jhgkaq
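The abstract in this record mentions LDA fitted with Gibbs sampling and then used to estimate unseen data. The following is a minimal sketch of that general technique (collapsed Gibbs sampling plus a simple fold-in step for a new document), not the thesis's actual implementation. The function names, the hyperparameter values for alpha and beta, the iteration counts, and the representation of documents as lists of integer word ids are all assumptions made for illustration.

```python
import random


def topic_weights(doc_counts, topic_word, topic_total, w, alpha, beta):
    """Unnormalized conditional p(z = t | all other assignments) for word id w."""
    vocab_size = len(topic_word[0])
    return [
        (doc_counts[t] + alpha)
        * (topic_word[t][w] + beta)
        / (topic_total[t] + vocab_size * beta)
        for t in range(len(topic_word))
    ]


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) count matrices.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample the topic from its conditional distribution.
                weights = topic_weights(doc_topic[d], topic_word, topic_total, w, alpha, beta)
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def doc_topic_distribution(counts, alpha):
    """Smoothed topic proportions for one document."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def infer_unseen(doc, topic_word, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Fold in a new document while keeping the trained topic-word counts fixed."""
    rng = random.Random(seed)
    num_topics = len(topic_word)
    topic_total = [sum(row) for row in topic_word]
    counts = [0] * num_topics
    zs = []
    for _ in doc:
        k = rng.randrange(num_topics)
        zs.append(k)
        counts[k] += 1
    for _ in range(iters):
        for i, w in enumerate(doc):
            counts[zs[i]] -= 1
            weights = topic_weights(counts, topic_word, topic_total, w, alpha, beta)
            zs[i] = rng.choices(range(num_topics), weights=weights)[0]
            counts[zs[i]] += 1
    return doc_topic_distribution(counts, alpha)
```

In this sketch a questionnaire comment would first be tokenized and mapped to word ids (a step not shown here); `lda_gibbs` then yields the count matrices, and `infer_unseen` returns topic proportions for a comment that was not part of training.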