Topic-oriented Text Mining on Open-ended Questionnaires using Latent Dirichlet Allocation

Bibliographic Details
Main Authors: Yu-Ju Chen, 陳昱儒
Other Authors: 蔡孟峰
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/jhgkaq
Description
Summary: Master's === National Central University === Department of Computer Science and Information Engineering === 107 === As the education system has evolved over the past few years, domestic universities have committed to improving students’ learning outcomes. The most common way of evaluating learning outcomes is through questionnaires that students fill in at the middle and the end of each semester. To let students give more detailed feedback, these questionnaires usually contain a section for free-text comments. The comment section is designed for students to write any thoughts and opinions, with no restrictions or rules on how it should be written. This human-generated text is unstructured and often contains writing mistakes and misused words. Because it lacks structure, it is hard to process as ordinary data with data mining techniques. We therefore aim to analyze the text from course evaluation questionnaires through text mining. Because the content is miscellaneous and there is not enough human-labeled data, it is hard to apply supervised classification methods to this text. Instead, we use an unsupervised topic analysis technique to find the latent topic distribution of the data. Topic modeling can infer latent topic distributions and cluster similar documents without topic labels or training data defined beforehand. We perform topic modeling by implementing latent Dirichlet allocation (LDA) with Gibbs sampling, and then use the LDA model to estimate topic distributions for unseen data. In this thesis, we apply topic analysis to the comment section of the course evaluation questionnaire. We believe that this automatic topic modeling method makes it more efficient for analysts to analyze text data in questionnaires. Moreover, future work on automatic questionnaire analysis can be built on this approach.
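
The approach named in the summary, LDA fitted by Gibbs sampling and then applied to unseen comments, can be illustrated with a minimal sketch. This is not the thesis implementation: it is a generic collapsed Gibbs sampler, and the function names (`lda_gibbs`, `infer_theta`), the hyperparameters `alpha` and `beta`, the iteration counts, and the toy word-id documents are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids.

    Hypothetical helper for illustration, not the thesis code.
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)  # random initial assignment
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    # Point estimates of the document-topic and topic-word distributions.
    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta) for w in range(vocab_size)]
        for t in range(num_topics)
    ]
    return theta, phi


def infer_theta(doc, phi, alpha=0.1, iters=50, seed=0):
    """Estimate the topic mixture of an unseen document, holding phi fixed."""
    rng = random.Random(seed)
    num_topics = len(phi)
    n_k = [0] * num_topics
    z = []
    for _ in doc:
        k = rng.randrange(num_topics)
        z.append(k)
        n_k[k] += 1
    for _ in range(iters):
        for i, w in enumerate(doc):
            n_k[z[i]] -= 1
            weights = [(n_k[t] + alpha) * phi[t][w] for t in range(num_topics)]
            z[i] = rng.choices(range(num_topics), weights=weights)[0]
            n_k[z[i]] += 1
    return [(n_k[t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]


# Toy corpus: word ids 0-2 and 3-5 stand in for two made-up themes.
toy_docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 1], [5, 4, 3, 3]]
theta, phi = lda_gibbs(toy_docs, num_topics=2, vocab_size=6)
new_theta = infer_theta([0, 1, 1, 2], phi)
```

In this sketch each row of `theta` is one document's topic mixture and each row of `phi` is one topic's word distribution. Real questionnaire comments would first need tokenization and a word-to-id mapping, which are omitted here.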