Self-Training Naive Bayes Berbasis Word2Vec untuk Kategorisasi Berita Bahasa Indonesia
News, one kind of information needed in daily life, is widely available on the internet. News websites often categorize their articles by topic to help users access the news more easily. Document classification has been widely used to do this automatically. The current availability of labele...
| Published in: | Jurnal Nasional Teknik Elektro dan Teknologi Informasi |
|---|---|
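| Main Authors: | Joan Santoso, Agung Dewa Bagus Soetiono, Gunawan Gunawan, Endang Setyati, Eko Mulyanto Yuniarno, Mochamad Hariadi, Mauridhi Hery Purnomo |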
| Format: | Article |
| Language: | English |
| Published: | Universitas Gadjah Mada, 2018-06-01 |
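| Subjects: | Kategorisasi Berita, Word2Vec, Skip-Gram, Self-Training, Naive Bayes, Semi-supervised Learning, Bahasa Indonesia |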
| Online Access: | http://ejnteti.jteti.ugm.ac.id/index.php/JNTETI/article/view/418 |
| _version_ | 1856932451355459584 |
|---|---|
| author | Joan Santoso; Agung Dewa Bagus Soetiono; Gunawan Gunawan; Endang Setyati; Eko Mulyanto Yuniarno; Mochamad Hariadi; Mauridhi Hery Purnomo |
| collection | DOAJ |
| container_title | Jurnal Nasional Teknik Elektro dan Teknologi Informasi |
| description | News, one kind of information needed in daily life, is widely available on the internet. News websites often categorize their articles by topic to help users access the news more easily. Document classification has been widely used to do this automatically. However, the labeled training data currently available is insufficient for a machine to build a good model, because data annotation requires considerable cost and time to produce a sufficient quantity of labeled training data. Semi-supervised algorithms address this problem by using both labeled and unlabeled data to build a classification model. This paper proposes a semi-supervised news classification system using the Self-Training Naive Bayes algorithm. The features used for text classification come from the Word2Vec Skip-Gram model, which is widely used in computational linguistics and text mining research as a word representation method. Word2Vec is used as a feature because it can carry the semantic meaning of words into this classification task. The data used in this paper consist of 29,587 news documents from Indonesian online news websites. The Self-Training Naive Bayes algorithm achieved the highest F1-score of 94.17%. |
| format | Article |
| id | doaj-art-86eacb2ea2fd4eb68c2da6ca3ed9d8ee |
| institution | Directory of Open Access Journals |
| issn | 2301-4156 2460-5719 |
| language | English |
| publishDate | 2018-06-01 |
| publisher | Universitas Gadjah Mada |
| record_format | Article |
| spelling | doaj-art-86eacb2ea2fd4eb68c2da6ca3ed9d8ee; 2025-08-19T20:12:51Z; eng; Universitas Gadjah Mada; Jurnal Nasional Teknik Elektro dan Teknologi Informasi; 2301-4156; 2460-5719; 2018-06-01; 71; 10.22146/jnteti.v7i2.418350; Self-Training Naive Bayes Berbasis Word2Vec untuk Kategorisasi Berita Bahasa Indonesia; Joan Santoso; Agung Dewa Bagus Soetiono; Gunawan Gunawan; Endang Setyati; Eko Mulyanto Yuniarno; Mochamad Hariadi; Mauridhi Hery Purnomo; http://ejnteti.jteti.ugm.ac.id/index.php/JNTETI/article/view/418; Kategorisasi Berita, Word2Vec, Skip-Gram, Self-Training, Naive Bayes, Semi-supervised Learning, Bahasa Indonesia |
| title | Self-Training Naive Bayes Berbasis Word2Vec untuk Kategorisasi Berita Bahasa Indonesia |
| topic | Kategorisasi Berita, Word2Vec, Skip-Gram, Self-Training, Naive Bayes, Semi-supervised Learning, Bahasa Indonesia |
| url | http://ejnteti.jteti.ugm.ac.id/index.php/JNTETI/article/view/418 |
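
The abstract names the technique (self-training with a Naive Bayes classifier on Word2Vec features) without spelling out the loop, so the following is only a minimal sketch of generic self-training, not the authors' implementation. Everything in it is an assumption made for illustration: the Gaussian Naive Bayes variant, the 0.95 confidence threshold, the class names, and the toy 2-D points, which merely stand in for document vectors (for example, averaged Skip-Gram word vectors).

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Estimate per-class prior, feature means and feature variances."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        dims = list(zip(*rows))
        means = [sum(d) / len(d) for d in dims]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in d) / len(d) + 1e-3
                     for d, m in zip(dims, means)]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def predict_proba(model, x):
    """Return {label: posterior probability} for one feature vector."""
    log_post = {}
    for label, (prior, means, variances) in model.items():
        ll = math.log(prior)
        for v, m, var in zip(x, means, variances):
            ll += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_post[label] = ll
    top = max(log_post.values())  # subtract the max for numerical stability
    scores = {k: math.exp(v - top) for k, v in log_post.items()}
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def classify(model, x):
    """Return the most probable label for one feature vector."""
    probs = predict_proba(model, x)
    return max(probs, key=probs.get)


def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = fit_gaussian_nb(X_lab, y_lab)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            probs = predict_proba(model, x)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing confident enough: stop early
            break
        model = fit_gaussian_nb(X_lab, y_lab)
    return model, len(pool)


# Toy data: hypothetical "sports" and "politics" classes around two centers.
labeled = [[0.0, 0.5], [0.5, -0.5], [-0.5, 0.0],
           [5.0, 5.5], [5.5, 4.5], [4.5, 5.0]]
labels = ["sports"] * 3 + ["politics"] * 3
rng = random.Random(0)
unlabeled = [[rng.gauss(c, 1.0), rng.gauss(c, 1.0)]
             for c in (0.0, 5.0) for _ in range(50)]

model, leftover = self_train(labeled, labels, unlabeled)
print(classify(model, [0.2, -0.1]), classify(model, [4.8, 5.3]), leftover)
```

The threshold trades coverage for label noise: a lower value absorbs more unlabeled documents per round but increases the chance that wrong pseudo-labels are fed back into training.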
