Self-Training Naive Bayes Berbasis Word2Vec untuk Kategorisasi Berita Bahasa Indonesia

News, a kind of information needed in daily life, is now available on the internet. News websites often categorize their articles by topic to help users access the news more easily. Document classification has been widely used to do this automatically. The current availability of labele...
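
The title names a self-training Naive Bayes classifier built on Word2Vec features. The record does not say which Naive Bayes variant the authors use or how word vectors are combined into document features, so the following is only a minimal sketch under stated assumptions: documents are represented by the average of their word vectors, the classifier is a Gaussian Naive Bayes with a small variance floor, and pseudo-labels are accepted above a fixed confidence threshold. The tiny 2-dimensional `embeddings` table, the labels `olahraga` and `ekonomi`, and all function names are hypothetical stand-ins for real Skip-Gram vectors and news categories.

```python
import math
from collections import defaultdict

# Hypothetical 2-d vectors standing in for trained Word2Vec Skip-Gram embeddings.
embeddings = {
    "gol": [1.0, 0.1], "liga": [0.9, 0.2], "pemain": [0.8, 0.0],
    "saham": [0.1, 1.0], "bank": [0.0, 0.9], "rupiah": [0.2, 0.8],
}

def doc_vector(tokens, dim=2):
    """Average the vectors of known tokens (assumed feature construction)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

class GaussianNB:
    """Gaussian Naive Bayes with a variance floor (an assumption of this sketch)."""

    def fit(self, X, y, var_floor=1e-3):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.stats = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(rows) for c in cols]
            variances = [sum((v - m) ** 2 for v in c) / len(rows) + var_floor
                         for c, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict_proba(self, x):
        log_post = {}
        for label, (log_prior, means, variances) in self.stats.items():
            log_lik = sum(-0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
                          for v, m, s in zip(x, means, variances))
            log_post[label] = log_prior + log_lik
        top = max(log_post.values())  # subtract the maximum for numerical stability
        weights = {k: math.exp(v - top) for k, v in log_post.items()}
        total = sum(weights.values())
        return {k: w / total for k, w in weights.items()}

    def predict(self, x):
        probs = self.predict_proba(x)
        return max(probs, key=probs.get)

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled documents, then refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = GaussianNB().fit(X_lab, y_lab)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            probs = model.predict_proba(x)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        if added == 0:  # nothing confident enough: stop early
            break
        pool = remaining
        model = GaussianNB().fit(X_lab, y_lab)
    return model

labeled = [(["gol", "liga"], "olahraga"), (["pemain", "gol"], "olahraga"),
           (["saham", "bank"], "ekonomi"), (["rupiah", "saham"], "ekonomi")]
unlabeled = [["liga", "pemain"], ["bank", "rupiah"]]

model = self_train([doc_vector(t) for t, _ in labeled],
                   [label for _, label in labeled],
                   [doc_vector(t) for t in unlabeled])
print(model.predict(doc_vector(["gol", "pemain", "liga"])))
print(model.predict(doc_vector(["rupiah", "bank"])))
```

The confidence threshold controls the trade-off in self-training: a higher value admits fewer but cleaner pseudo-labels, while a lower value grows the training set faster at the risk of reinforcing early mistakes.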

Description

Bibliographic Details
Published in: Jurnal Nasional Teknik Elektro dan Teknologi Informasi
Main Authors: Joan Santoso, Agung Dewa Bagus Soetiono, Gunawan Gunawan, Endang Setyati, Eko Mulyanto Yuniarno, Mochamad Hariadi, Mauridhi Hery Purnomo
Format: Article
Language: English
Published: Universitas Gadjah Mada 2018-06-01
Subjects:
Online Access: http://ejnteti.jteti.ugm.ac.id/index.php/JNTETI/article/view/418
_version_ 1856932451355459584
author Joan Santoso
Agung Dewa Bagus Soetiono
Gunawan Gunawan
Endang Setyati
Eko Mulyanto Yuniarno
Mochamad Hariadi
Mauridhi Hery Purnomo
collection DOAJ
container_title Jurnal Nasional Teknik Elektro dan Teknologi Informasi
description News, a kind of information needed in daily life, is now available on the internet. News websites often categorize their articles by topic to help users access the news more easily. Document classification has been widely used to do this automatically. The current availability of labeled training data is insufficient for a machine to build a good model. The problem with data annotation is that obtaining a sufficient quantity of labeled training data requires considerable cost and time. A semi-supervised algorithm is proposed to solve this problem by using both labeled and unlabeled data to build the classification model. This paper proposes a semi-supervised news classification system using the Self-Training Naive Bayes algorithm. The features used for text classification come from the Word2Vec Skip-Gram model. This model is widely used in computational linguistics and text mining research as a word representation method. Word2Vec is used as a feature because it captures the semantic meaning of words in this classification task. The data used in this paper consist of 29,587 news documents from Indonesian online news websites. The Self-Training Naive Bayes algorithm achieved the highest F1-score of 94.17%.
format Article
id doaj-art-86eacb2ea2fd4eb68c2da6ca3ed9d8ee
institution Directory of Open Access Journals
issn 2301-4156
2460-5719
language English
publishDate 2018-06-01
publisher Universitas Gadjah Mada
record_format Article
spelling doaj-art-86eacb2ea2fd4eb68c2da6ca3ed9d8ee | 2025-08-19T20:12:51Z | eng | Universitas Gadjah Mada | Jurnal Nasional Teknik Elektro dan Teknologi Informasi | 2301-4156 | 2460-5719 | 2018-06-01 | 7110.22146/jnteti.v7i2.418350 | Self-Training Naive Bayes Berbasis Word2Vec untuk Kategorisasi Berita Bahasa Indonesia | Joan Santoso; Agung Dewa Bagus Soetiono; Gunawan Gunawan; Endang Setyati; Eko Mulyanto Yuniarno; Mochamad Hariadi; Mauridhi Hery Purnomo | News, a kind of information needed in daily life, is now available on the internet. News websites often categorize their articles by topic to help users access the news more easily. Document classification has been widely used to do this automatically. The current availability of labeled training data is insufficient for a machine to build a good model. The problem with data annotation is that obtaining a sufficient quantity of labeled training data requires considerable cost and time. A semi-supervised algorithm is proposed to solve this problem by using both labeled and unlabeled data to build the classification model. This paper proposes a semi-supervised news classification system using the Self-Training Naive Bayes algorithm. The features used for text classification come from the Word2Vec Skip-Gram model. This model is widely used in computational linguistics and text mining research as a word representation method. Word2Vec is used as a feature because it captures the semantic meaning of words in this classification task. The data used in this paper consist of 29,587 news documents from Indonesian online news websites. The Self-Training Naive Bayes algorithm achieved the highest F1-score of 94.17%. | http://ejnteti.jteti.ugm.ac.id/index.php/JNTETI/article/view/418 | Kategorisasi Berita, Word2Vec, Skip-Gram, Self-Training, Naive Bayes, Semi-supervised Learning, Bahasa Indonesia
title Self-Training Naive Bayes Berbasis Word2Vec untuk Kategorisasi Berita Bahasa Indonesia
topic Kategorisasi Berita, Word2Vec, Skip-Gram, Self-Training, Naive Bayes, Semi-supervised Learning, Bahasa Indonesia
url http://ejnteti.jteti.ugm.ac.id/index.php/JNTETI/article/view/418