Self-Training Naive Bayes Berbasis Word2Vec untuk Kategorisasi Berita Bahasa Indonesia
(Word2Vec-Based Self-Training Naive Bayes for Indonesian-Language News Categorization)

Bibliographic details
Published in:	Jurnal Nasional Teknik Elektro dan Teknologi Informasi
Main authors:	Joan Santoso, Agung Dewa Bagus Soetiono, Gunawan Gunawan, Endang Setyati, Eko Mulyanto Yuniarno, Mochamad Hariadi, Mauridhi Hery Purnomo
Format:	Article
Language:	English
Publisher:	Universitas Gadjah Mada, 2018-06-01
Subjects:	Kategorisasi Berita (News Categorization), Word2Vec, Skip-Gram, Self-Training, Naive Bayes, Semi-supervised Learning, Bahasa Indonesia (Indonesian language)
Online access:	http://ejnteti.jteti.ugm.ac.id/index.php/JNTETI/article/view/418

collection	DOAJ
description	News, a kind of information needed in daily life, is widely available on the internet. News websites often categorize their articles by topic to help users find news more easily, and document classification is widely used to do this automatically. However, the labeled training data currently available is insufficient for a machine to learn a good model, and annotating a sufficient quantity of training data requires considerable cost and time. A semi-supervised algorithm addresses this problem by building the classification model from both labeled and unlabeled data. This paper proposes a semi-supervised news classification system based on the Self-Training Naive Bayes algorithm. The features used for text classification come from the Word2Vec Skip-Gram model, a word-representation method widely used in computational linguistics and text-mining research. Word2Vec is used as the feature because it carries the semantic meaning of words into the classification task. The data consist of 29,587 news documents from Indonesian online news websites. The Self-Training Naive Bayes algorithm achieved the highest F1-score, 94.17%.
id	doaj-art-86eacb2ea2fd4eb68c2da6ca3ed9d8ee
institution	Directory of Open Access Journals
issn	2301-4156 2460-5719
doi	10.22146/jnteti.v7i2.418
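The abstract in this record names two ingredients, Word2Vec Skip-Gram word representations and a Self-Training Naive Bayes classifier, without giving implementation details. The pure-Python sketch below shows one way such a pipeline might fit together; it is not the authors' method. Several choices are assumptions: each document is represented by averaging its word vectors (the hypothetical `doc_vector` helper), the Naive Bayes variant is Gaussian (a natural fit for continuous Word2Vec features), and an unlabeled document is pseudo-labeled only when its top posterior reaches a hypothetical `threshold` of 0.95. The tiny `EMBEDDINGS` table is toy data standing in for a trained Skip-Gram model, and the labels `sport` and `politics` are illustrative.

```python
import math
from collections import defaultdict

# Toy 2-D word vectors standing in for a trained Skip-Gram model (hypothetical values).
EMBEDDINGS = {
    "gol": [1.0, 0.0], "pemain": [0.9, 0.1], "liga": [0.95, 0.05],       # sport-like words
    "menteri": [0.0, 1.0], "pemilu": [0.1, 0.9], "partai": [0.05, 0.95],  # politics-like words
}


def doc_vector(tokens, embeddings=EMBEDDINGS):
    """Average the vectors of known tokens; unknown tokens are skipped (assumed scheme)."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[j] for v in vecs) / len(vecs) for j in range(dim)]


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature) pair."""

    def fit(self, X, y, var_floor=1e-3):
        rows_by_class = defaultdict(list)
        for x, label in zip(X, y):
            rows_by_class[label].append(x)
        self.params = {}
        for label, rows in rows_by_class.items():
            dim = len(rows[0])
            means = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
            # The variance floor keeps constant features from causing division by zero.
            variances = [sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + var_floor
                         for j in range(dim)]
            self.params[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict_proba(self, x):
        # Log prior plus the sum of per-feature Gaussian log-likelihoods for each class.
        scores = {}
        for label, (log_prior, means, variances) in self.params.items():
            scores[label] = log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
                for xj, m, v in zip(x, means, variances))
        top = max(scores.values())  # subtract the max before exponentiating for stability
        weights = {k: math.exp(s - top) for k, s in scores.items()}
        total = sum(weights.values())
        return {k: w / total for k, w in weights.items()}


def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_iter=10):
    """Repeatedly pseudo-label confident unlabeled examples and retrain."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = GaussianNaiveBayes().fit(X_lab, y_lab)
    for _ in range(max_iter):
        confident, remaining = [], []
        for x in pool:
            proba = model.predict_proba(x)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:  # no example clears the threshold: stop early
            break
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = remaining
        model = GaussianNaiveBayes().fit(X_lab, y_lab)
    return model, len(X_lab)


labeled = [doc_vector(d) for d in (["gol", "liga"], ["pemain", "gol"],
                                   ["menteri", "partai"], ["pemilu", "menteri"])]
labels = ["sport", "sport", "politics", "politics"]
unlabeled = [doc_vector(d) for d in (["liga", "pemain"], ["partai", "pemilu"],
                                     ["gol", "pemain", "liga"])]

model, n_train = self_train(labeled, labels, unlabeled)
proba = model.predict_proba(doc_vector(["pemain", "stadion"]))
print(n_train, max(proba, key=proba.get))  # 7 sport
```

In this toy run all three unlabeled documents are confidently pseudo-labeled in the first round, so the final training set holds seven documents. Stopping when no example clears the threshold (or after `max_iter` rounds) is one common self-training stopping rule, not necessarily the one used in the paper.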
