Analysis of combinations of the spam classification and feature selection

Master's === Tamkang University === Master's Program, Department of Information Management === 104 === The spam e-mail overflow problem is mainly addressed by filtering spam through e-mail classification. Such filters first select a set of feature words according to their indicator scores, and then apply a classification algorithm to decide whether an incom...

Bibliographic Details
Main Authors:	Yi-Teng Cheng, 鄭奕騰
Other Authors:	Chichang Jou
Format:	Others
Language:	zh-TW
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/95950864370587719701

id	ndltd-TW-104TKU05396004
record_format	oai_dc
spelling	ndltd-TW-104TKU05396004 2017-09-03T04:24:55Z http://ndltd.ncl.edu.tw/handle/95950864370587719701 Analysis of combinations of the spam classification and feature selection 垃圾郵件分類及特徵選擇組合之分析研究 Yi-Teng Cheng 鄭奕騰 Master's, Tamkang University, Master's Program, Department of Information Management 104 The spam e-mail overflow problem is mainly addressed by filtering spam through e-mail classification. Such filters first select a set of feature words according to their indicator scores, and then apply a classification algorithm to decide whether an incoming e-mail is spam. However, the problem has not been solved completely. There is a need to further analyze the characteristics of feature word selection indicators and classification algorithms to achieve better classification effectiveness. We use two feature word selection indicators, TFIDF (Term Frequency–Inverse Document Frequency) and IG (Information Gain), and two classification algorithms, Weighted Naive Bayesian and SVM (Support Vector Machine), as representatives in the analysis. By using them independently, under the intersection operator, or under the union operator, through experiments in the context of concept drift, we compare the classification effectiveness of these 16 combinations of feature selection indicators and classification algorithms. Additionally, for each experiment we analyze the classification effectiveness of the best combination for different accumulated numbers of e-mails. The stability of the combination is also discussed. Chichang Jou 周清江 2016 學位論文 ; thesis 63 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === Tamkang University === Master's Program, Department of Information Management === 104 === The spam e-mail overflow problem is mainly addressed by filtering spam through e-mail classification. Such filters first select a set of feature words according to their indicator scores, and then apply a classification algorithm to decide whether an incoming e-mail is spam. However, the problem has not been solved completely. There is a need to further analyze the characteristics of feature word selection indicators and classification algorithms to achieve better classification effectiveness. We use two feature word selection indicators, TFIDF (Term Frequency–Inverse Document Frequency) and IG (Information Gain), and two classification algorithms, Weighted Naive Bayesian and SVM (Support Vector Machine), as representatives in the analysis. By using them independently, under the intersection operator, or under the union operator, through experiments in the context of concept drift, we compare the classification effectiveness of these 16 combinations of feature selection indicators and classification algorithms. Additionally, for each experiment we analyze the classification effectiveness of the best combination for different accumulated numbers of e-mails. The stability of the combination is also discussed.
author2	Chichang Jou
author_facet	Chichang Jou Yi-Teng Cheng 鄭奕騰
author	Yi-Teng Cheng 鄭奕騰
spellingShingle	Yi-Teng Cheng 鄭奕騰 Analysis of combinations of the spam classification and feature selection
author_sort	Yi-Teng Cheng
title	Analysis of combinations of the spam classification and feature selection
title_sort	analysis of combinations of the spam classification and feature selection
publishDate	2016
url	http://ndltd.ncl.edu.tw/handle/95950864370587719701
work_keys_str_mv	AT yitengcheng analysisofcombinationsofthespamclassificationandfeatureselection AT zhèngyìténg analysisofcombinationsofthespamclassificationandfeatureselection AT yitengcheng lājīyóujiànfēnlèijítèzhēngxuǎnzézǔhézhīfēnxīyánjiū AT zhèngyìténg lājīyóujiànfēnlèijítèzhēngxuǎnzézǔhézhīfēnxīyánjiū
_version_	1718526000076161024
