Analysis of combinations of the spam classification and feature selection

碩士 (Master's) === 淡江大學 (Tamkang University) === 資訊管理學系碩士班 (Master's Program, Department of Information Management) === 104 === The spam e-mail overflow problem is mainly addressed by filtering spam through e-mail classification. Such filters first select a set of feature words according to their indicator values, and then apply a classification algorithm to decide whether an incom...


Bibliographic Details
Main Authors: Yi-Teng Cheng, 鄭奕騰
Other Authors: Chichang Jou
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/95950864370587719701
id ndltd-TW-104TKU05396004
record_format oai_dc
spelling ndltd-TW-104TKU05396004 2017-09-03T04:24:55Z http://ndltd.ncl.edu.tw/handle/95950864370587719701 Analysis of combinations of the spam classification and feature selection 垃圾郵件分類及特徵選擇組合之分析研究 Yi-Teng Cheng 鄭奕騰 碩士 淡江大學 資訊管理學系碩士班 104 Chichang Jou 周清江 2016 學位論文 ; thesis 63 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description 碩士 (Master's) === 淡江大學 (Tamkang University) === 資訊管理學系碩士班 (Master's Program, Department of Information Management) === 104 === The spam e-mail overflow problem is mainly addressed by filtering spam through e-mail classification. Such filters first select a set of feature words according to their indicator values, and then apply a classification algorithm to decide whether an incoming e-mail is spam. However, the problem has not been solved completely. The characteristics of feature-word selection indicators and classification algorithms need further analysis to achieve better classification effectiveness. We use two feature-word selection indicators, TFIDF (Term Frequency–Inverse Document Frequency) and IG (Information Gain), and two classification algorithms, Weighted Naive Bayesian and SVM (Support Vector Machine), as representatives in the analysis. Using them independently, under the intersection operator, or under the union operator, we compare the classification effectiveness of the resulting 16 combinations of feature selection indicators and classification algorithms through experiments in the context of concept drift. Additionally, for each experiment we analyze the classification effectiveness of the best combination for different accumulated numbers of e-mails. The stability of the combination is also discussed.
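The feature-selection step named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy corpus, the function names (`tfidf_scores`, `info_gain_scores`, `top_k`, `select_features`), the summed TF-IDF aggregation, and the top-k cutoff are all assumptions made for illustration. It scores words by TFIDF and by IG, then builds the independent, intersection, and union feature sets. The classifiers (Weighted Naive Bayesian, SVM) are not covered here.

```python
import math
from collections import Counter

# Toy corpus (hypothetical data, not from the thesis): (tokens, label), label 1 = spam.
EMAILS = [
    ("win free prize now".split(), 1),
    ("free offer win cash".split(), 1),
    ("meeting agenda for monday".split(), 0),
    ("project meeting notes".split(), 0),
]

def tfidf_scores(emails):
    """Score each word by its TF-IDF summed over all e-mails (an assumed aggregation)."""
    n = len(emails)
    df = Counter(w for tokens, _ in emails for w in set(tokens))
    scores = Counter()
    for tokens, _ in emails:
        for w, count in Counter(tokens).items():
            scores[w] += (count / len(tokens)) * math.log(n / df[w])
    return dict(scores)

def entropy(pos, total):
    """Binary entropy of a spam/ham split; 0 for empty or pure sets."""
    if total == 0 or pos in (0, total):
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def info_gain_scores(emails):
    """Information gain of the class label given presence/absence of each word."""
    n = len(emails)
    base = entropy(sum(label for _, label in emails), n)
    vocab = {w for tokens, _ in emails for w in tokens}
    scores = {}
    for w in vocab:
        with_w = [label for tokens, label in emails if w in tokens]
        without = [label for tokens, label in emails if w not in tokens]
        cond = (len(with_w) / n) * entropy(sum(with_w), len(with_w)) \
            + (len(without) / n) * entropy(sum(without), len(without))
        scores[w] = base - cond
    return scores

def top_k(scores, k):
    """Top-k words by score; ties are broken alphabetically for determinism."""
    return set(sorted(scores, key=lambda w: (-scores[w], w))[:k])

def select_features(emails, k):
    """Return the four candidate feature sets: each indicator alone, intersection, union."""
    by_tfidf = top_k(tfidf_scores(emails), k)
    by_ig = top_k(info_gain_scores(emails), k)
    return {
        "tfidf": by_tfidf,
        "ig": by_ig,
        "intersection": by_tfidf & by_ig,
        "union": by_tfidf | by_ig,
    }
```

Calling `select_features(EMAILS, 3)` yields four candidate vocabularies. In a setup like the one the abstract describes, each vocabulary would then be passed to a classifier and the resulting effectiveness compared.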
author2 Chichang Jou
author_facet Chichang Jou
Yi-Teng Cheng
鄭奕騰
author Yi-Teng Cheng
鄭奕騰
spellingShingle Yi-Teng Cheng
鄭奕騰
Analysis of combinations of the spam classification and feature selection
author_sort Yi-Teng Cheng
title Analysis of combinations of the spam classification and feature selection
title_sort analysis of combinations of the spam classification and feature selection
publishDate 2016
url http://ndltd.ncl.edu.tw/handle/95950864370587719701
work_keys_str_mv AT yitengcheng analysisofcombinationsofthespamclassificationandfeatureselection
AT zhèngyìténg analysisofcombinationsofthespamclassificationandfeatureselection
AT yitengcheng lājīyóujiànfēnlèijítèzhēngxuǎnzézǔhézhīfēnxīyánjiū
AT zhèngyìténg lājīyóujiànfēnlèijítèzhēngxuǎnzézǔhézhīfēnxīyánjiū