Application of Decision Tree Methods on Spam Filtering

Master's === Tamkang University === Master's Program, Department of Statistics === 93 === With advances in computer science and the growth of the Internet, email has become an important communication medium in daily life. Email advertising has become the most efficient marketing technique, which has given rise to the problem of spam. The a...

Bibliographic Details
Main Authors: Meng-Chuan, Tsai, 蔡孟娟
Other Authors: Ching-Hsiang Chen
Format: Others
Language: zh-TW
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/95491262592610440256
id ndltd-TW-093TKU05337015
record_format oai_dc
spelling ndltd-TW-093TKU05337015 2015-10-13T11:57:26Z http://ndltd.ncl.edu.tw/handle/95491262592610440256 Application of Decision Tree Methods on Spam Filtering 決策樹法在垃圾郵件過濾之應用 Meng-Chuan, Tsai 蔡孟娟 Master's Tamkang University Master's Program, Department of Statistics 93 Ching-Hsiang Chen 陳景祥 2005 thesis 63 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Tamkang University === Master's Program, Department of Statistics === 93 === With advances in computer science and the growth of the Internet, email has become an important communication medium in daily life. Email advertising has become the most efficient marketing technique, which has given rise to the problem of spam. The volume of spam is increasing quickly. It not only consumes network resources and burdens systems, but also wastes recipients’ time. Spam filtering has become a popular research topic in recent years. In this study, we use three decision tree methods from data mining to classify emails as “spam” or “legitimate” based on fourteen email characteristics. The three decision tree methods are compared with the Bayes classifier, which is currently the most commonly used method in spam filtering. When classification efficiency and misclassification costs are considered, the C4.5 method gives the best results in our case study of spam mail. It also has the shortest test time among the three decision tree methods. Our study also shows that misclassifying legitimate mail can be avoided by applying a whitelist before classification.
author2 Ching-Hsiang Chen
author_facet Ching-Hsiang Chen
Meng-Chuan, Tsai
蔡孟娟
author Meng-Chuan, Tsai
蔡孟娟
author_sort Meng-Chuan, Tsai
title Application of Decision Tree Methods on Spam Filtering
publishDate 2005
url http://ndltd.ncl.edu.tw/handle/95491262592610440256