Detecting Spam Blog

Bibliographic Details
Main Authors: Sz-Shao Liao, 廖偲劭
Other Authors: 劉俞志
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/80068916457414374881
Description
Summary: Master's === Yuan Ze University === Department of Information Management === 95 === Blogs are the most popular topic on the Internet. Bloggers can share their feelings and lives on their blogs, and many Internet companies such as Google and Yahoo provide free blog space for Internet users. However, some illegitimate users exploit this free space to construct fake web pages. These fake, meaningless blogs are called spam blogs. Spam blogs raise their search ranking through illegitimate methods, the most common of which is stuffing the blog with large numbers of keywords. Generally speaking, those keywords are nouns. A genuine blog, by contrast, resembles an online diary in which people write structured articles, so its parts of speech follow a normal distribution, and this characteristic can be used to detect spam blogs. We build a predictive model using a Support Vector Machine (SVM) that can help a search engine identify spam blogs quickly and help Internet users obtain genuinely useful information.
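
The part-of-speech idea in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: the coarse tag set `TAGS`, the function name `pos_distribution`, and the two hand-tagged sample posts are all assumptions made for illustration, and the input is assumed to be already tagged by some part-of-speech tagger. The sketch only turns a tagged post into a vector of tag proportions, which is the kind of feature vector an SVM could then be trained on.

```python
from collections import Counter

# Hypothetical coarse tag set; the record does not say which tags the thesis uses.
TAGS = ("NOUN", "VERB", "ADJ", "ADV", "OTHER")


def pos_distribution(tagged_tokens):
    """Return the proportion of each tag in TAGS for a list of (word, tag) pairs."""
    # Any tag outside the assumed tag set is folded into "OTHER".
    counts = Counter(tag if tag in TAGS else "OTHER" for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TAGS)
    return [counts[tag] / total for tag in TAGS]


# Hand-tagged toy examples (assumed data, not taken from the thesis).
diary_post = [
    ("I", "OTHER"), ("visited", "VERB"), ("a", "OTHER"),
    ("quiet", "ADJ"), ("temple", "NOUN"), ("yesterday", "ADV"),
]
keyword_stuffed = [
    ("cheap", "ADJ"), ("tickets", "NOUN"), ("hotel", "NOUN"),
    ("loans", "NOUN"), ("casino", "NOUN"), ("pills", "NOUN"),
]

diary_features = pos_distribution(diary_post)
spam_features = pos_distribution(keyword_stuffed)

# The noun share is the first component of each vector.
print("diary noun share:", diary_features[0])
print("keyword-stuffed noun share:", spam_features[0])
```

In this toy data the keyword-stuffed post has a much larger noun share than the diary-style post. Vectors like these, computed over a labeled collection of blogs, could be passed to an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`) to learn the decision boundary; that training step is left out here to keep the sketch dependency-free.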