Suspicious URL Filter based on Logistic Regression with Multi-view Analysis

Master's thesis, National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, academic year 100 (ROC calendar, 2011–2012)

Bibliographic Details
Main Authors: Ke-wei Su, 蘇克維
Other Authors: Han-ming Lee
Format: Others
Language: en_US
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/3kf7h7
Record ID: ndltd-TW-100NTUS5392041
Record Format: oai_dc
Chinese Title: 可疑連結過濾器基於羅吉斯迴歸與多觀點分析 (Suspicious Link Filter Based on Logistic Regression and Multi-view Analysis)
Chinese Author Names: 蘇克維 (Ke-wei Su); 李漢銘 (Han-ming Lee)
Thesis: Master's thesis, 45 pages
Record Datestamp: 2019-05-15T20:43:22Z
Collection: NDLTD
Abstract: Current malicious-URL detection techniques based on URL analysis have difficulty finding malicious URLs that use obfuscation techniques (e.g., insertion of benign tokens). In this study, we propose a multi-view approach that reduces the impact of obfuscation. A URL is composed of several tokens, and each token carries a different meaning. Attackers apply different obfuscation techniques to token combinations in different portions of a URL, and each technique exhibits its own behavior. The proposed mechanism learns these behaviors separately for each portion of the URL (e.g., the authority portion) in order to estimate the suspicion level of each portion. By comparing the suspicion levels of each portion across URLs, the system selects the most suspicious URLs. This thesis makes the following contributions: (1) a multi-view mechanism that reduces the effect of obfuscation techniques; (2) automatic filtering of suspicious URLs without additional configuration or modification; (3) effective handling of large-scale, imbalanced data; and (4) satisfaction of industry requirements. The system is evaluated on a real data set from T. Co., whose requirements are: (1) the detection rate should be less than 25%, (2) the missing rate should be lower than 25%, and (3) processing one hour of data should finish within one hour. The experimental results show that the approach is effective: it finds more malicious URLs and satisfies the requirements of both the practical environment and T. Co.
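The abstract does not specify the views, features, training procedure, or how per-portion scores are compared, so the following Python sketch is only an illustration of the general multi-view idea, not the thesis's implementation. It assumes three hypothetical views (authority, path, query) obtained with `urllib.parse.urlsplit`, binary bag-of-token features, one logistic regression per view trained by stochastic gradient descent (with a hypothetical `pos_weight` for imbalanced data), and a URL score equal to the maximum per-view probability. The names `url_views`, `ViewModel`, and `MultiViewFilter` are invented for this example.

```python
import math
import re
from urllib.parse import urlsplit


def tokenize(text):
    """Lower-case and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def url_views(url):
    """Hypothetical views: split a URL into portions, each a token list."""
    parts = urlsplit(url)
    return {
        "authority": tokenize(parts.netloc),
        "path": tokenize(parts.path),
        "query": tokenize(parts.query),
    }


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ViewModel:
    """Logistic regression on binary bag-of-token features for one view."""

    def __init__(self):
        self.weights = {}
        self.bias = 0.0

    def prob(self, tokens):
        z = self.bias + sum(self.weights.get(t, 0.0) for t in set(tokens))
        return sigmoid(z)

    def fit(self, samples, labels, epochs=200, lr=0.5, pos_weight=1.0):
        # Stochastic gradient descent on class-weighted log-loss;
        # pos_weight > 1 up-weights the rare malicious class (assumption).
        for _ in range(epochs):
            for tokens, y in zip(samples, labels):
                err = self.prob(tokens) - y  # d(log-loss)/dz
                if y == 1:
                    err *= pos_weight
                self.bias -= lr * err
                for t in set(tokens):
                    self.weights[t] = self.weights.get(t, 0.0) - lr * err
        return self


class MultiViewFilter:
    """One model per view; a URL's score is its most suspicious view."""

    def __init__(self, views=("authority", "path", "query")):
        self.models = {v: ViewModel() for v in views}

    def fit(self, urls, labels, **kwargs):
        split = [url_views(u) for u in urls]
        for view, model in self.models.items():
            model.fit([s[view] for s in split], labels, **kwargs)
        return self

    def view_scores(self, url):
        views = url_views(url)
        return {v: m.prob(views[v]) for v, m in self.models.items()}

    def score(self, url):
        # Assumed combination rule: maximum over per-view probabilities.
        return max(self.view_scores(url).values())

    def most_suspicious(self, urls, k):
        return sorted(urls, key=self.score, reverse=True)[:k]


if __name__ == "__main__":
    # Toy, made-up training data (1 = malicious, 0 = benign).
    urls = [
        "http://login.verify-account.ru/secure/update.php?user=a",
        "http://verify.bank-login.ru/account/update.php?id=7",
        "http://www.example.com/news/index.html",
        "http://docs.example.org/guide/index.html?page=2",
    ]
    f = MultiViewFilter().fit(urls, [1, 1, 0, 0])
    candidates = ["http://www.example.com/index.html",
                  "http://login.verify.ru/update.php"]
    print(f.most_suspicious(candidates, k=1))
```

Taking the maximum over views is one way to keep a benign token inserted into one portion from diluting a strongly suspicious score in another portion; averaging the views would not have that property. A production system would also need a threshold or top-k budget tuned to constraints such as the 25% limits stated above.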