Latent Dirichlet Allocation、classification、feature scaling、positive and negative data
Master's === Aletheia University === Master's Program, Department of Computer Science and Information Engineering === 99 === Among various classification problems, the one-class problem focuses on how to correctly classify data into the target class. Data outside the target class are called outliers. In this paper, we discuss classification for an extremely biased data set...
Main Authors: | Shih-Yi Kuo, 郭士毅 |
---|---|
Other Authors: | Jian-hua Yeh |
Format: | Others |
Language: | zh-TW |
Published: | 2011 |
Online Access: | http://ndltd.ncl.edu.tw/handle/59282352751313706118 |
id |
ndltd-TW-099AU000392001 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-099AU000392001 2015-10-13T19:20:28Z http://ndltd.ncl.edu.tw/handle/59282352751313706118 Latent Dirichlet Allocation、classification、feature scaling、positive and negative data A Study of Medical Image Classification Using a Latent Topic Vector Model (以潛藏主題向量模型進行醫學影像分類之研究) Shih-Yi Kuo 郭士毅 Master's Aletheia University Master's Program, Department of Computer Science and Information Engineering 99 Among various classification problems, the one-class problem focuses on how to correctly classify data into the target class. Data outside the target class are called outliers. In this paper, we discuss classification for an extremely biased data set in which the outliers far outnumber the target data and are discretely distributed. We use the KDD Cup 2008 challenge as our experimental target. There are 1,712 patients, of whom only 118 have malignant breast cancer. Each patient's data is represented as multiple candidate records. In total, there are 102,294 candidates and only 623 positive samples. These positive samples are our target set for classification. Under this data distribution, separating normal data from malignant data is difficult. In this paper, we propose a classification workflow to address this problem. Jian-hua Yeh 葉建華 2011 degree thesis ; thesis 69 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === Aletheia University === Master's Program, Department of Computer Science and Information Engineering === 99 === Among various classification problems, the one-class problem focuses on how to correctly classify data into the target class. Data outside the target class are called outliers. In this paper, we discuss classification for an extremely biased data set in which the outliers far outnumber the target data and are discretely distributed.
We use the KDD Cup 2008 challenge as our experimental target. There are 1,712 patients, of whom only 118 have malignant breast cancer. Each patient's data is represented as multiple candidate records. In total, there are 102,294 candidates and only 623 positive samples. These positive samples are our target set for classification. Under this data distribution, separating normal data from malignant data is difficult. In this paper, we propose a classification workflow to address this problem. |
author2 |
Jian-hua Yeh |
author_facet |
Jian-hua Yeh Shih-Yi Kuo 郭士毅 |
author |
Shih-Yi Kuo 郭士毅 |
spellingShingle |
Shih-Yi Kuo 郭士毅 Latent Dirichlet Allocation、classification、feature scaling、positive and negative data |
author_sort |
Shih-Yi Kuo |
title |
Latent Dirichlet Allocation、classification、feature scaling、positive and negative data |
title_short |
Latent Dirichlet Allocation、classification、feature scaling、positive and negative data |
title_full |
Latent Dirichlet Allocation、classification、feature scaling、positive and negative data |
title_fullStr |
Latent Dirichlet Allocation、classification、feature scaling、positive and negative data |
title_full_unstemmed |
Latent Dirichlet Allocation、classification、feature scaling、positive and negative data |
title_sort |
latent dirichlet allocation、classification、feature scaling、positive and negative data |
publishDate |
2011 |
url |
http://ndltd.ncl.edu.tw/handle/59282352751313706118 |
work_keys_str_mv |
AT shihyikuo latentdirichletallocationclassificationfeaturescalingpositiveandnegativedata AT guōshìyì latentdirichletallocationclassificationfeaturescalingpositiveandnegativedata AT shihyikuo yǐqiáncángzhǔtíxiàngliàngmóxíngjìnxíngyīxuéyǐngxiàngfēnlèizhīyánjiū AT guōshìyì yǐqiáncángzhǔtíxiàngliàngmóxíngjìnxíngyīxuéyǐngxiàngfēnlèizhīyánjiū |
_version_ |
1718042335635308544 |
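
The degree of class imbalance described in the abstract can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the counts stated in the description field; the constant and function names are illustrative assumptions and do not come from the thesis.

```python
# Counts exactly as stated in the abstract (KDD Cup 2008 data).
TOTAL_PATIENTS = 1712
MALIGNANT_PATIENTS = 118
TOTAL_CANDIDATES = 102294
POSITIVE_CANDIDATES = 623


def positive_rate(positives: int, total: int) -> float:
    """Fraction of items in the population that are positive."""
    if total <= 0:
        raise ValueError("total must be greater than zero")
    return positives / total


def negatives_per_positive(positives: int, total: int) -> float:
    """Number of negative items for every positive item."""
    if positives <= 0:
        raise ValueError("positives must be greater than zero")
    return (total - positives) / positives


# Imbalance at the patient level and at the candidate level.
patient_rate = positive_rate(MALIGNANT_PATIENTS, TOTAL_PATIENTS)
candidate_rate = positive_rate(POSITIVE_CANDIDATES, TOTAL_CANDIDATES)
candidate_ratio = negatives_per_positive(POSITIVE_CANDIDATES, TOTAL_CANDIDATES)

print(f"Malignant patients:  {patient_rate:.2%}")
print(f"Positive candidates: {candidate_rate:.2%}")
print(f"Negatives per positive candidate: {candidate_ratio:.1f}")
```

At the candidate level, negatives outnumber positives by well over a hundred to one, which is why the abstract frames the task as a one-class problem with the positive samples as the target set.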