Latent Dirichlet Allocation、classification、feature scaling、positive and negative data

Master's === Aletheia University === Master's Program, Department of Computer Science and Information Engineering === 99 === Among classification problems, the one-class problem focuses on correctly assigning data to the target class; data outside the target class are called outliers. In this paper, we discuss classification for an extremely biased data set...

Bibliographic Details
Main Authors: Shih-Yi Kuo, 郭士毅
Other Authors: Jian-hua Yeh
Format: Others
Language: zh-TW
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/59282352751313706118
id ndltd-TW-099AU000392001
record_format oai_dc
spelling ndltd-TW-099AU000392001 2015-10-13T19:20:28Z http://ndltd.ncl.edu.tw/handle/59282352751313706118 Latent Dirichlet Allocation、classification、feature scaling、positive and negative data 以潛藏主題向量模型進行醫學影像分類之研究 (A Study of Medical Image Classification Using a Latent Topic Vector Model) Shih-Yi Kuo 郭士毅 Master's, Aletheia University, Master's Program, Department of Computer Science and Information Engineering, 99 Jian-hua Yeh 葉建華 2011 學位論文 ; thesis 69 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Aletheia University === Master's Program, Department of Computer Science and Information Engineering === 99 === Among classification problems, the one-class problem focuses on correctly assigning data to the target class; data outside the target class are called outliers. In this paper, we discuss classification for an extremely biased data set, in which the outliers far outnumber the target data and are discretely distributed. We use the KDD Cup 2008 challenge as our target experiment. Of its 1,712 patients, only 118 have malignant breast cancer. Each patient's data is represented as multiple candidate records. In total, there are 102,294 candidates and only 623 positive samples. These positive samples are our target set for classification. Under this data distribution, separating normal data from malignant data is difficult. In this paper, we propose a classification process flow to solve this problem.
author2 Jian-hua Yeh
author_facet Jian-hua Yeh
Shih-Yi Kuo
郭士毅
author Shih-Yi Kuo
郭士毅
spellingShingle Shih-Yi Kuo
郭士毅
Latent Dirichlet Allocation、classification、feature scaling、positive and negative data
author_sort Shih-Yi Kuo
title Latent Dirichlet Allocation、classification、feature scaling、positive and negative data
title_short Latent Dirichlet Allocation、classification、feature scaling、positive and negative data
title_full Latent Dirichlet Allocation、classification、feature scaling、positive and negative data
title_fullStr Latent Dirichlet Allocation、classification、feature scaling、positive and negative data
title_full_unstemmed Latent Dirichlet Allocation、classification、feature scaling、positive and negative data
title_sort latent dirichlet allocation、classification、feature scaling、positive and negative data
publishDate 2011
url http://ndltd.ncl.edu.tw/handle/59282352751313706118
work_keys_str_mv AT shihyikuo latentdirichletallocationclassificationfeaturescalingpositiveandnegativedata
AT guōshìyì latentdirichletallocationclassificationfeaturescalingpositiveandnegativedata
AT shihyikuo yǐqiáncángzhǔtíxiàngliàngmóxíngjìnxíngyīxuéyǐngxiàngfēnlèizhīyánjiū
AT guōshìyì yǐqiáncángzhǔtíxiàngliàngmóxíngjìnxíngyīxuéyǐngxiàngfēnlèizhīyánjiū
_version_ 1718042335635308544
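
The degree of imbalance described in the abstract can be worked out from the four counts it quotes. The following is a minimal Python sketch of that arithmetic only; it is not taken from the thesis, and the variable and function names are illustrative assumptions. It also shows the accuracy a trivial classifier would reach by labelling every candidate negative, one way to see why the abstract calls the task difficult.

```python
# Counts quoted in the abstract (KDD Cup 2008 breast cancer data).
# All names below are illustrative, not taken from the thesis.
patients_total = 1712
patients_malignant = 118
candidates_total = 102294
candidates_positive = 623


def positive_rate(positives: int, total: int) -> float:
    """Fraction of the population that belongs to the target class."""
    return positives / total


patient_rate = positive_rate(patients_malignant, patients_total)
candidate_rate = positive_rate(candidates_positive, candidates_total)

# Number of negative (outlier) candidates for each positive candidate.
negatives_per_positive = (candidates_total - candidates_positive) / candidates_positive

# Accuracy of a classifier that always answers "negative".
always_negative_accuracy = 1.0 - candidate_rate

print(f"malignant patients:        {patient_rate:.2%}")
print(f"positive candidates:       {candidate_rate:.2%}")
print(f"negatives per positive:    {negatives_per_positive:.1f}")
print(f"always-negative accuracy:  {always_negative_accuracy:.2%}")
```

Because the always-negative baseline already scores above 99% accuracy while finding no positive sample, plain accuracy says little about performance on the target set for data this skewed.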