Utilization of virtual sample generation to facilitate cancer identification for gene expression data in early stages

Bibliographic Details
Main Authors: Yong-Yao Lai, 賴永耀
Other Authors: De-Jiang Li
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/81256349090751870621
Description
Summary: Master's thesis, Institute of Information Management, National Cheng Kung University, academic year 95 (ROC calendar). DNA microarrays now play an important role in cancer classification. Microarray technology allows the expression levels of thousands of genes to be measured simultaneously in clinical experiments, so clinicians can obtain the gene expression profiles of tissue samples rapidly and make correct decisions. DNA microarray data are characterized by small sample sizes and high dimensionality (the so-called small-sample problem), along with large numbers of noisy or highly correlated genes. Researchers have recently applied gene selection mechanisms to find the genes most relevant to a specific classification task. Gene selection can improve learning accuracy and reduce computational cost, but it cannot overcome the inherent lack of training samples. For example, in the early stages, such as during the outbreak of a new disease, only limited data can be obtained, so the derived model is too unstable to handle the new disease effectively, and performance cannot be improved significantly even with gene selection. In this paper, we propose a virtual sample generation technique named CKDE (Clusterized Kernel Density Estimation). CKDE not only applies gene selection but also analyzes the characteristics of the data after dimensionality reduction, then generates virtual samples to add meaningful information. The proposed model improves learning accuracy significantly.
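The abstract names the stages of CKDE (gene selection, analysis of the reduced data, then virtual sample generation by kernel density estimation) but gives no algorithmic detail. The Python sketch below illustrates one plausible reading under loudly stated assumptions. A two-class signal-to-noise ranking stands in for the unspecified gene selection mechanism, and plain k-means stands in for the clustering step. Virtual samples are then drawn from a per-cluster Gaussian product-kernel KDE with a Silverman-style bandwidth. All function names (`select_genes`, `kmeans`, `kde_virtual_samples`, `ckde_like_augment`), parameters and the toy data are hypothetical; this is not the thesis's implementation.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Sample = List[float]


def select_genes(X: Sequence[Sample], y: Sequence[int], k: int) -> List[int]:
    """Keep the k genes with the largest two-class signal-to-noise score.

    Assumption: |mean0 - mean1| / (sd0 + sd1) stands in for the thesis's
    unspecified gene selection mechanism; labels must be 0 and 1.
    """
    scores = []
    for g in range(len(X[0])):
        a = [x[g] for x, lab in zip(X, y) if lab == 0]
        b = [x[g] for x, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b)
        diff = abs(statistics.mean(a) - statistics.mean(b))
        scores.append((diff / (spread + 1e-12), g))  # epsilon avoids 0/0
    scores.sort(reverse=True)
    return sorted(g for _, g in scores[:k])


def kmeans(points: List[Sample], k: int, rng: random.Random,
           iters: int = 20) -> List[List[Sample]]:
    """Plain Lloyd's k-means (hypothetical stand-in for the clustering step)."""
    k = min(k, len(points))
    centers = [list(p) for p in rng.sample(points, k)]
    clusters: List[List[Sample]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centre (squared Euclidean distance).
            nearest = min(range(k), key=lambda c: sum(
                (pi - ci) ** 2 for pi, ci in zip(p, centers[c])))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster empties
                centers[c] = [statistics.mean(col) for col in zip(*members)]
    return [c for c in clusters if c]


def kde_virtual_samples(cluster: List[Sample], m: int,
                        rng: random.Random) -> List[Sample]:
    """Draw m points from a Gaussian product-kernel KDE fitted to one cluster.

    Per-gene bandwidth h = 1.06 * sd * n**(-1/5) (Silverman-style rule,
    an assumption). A one-member cluster gets h = 0, so its virtual
    samples simply copy that member.
    """
    n = len(cluster)
    bandwidths = [1.06 * statistics.pstdev(col) * n ** -0.2
                  for col in zip(*cluster)]
    out = []
    for _ in range(m):
        base = rng.choice(cluster)  # pick a kernel centre uniformly
        out.append([v + rng.gauss(0.0, h) for v, h in zip(base, bandwidths)])
    return out


def ckde_like_augment(X: Sequence[Sample], y: Sequence[int], n_genes: int,
                      k_clusters: int, per_cluster: int, seed: int = 0
                      ) -> Tuple[List[int], List[Sample], List[int]]:
    """Gene selection -> per-class clustering -> per-cluster KDE sampling."""
    rng = random.Random(seed)
    genes = select_genes(X, y, n_genes)
    reduced = [[x[g] for g in genes] for x in X]
    new_X: List[Sample] = []
    new_y: List[int] = []
    for label in sorted(set(y)):
        members = [x for x, lab in zip(reduced, y) if lab == label]
        for cluster in kmeans(members, k_clusters, rng):
            virtual = kde_virtual_samples(cluster, per_cluster, rng)
            new_X.extend(virtual)
            new_y.extend([label] * len(virtual))
    return genes, reduced + new_X, list(y) + new_y


if __name__ == "__main__":
    # Toy data: 6 samples x 4 genes; gene 0 separates the two classes.
    X = [[5.0, 1.0, 2.0, 0.3], [5.2, 0.9, 2.1, 0.1], [4.9, 1.1, 1.8, 0.2],
         [1.0, 1.0, 2.0, 0.2], [1.1, 1.2, 1.9, 0.3], [0.9, 0.8, 2.2, 0.1]]
    y = [0, 0, 0, 1, 1, 1]
    genes, X_aug, y_aug = ckde_like_augment(X, y, n_genes=2, k_clusters=2,
                                            per_cluster=5)
    print("selected genes:", genes)  # prints: selected genes: [0, 2]
    print("training set size:", len(X_aug))
```

Drawing one point from a Gaussian KDE amounts to picking an observed sample as the kernel centre and adding Gaussian noise scaled by the bandwidth, so the generator needs only the cluster members and their per-gene spreads. Clustering before estimating the density is intended to tie each bandwidth to a local group rather than to a whole class, which keeps virtual samples near the sub-populations actually observed.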