Sliced-based sufficient dimension reduction for binary imbalanced data

Master's === National Taipei University === Department of Statistics === 107 === to high-dimensional data to find the effective DR directions benefit users to explore the intrinsic structure of high-dimensional data in a low-dimensional subspace. The dimension-reduced data can be regarded as features of the raw data, and can further be...

Bibliographic Details
Main Authors: HSU, WEI-TSE, 徐維澤
Other Authors: WU, HAN-MING
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/97bqn3
id ndltd-TW-107NTPU0337036
record_format oai_dc
spelling ndltd-TW-107NTPU03370362019-08-24T03:36:45Z http://ndltd.ncl.edu.tw/handle/97bqn3 Sliced-based sufficient dimension reduction for binary imbalanced data 切片型充分維度縮減法於二元不平衡資料之研究 HSU, WEI-TSE 徐維澤 Master's National Taipei University Department of Statistics 107 WU, HAN-MING 吳漢銘 2019 thesis 31 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taipei University === Department of Statistics === 107 === to high-dimensional data to find the effective DR directions benefit users to explore the intrinsic structure of high-dimensional data in a low-dimensional subspace. The dimension-reduced data can be regarded as features of the raw data and can further be employed in classification and/or clustering problems. It has been shown that applying binary classification rules to imbalanced data causes prediction bias. Imbalanced data here means a dataset in which the numbers of observations in the two categories of a response variable differ significantly. Although many studies have examined the effects of imbalanced data on classification rules, very few have reported on applying sufficient dimension reduction (SDR) to imbalanced data. In this study, we therefore investigate the effects of binary imbalanced data on four SDR methods: Sliced Inverse Regression (SIR), Sliced Average Variance Estimation (SAVE), Difference of Covariance (DOC), and principal Hessian direction (pHd). The performance of these methods is evaluated through simulation studies and a real data analysis, with and without a pre-balancing process. The numerical results show that pre-balancing is needed for SIR when the imbalanced data consist of two groups with similar means and the positive class has the smaller variance. For SAVE, pre-balancing is optional, even though the bias of the SAVE DR estimates can be larger than that of SIR. DOC and pHd perform worse than SIR and SAVE, which suggests that they are not suitable for imbalanced data.
author2 WU, HAN-MING
author_facet WU, HAN-MING
HSU, WEI-TSE
徐維澤
author HSU, WEI-TSE
徐維澤
title Sliced-based sufficient dimension reduction for binary imbalanced data
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/97bqn3