Sliced-based sufficient dimension reduction for binary imbalanced data

Master's === National Taipei University === Department of Statistics === 107 === Applying sufficient dimension reduction (SDR) methods to high-dimensional data to find the effective dimension reduction (DR) directions helps users explore the intrinsic structure of the data in a low-dimensional subspace. The dimension-reduced data can be regarded as features of the raw data and can further be...

Bibliographic Details
Main Authors:	HSU, WEI-TSE, 徐維澤
Other Authors:	WU, HAN-MING
Format:	Others
Language:	zh-TW
Published:	2019
Online Access:	http://ndltd.ncl.edu.tw/handle/97bqn3

id	ndltd-TW-107NTPU0337036
record_format	oai_dc
spelling	ndltd-TW-107NTPU03370362019-08-24T03:36:45Z http://ndltd.ncl.edu.tw/handle/97bqn3 Sliced-based sufficient dimension reduction for binary imbalanced data 切片型充分維度縮減法於二元不平衡資料之研究 HSU, WEI-TSE 徐維澤 WU, HAN-MING 吳漢銘 2019 學位論文 ; thesis 31 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Taipei University === Department of Statistics === 107 === Applying sufficient dimension reduction (SDR) methods to high-dimensional data to find the effective dimension reduction (DR) directions helps users explore the intrinsic structure of the data in a low-dimensional subspace. The dimension-reduced data can be regarded as features of the raw data and can further be employed in classification and/or clustering problems. It has been shown that applying binary classification rules to imbalanced data causes prediction bias. Imbalanced data here refers to a dataset in which the numbers of observations in the two categories of a response variable differ significantly. Although much research has studied the effects of imbalanced data on classification rules, very few studies in the literature have reported on applications of SDR to imbalanced data. This study therefore investigates the effects of binary imbalanced data on four SDR methods: Sliced Inverse Regression (SIR), Sliced Average Variance Estimation (SAVE), Difference of Covariance (DOC), and principal Hessian direction (pHd). The performance of the selected SDR methods is evaluated through simulation studies and a real data analysis, with and without a pre-balancing process. The results of these numerical experiments show that the pre-balancing process is needed for SIR when the imbalanced data consist of two groups with similar means and a smaller variance in the positive class. For SAVE, the pre-balancing process is optional, even though the bias of the DR estimates of SAVE would be larger than that of SIR. DOC and pHd perform worse than SIR and SAVE, which suggests that DOC and pHd are not suitable for imbalanced data.
author2	WU, HAN-MING
author_facet	WU, HAN-MING HSU, WEI-TSE 徐維澤
author	HSU, WEI-TSE 徐維澤
spellingShingle	HSU, WEI-TSE 徐維澤 Sliced-based sufficient dimension reduction for binary imbalanced data
author_sort	HSU, WEI-TSE
title	Sliced-based sufficient dimension reduction for binary imbalanced data
title_short	Sliced-based sufficient dimension reduction for binary imbalanced data
title_full	Sliced-based sufficient dimension reduction for binary imbalanced data
title_fullStr	Sliced-based sufficient dimension reduction for binary imbalanced data
title_full_unstemmed	Sliced-based sufficient dimension reduction for binary imbalanced data
title_sort	sliced-based sufficient dimension reduction for binary imbalanced data
publishDate	2019
url	http://ndltd.ncl.edu.tw/handle/97bqn3
work_keys_str_mv	AT hsuweitse slicedbasedsufficientdimensionreductionforbinaryimbalanceddata AT xúwéizé slicedbasedsufficientdimensionreductionforbinaryimbalanceddata AT hsuweitse qièpiànxíngchōngfēnwéidùsuōjiǎnfǎyúèryuánbùpínghéngzīliàozhīyánjiū AT xúwéizé qièpiànxíngchōngfēnwéidùsuōjiǎnfǎyúèryuánbùpínghéngzīliàozhīyánjiū
_version_	1719236992699465728

Similar Items