Sliced-based sufficient dimension reduction for binary imbalanced data
Master's === National Taipei University === Department of Statistics === 107 === Applying sufficient dimension reduction (SDR) to high-dimensional data to find the effective DR directions helps users explore the intrinsic structure of the data in a low-dimensional subspace. The dimension-reduced data can be regarded as features of the raw data and can further be...
Main Authors: | HSU, WEI-TSE 徐維澤 |
---|---|
Other Authors: | WU, HAN-MING 吳漢銘 |
Format: | Others |
Language: | zh-TW |
Published: | 2019 |
Online Access: | http://ndltd.ncl.edu.tw/handle/97bqn3 |
id |
ndltd-TW-107NTPU0337036 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-107NTPU0337036 2019-08-24T03:36:45Z http://ndltd.ncl.edu.tw/handle/97bqn3 Sliced-based sufficient dimension reduction for binary imbalanced data 切片型充分維度縮減法於二元不平衡資料之研究 HSU, WEI-TSE 徐維澤 Master's National Taipei University Department of Statistics 107 Applying sufficient dimension reduction (SDR) to high-dimensional data to find the effective DR directions helps users explore the intrinsic structure of the data in a low-dimensional subspace. The dimension-reduced data can be regarded as features of the raw data and can be further employed in classification and/or clustering problems. It has been shown that applying binary classification rules to imbalanced data causes prediction bias. Imbalanced data here means a dataset in which the numbers of observations in the two categories of the response variable differ significantly. Although many studies have examined the effects of imbalanced data on classification rules, very few in the literature have reported on applying SDR to imbalanced data. This study therefore investigates the effects of binary imbalanced data on four SDR methods: Sliced Inverse Regression (SIR), Sliced Average Variance Estimation (SAVE), Difference of Covariance (DOC), and principal Hessian directions (pHd). The performance of these SDR methods is evaluated through simulation studies and a real data analysis, with and without a pre-balancing process. The numerical results show that pre-balancing is needed for SIR when the imbalanced data consist of two groups with similar means and the positive class has the smaller variance. For SAVE, pre-balancing is optional, even though the bias of the SAVE DR estimates would be larger than that of SIR. DOC and pHd perform worse than SIR and SAVE, which suggests that they are not suitable for imbalanced data. WU, HAN-MING 吳漢銘 2019 Degree thesis ; thesis 31 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taipei University === Department of Statistics === 107 === Applying sufficient dimension reduction (SDR) to high-dimensional data to find the effective DR directions helps users explore the intrinsic structure of the data in a low-dimensional subspace. The dimension-reduced data can be regarded as features of the raw data and can be further employed in classification and/or clustering problems. It has been shown that applying binary classification rules to imbalanced data causes prediction bias. Imbalanced data here means a dataset in which the numbers of observations in the two categories of the response variable differ significantly. Although many studies have examined the effects of imbalanced data on classification rules, very few in the literature have reported on applying SDR to imbalanced data. This study therefore investigates the effects of binary imbalanced data on four SDR methods: Sliced Inverse Regression (SIR), Sliced Average Variance Estimation (SAVE), Difference of Covariance (DOC), and principal Hessian directions (pHd). The performance of these SDR methods is evaluated through simulation studies and a real data analysis, with and without a pre-balancing process. The numerical results show that pre-balancing is needed for SIR when the imbalanced data consist of two groups with similar means and the positive class has the smaller variance. For SAVE, pre-balancing is optional, even though the bias of the SAVE DR estimates would be larger than that of SIR. DOC and pHd perform worse than SIR and SAVE, which suggests that they are not suitable for imbalanced data.
|
author2 |
WU, HAN-MING |
author_facet |
WU, HAN-MING HSU, WEI-TSE 徐維澤 |
author |
HSU, WEI-TSE 徐維澤 |
spellingShingle |
HSU, WEI-TSE 徐維澤 Sliced-based sufficient dimension reduction for binary imbalanced data |
author_sort |
HSU, WEI-TSE |
title |
Sliced-based sufficient dimension reduction for binary imbalanced data |
publishDate |
2019 |
url |
http://ndltd.ncl.edu.tw/handle/97bqn3 |
work_keys_str_mv |
AT hsuweitse slicedbasedsufficientdimensionreductionforbinaryimbalanceddata AT xúwéizé slicedbasedsufficientdimensionreductionforbinaryimbalanceddata AT hsuweitse qièpiànxíngchōngfēnwéidùsuōjiǎnfǎyúèryuánbùpínghéngzīliàozhīyánjiū AT xúwéizé qièpiànxíngchōngfēnwéidùsuōjiǎnfǎyúèryuánbùpínghéngzīliàozhīyánjiū |
_version_ |
1719236992699465728 |
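
Illustration for the abstract above (not taken from the thesis): a minimal sketch of SIR applied to a binary response, with and without pre-balancing. With a binary response the two classes are the only two slices, so SIR can recover at most one direction. The record does not say which pre-balancing procedure the thesis used; random undersampling of the majority class is assumed here purely for illustration. The function names `undersample` and `sir_binary_direction`, the sample sizes, and the simulated mean shift are all hypothetical.

```python
import numpy as np


def undersample(X, y, rng):
    """Randomly drop majority-class rows until both classes have equal size.

    Assumption: this stands in for the unspecified pre-balancing step.
    """
    idx0 = np.flatnonzero(y == 0)
    idx1 = np.flatnonzero(y == 1)
    n = min(len(idx0), len(idx1))
    keep = np.concatenate([
        rng.choice(idx0, size=n, replace=False),
        rng.choice(idx1, size=n, replace=False),
    ])
    return X[keep], y[keep]


def sir_binary_direction(X, y):
    """Leading SIR direction when the two classes of y act as the two slices."""
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    L = np.linalg.cholesky(sigma)            # sigma = L @ L.T
    Z = np.linalg.solve(L, (X - mu).T).T     # whitened predictors
    # Weighted covariance of the slice means on the whitened scale.
    M = np.zeros_like(sigma)
    for label in (0, 1):
        Zs = Z[y == label]
        m = Zs.mean(axis=0)
        M += (len(Zs) / len(Z)) * np.outer(m, m)
    _, evecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    eta = evecs[:, -1]                       # eigenvector of the largest one
    b = np.linalg.solve(L.T, eta)            # map back to the original X scale
    return b / np.linalg.norm(b)


# Simulated imbalanced data (hypothetical settings): 2000 negatives,
# 100 positives, classes differ only by a mean shift along coordinate 0.
rng = np.random.default_rng(0)
p = 5
shift = np.zeros(p)
shift[0] = 2.0
X = np.vstack([rng.normal(size=(2000, p)), rng.normal(size=(100, p)) + shift])
y = np.concatenate([np.zeros(2000, dtype=int), np.ones(100, dtype=int)])

b_raw = sir_binary_direction(X, y)           # without pre-balancing
X_bal, y_bal = undersample(X, y, rng)
b_bal = sir_binary_direction(X_bal, y_bal)   # with pre-balancing

# Absolute loading on the true direction (coordinate 0) for each estimate.
print(abs(b_raw[0]), abs(b_bal[0]))
```

This toy setting separates the classes by a clear mean shift, so it only shows the mechanics. It does not reproduce the case the abstract highlights, where the group means are similar and the positive class has the smaller variance.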