Undersampling bankruptcy prediction: Taiwan bankruptcy data.

Machine learning models have increasingly been used in bankruptcy prediction. However, the observed historical data of bankrupt companies are often affected by data imbalance, which causes incorrect prediction, resulting in substantial economic losses. Many studies have proposed the insolvency imbal...

Bibliographic Details
Main Authors: Haoming Wang, Xiangdong Liu
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0254030
Citation: PLoS ONE 16(7): e0254030
ISSN: 1932-6203
Description: Machine learning models are increasingly used in bankruptcy prediction. However, historical data on bankrupt companies are often imbalanced, which causes incorrect predictions and can result in substantial economic losses. Many studies have addressed the imbalance problem in insolvency data, but little attention has been paid to the effect of undersampling techniques. Therefore, a framework is used to spot-check algorithms quickly and identify which combination of undersampling method and classification model performs best. The results show that Naive Bayes (NB) after Edited Nearest Neighbors (ENN) has the best performance, with an F2-measure of 0.423. In addition, by changing the undersampling rate of the cluster centroid-based method, we find that the performance of both Linear Discriminant Analysis (LDA) and NB is affected by the undersampling rate. Neither declines uniformly, and LDA achieves higher performance at an undersampling rate of 30%. This study accordingly provides another perspective and a guide for future design.
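Note on the metric: the F2-measure cited in the description is the F-beta score with beta = 2, which weights recall more heavily than precision. That choice suits bankruptcy prediction, where a missed bankruptcy usually costs more than a false alarm. A minimal sketch of the calculation follows; the function name `f_beta` and the confusion-matrix counts are illustrative assumptions, not values from the article.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from confusion-matrix counts; beta=2 gives the F2-measure."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for the minority (bankrupt) class, not taken from the article.
score = f_beta(tp=30, fp=60, fn=10)
print(round(score, 3))
```

With these hypothetical counts, precision is 1/3 and recall is 0.75, so the recall-weighted F2 score is noticeably higher than the balanced F1 score computed from the same counts.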