A Two-Stage Hybrid Default Discriminant Model Based on Deep Forest

Background: The credit scoring model is an effective tool that banks and other financial institutions use to identify potential defaulters. Credit scoring models built on machine learning methods such as deep learning perform well in terms of the accuracy of default...

Bibliographic Details
Main Authors: Gang Li, Hong-Dong Ma, Rong-Yue Liu, Meng-Di Shen, Ke-Xin Zhang
Format: Article
Language: English
Published: MDPI AG 2021-05-01
Series: Entropy
Subjects: default discrimination; feature selection; deep forest; credit score; credit loan
Online Access: https://www.mdpi.com/1099-4300/23/5/582
id doaj-d051401f3c4f4859a7d4ba76e9c152bc
record_format Article
spelling doaj-d051401f3c4f4859a7d4ba76e9c152bc; 2021-05-31T23:28:44Z; eng; MDPI AG; Entropy; ISSN 1099-4300; 2021-05-01; vol. 23, no. 5, article 582; DOI 10.3390/e23050582; A Two-Stage Hybrid Default Discriminant Model Based on Deep Forest; Gang Li, Hong-Dong Ma, Rong-Yue Liu, Meng-Di Shen, Ke-Xin Zhang (all: School of Business Administration, Northeastern University, Shenyang 110819, China); https://www.mdpi.com/1099-4300/23/5/582; default discrimination; feature selection; deep forest; credit score; credit loan
collection DOAJ
language English
format Article
sources DOAJ
author Gang Li
Hong-Dong Ma
Rong-Yue Liu
Meng-Di Shen
Ke-Xin Zhang
title A Two-Stage Hybrid Default Discriminant Model Based on Deep Forest
publisher MDPI AG
series Entropy
issn 1099-4300
publishDate 2021-05-01
description Background: The credit scoring model is an effective tool that banks and other financial institutions use to identify potential defaulters. Credit scoring models built on machine learning methods such as deep learning discriminate defaults accurately, but they have shortcomings, including a large number of hyperparameters and a heavy dependence on big data, and their interpretability and robustness leave considerable room for improvement. Methods: Deep forest, or multi-Grained Cascade Forest (gcForest), is a deep model built from decision trees and based on the random forest algorithm. Using multidimensional scanning and cascade processing, gcForest can effectively identify and process high-dimensional feature information. It also has fewer hyperparameters and is strongly robust. This paper therefore constructs a two-stage hybrid default discrimination model based on multiple feature selection methods and the gcForest algorithm, optimizing the parameters with the lowest type II error as the first criterion and the highest AUC and accuracy as the second and third criteria. gcForest combines the interpretability and robustness of traditional statistical models with the accuracy of deep learning models. Results: The validity of the hybrid default discrimination model is verified on three real, open credit data sets (Australian, Japanese, and German) from the UCI database. Conclusions: gcForest outperforms currently popular single classifiers such as ANN, common ensemble classifiers such as LightGBM, and CNNs in terms of type II error, AUC, and accuracy. Comparison with similar research results further verifies the robustness and effectiveness of the model.
topic default discrimination
feature selection
deep forest
credit score
credit loan
url https://www.mdpi.com/1099-4300/23/5/582
_version_ 1721417441218658304
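
Illustrative note on the selection rule in the description: the abstract says parameters are chosen with the lowest type II error first, then the highest AUC, then the highest accuracy. The Python sketch below shows one way such a lexicographic rule could be expressed. It is not the authors' code. It assumes that type II error means the share of actual defaulters (label 1) predicted as non-default (label 0), that predictions come from thresholding a score at 0.5, and that the candidate names (config_a, config_b, config_c) and their scores are made-up toy values.

```python
from typing import Dict, Sequence, Tuple


def type_ii_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual defaulters (1) predicted as non-default (0). Assumed definition."""
    preds_for_defaults = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_defaults:
        raise ValueError("y_true contains no default cases")
    return sum(1 for p in preds_for_defaults if p == 0) / len(preds_for_defaults)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases where the predicted label equals the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random defaulter scores above a random non-defaulter (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Tuple[float, float, float]:
    """Return (type II error, AUC, accuracy) for one candidate configuration."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return type_ii_error(y_true, y_pred), auc(y_true, scores), accuracy(y_true, y_pred)


def select_best(results: Dict[str, Tuple[float, float, float]]) -> str:
    """Lexicographic choice: lowest type II error, then highest AUC, then highest accuracy."""
    return min(results, key=lambda name: (results[name][0], -results[name][1], -results[name][2]))


# Toy validation labels (1 = default) and hypothetical scores from three configurations.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
candidate_scores = {
    "config_a": [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6],
    "config_b": [0.9, 0.6, 0.55, 0.7, 0.3, 0.2, 0.1, 0.4],
    "config_c": [0.9, 0.8, 0.75, 0.7, 0.3, 0.2, 0.1, 0.4],
}

results = {name: evaluate(y_true, scores) for name, scores in candidate_scores.items()}
best = select_best(results)
print(best)  # config_c
```

In this toy data, config_b and config_c both have a type II error of 0 and the same accuracy, so the AUC tie-break decides in favor of config_c. Comparing tuples keeps the priority order explicit, which a weighted sum of the three metrics would not.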