Using Machine Learning Schemes to Predict Default Risk on Imbalanced Peer-to-Peer Lending Dataset

Bibliographic Details
Main Authors: Yan-Ru Chen, 陳彥汝
Other Authors: Jenq-Shiou Leu
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/55955221697592097003
Description
Summary: Master's thesis === National Taiwan University of Science and Technology === Department of Electronic Engineering === 105 === In the past few years, peer-to-peer (P2P) lending has grown rapidly around the world. The main idea of P2P lending is disintermediation: removing intermediaries such as banks. For small businesses and individuals without sufficient credit or credit history, P2P lending is a good way to borrow money. However, the fundamental problem of this model is information asymmetry, which makes it hard to estimate the default risk of a loan correctly. Lenders decide whether to fund a loan based only on the information provided by borrowers, and the resulting P2P lending data form an imbalanced dataset containing unequal numbers of fully paid and defaulted loans. Imbalanced datasets are quite common in the real world, for example fraudulent credit card transactions or defective products in a factory. The imbalance can bias the machine learning schemes used to predict repayment behavior toward the majority class, because always favoring that class still yields high accuracy. However, the characteristics of the minority class are much more meaningful in the lending business. In this thesis, we use several machine learning schemes to predict the default risk of P2P lending, and use re-sampling and cost-sensitive mechanisms to handle the imbalanced dataset. We use a dataset from Lending Club to validate our proposed scheme. The experimental results show that the proposed scheme can effectively raise the prediction accuracy for default risk.
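
The summary names two general remedies for class imbalance, re-sampling and cost-sensitive learning, without saying which variants the thesis uses. The following is only a minimal sketch of the two ideas under assumed choices: random oversampling of the minority class, and class weights inversely proportional to class frequency. The toy labels (90 fully paid, 10 default) and all function names are hypothetical and are not taken from the thesis or from the Lending Club data.

```python
import random
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def balanced_class_weights(labels):
    """Cost-sensitive weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 0 = fully paid, 1 = default.
labels = [0] * 90 + [1] * 10
rows = [[i] for i in range(100)]

baseline = majority_baseline_accuracy(labels)
weights = balanced_class_weights(labels)
resampled_rows, resampled_labels = random_oversample(rows, labels)

print(baseline)                    # share of the majority class
print(weights)                     # larger weight for the rare class
print(Counter(resampled_labels))   # class counts after oversampling
```

On this toy data, a model that never predicts a default still scores 90% accuracy, which is why accuracy alone says little about the minority class. Oversampling equalizes the class counts before training, while class weights leave the data unchanged and instead make errors on defaults cost more in the loss.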