Credit Risk Evaluation using Machine Learning

Bibliographic Details
Main Author: Sandberg, Martina
Format: Others
Language: English
Published: Linköpings universitet, Statistik och maskininlärning 2017
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-138968
Description
Summary: In this thesis, we examine the machine learning models logistic regression, multilayer perceptron, and random forest for the purpose of discriminating between good and bad credit applicants. In addition to these models, we address the problem of imbalanced data with the Synthetic Minority Over-Sampling Technique (SMOTE). The available data comprise 273 286 entries and contain information about the applicant's invoice, the credit decision process, and the applicant. The data were collected during the period 2015-2017. With AUC values of about 73%, some patterns are found that can discriminate between customers who are likely to pay their invoice and customers who are not. However, the more advanced models performed only slightly better than logistic regression.
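
The summary names two techniques that a short example may make more concrete: SMOTE, which creates synthetic minority-class samples by interpolating between a sample and one of its nearest minority neighbours, and AUC, which can be read as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The following is a minimal sketch using only the Python standard library. It is not the thesis's implementation, which this record does not show; the function names `smote` and `auc`, the parameters `k` and `seed`, and the toy data are all assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper: `minority` is a list of equal-length numeric tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, by squared Euclidean distance
        others = [p for p in minority if p is not x]
        neighbours = sorted(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p))
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (assumed): four minority points at the corners of the unit square.
new_points = smote([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], n_new=6, k=2)
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this toy case three of the four positive-negative pairs are ranked correctly, so `toy_auc` is 0.75. The pairwise AUC computation is quadratic in the number of samples; for a data set of the size described in the summary, a rank-based computation from an established library would be the more practical choice.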