Loan Default Prediction

Amarjeet Cheema and Miriam McGaugh
Oklahoma State University


Abstract

Interest on loans and associated fees are the biggest revenue sources for most banks and credit unions. As of October 2015, more than 44 million borrowers collectively owed about $3.5 trillion in total outstanding consumer credit. However, more than 1 million people default on loans each year. A report from The Urban Institute, a non-profit research institute, found that nearly 40% of borrowers are expected to default on their student loans by 2023. Considering the magnitude of the risk and financial loss involved, it is essential for banks to lend to credible applicants who are highly likely to repay the loan amount. The objective of this project is to assess the likelihood of loan default based on customer demographics and financial data. The project also identifies the most significant variables in determining loan default and identifies which loan types carry the highest risk of default. Such insights will help banks significantly reduce the risk of losing money on loans. The original data set had 887,000 observations and 24 columns; it was filtered to only those rows for which the loan status was either fully paid or default. The resulting data set had 209,000 rows and 24 variables. SAS Enterprise Miner was used to build logistic regression and decision tree models with different configurations, and SAS Viya was used for data visualization.
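
The filtering step described above, keeping only loans whose status is fully paid or default and deriving a binary target from it, can be illustrated with a minimal Python sketch. This is not the SAS Enterprise Miner workflow used in the project. The column name `loan_status`, the label strings, the helper `filter_resolved_loans`, and the sample records are all assumptions made for illustration.

```python
# Hypothetical records. The column name "loan_status" and its label strings
# are assumptions for illustration, not taken from the original data set.
loans = [
    {"loan_id": 1, "loan_status": "Fully Paid"},
    {"loan_id": 2, "loan_status": "Current"},
    {"loan_id": 3, "loan_status": "Default"},
    {"loan_id": 4, "loan_status": "Late (31-120 days)"},
    {"loan_id": 5, "loan_status": "Fully Paid"},
]

# Only loans with a final outcome are kept for modeling.
KEPT_STATUSES = {"Fully Paid", "Default"}


def filter_resolved_loans(rows):
    """Keep rows with a final outcome and attach a binary default flag."""
    resolved = []
    for row in rows:
        if row["loan_status"] in KEPT_STATUSES:
            labeled = dict(row)  # copy so the input rows are not mutated
            labeled["is_default"] = 1 if row["loan_status"] == "Default" else 0
            resolved.append(labeled)
    return resolved


resolved_loans = filter_resolved_loans(loans)

# Share of defaulted loans among the retained rows.
default_rate = sum(r["is_default"] for r in resolved_loans) / len(resolved_loans)
```

The flag `is_default` would then serve as the target variable for a logistic regression or decision tree model, with the remaining columns as candidate inputs.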