An event is fast approaching that is the highlight of the year for many members of the SAS community. I am, of course, referring to SAS Global Forum 2012, which this year will be hosted at the Walt Disney World Swan and Dolphin Resort in Orlando, Florida. I am particularly excited as this will be my second time in attendance, having presented at last year’s conference in Las Vegas. The memories and experiences from that event are still surprisingly fresh in my mind.
I also have the honour of presenting in the Data Mining and Text Analytics stream on “An experimental comparison of classification techniques for imbalanced credit scoring data sets using SAS® Enterprise Miner™,” Monday, April 23, at 4:30 p.m. in Northern Hemisphere A-3 (so no excuses not to attend).
In this presentation, I will discuss the capabilities of the modelling techniques in SAS Enterprise Miner and how well they perform on imbalanced credit scoring data sets.
Essentially, the aim of credit scoring is to classify loan applicants into two classes:
- Good payers — those who are likely to keep up with their repayments.
- Bad payers — those who are likely to default on their loans.
In the current financial climate, and with the introduction of the Basel II Accord, financial institutions have even more incentive to select and implement the most appropriate credit scoring techniques for their credit portfolios. Improving the accuracy of credit scoring techniques by even a fraction of a percent could yield significant future savings.
However, portfolios that can be considered very low risk, known as low default portfolios (LDPs), have received relatively little attention in the research literature, particularly with regard to which techniques are most appropriate for scoring them. The underlying problem with LDPs is that they contain far fewer observations in the class of defaulters than in the class of good payers. A large class imbalance is therefore present, which some techniques may not be able to handle successfully.
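To make the imbalance problem concrete, here is a minimal Python sketch (not taken from the presentation, which uses SAS Enterprise Miner). The portfolio of 990 good payers and 10 defaulters is made up for illustration, and the "model" is a deliberately naive rule that labels every applicant a good payer. It shows why plain accuracy can look excellent on an LDP while no defaulters are identified at all.

```python
# Hypothetical low default portfolio: 990 good payers (0) and 10 defaulters (1).
# These counts are invented for illustration only.
labels = [0] * 990 + [1] * 10

# A naive "model" that classifies every applicant as a good payer.
predictions = [0] * len(labels)


def accuracy(y_true, y_pred):
    """Share of applicants whose predicted class matches the actual class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual positives (here: defaulters) that the model flagged."""
    actual = sum(1 for t in y_true if t == positive)
    caught = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    return caught / actual if actual else 0.0


acc = accuracy(labels, predictions)
rec = recall(labels, predictions)
print(f"accuracy = {acc:.3f}, defaulter recall = {rec:.3f}")
# prints: accuracy = 0.990, defaulter recall = 0.000
```

The naive rule is right 99% of the time, yet it misses every defaulter, which is exactly the class a credit scoring model needs to find.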
Typical examples of low default portfolios include high-quality corporate borrowers, banks, sovereigns and some categories of specialised lending, but in some countries even certain retail lending portfolios could turn out to have very low numbers of defaults compared to the majority class. In a recent FSA publication regarding conservative estimation of low default portfolios, regulatory concerns were raised about whether firms can adequately assess the risk of LDPs.
A wide range of classification techniques has already been proposed in the credit scoring literature, including statistical techniques, such as linear discriminant analysis and logistic regression, and non-parametric models, such as k-nearest neighbour and decision trees. It is currently unclear from the literature which technique is the most appropriate for improving discrimination for LDPs.
Therefore, in this presentation, I will explore several modelling techniques in SAS Enterprise Miner that can be used in the analysis of imbalanced credit scoring data sets. Along with looking at traditional classification techniques, I will also explore the suitability of gradient boosting and memory-based reasoning (k-NN) for loan default prediction. All of these techniques will be benchmarked on five real-world credit scoring data sets from European financial institutions.
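As a rough illustration of what benchmarking on imbalanced data can look like, the Python sketch below compares two sets of model scores using the area under the ROC curve (AUC), a common measure of discrimination that does not depend on the class proportions. The choice of AUC is my assumption for this example; the labels, the scores and the names `model_a` and `model_b` are all hypothetical and are not results from the study.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen defaulter (label 1)
    receives a higher score than a randomly chosen good payer (label 0).
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 good payers and 2 defaulters.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Hypothetical predicted default probabilities from two models.
candidate_scores = {
    "model_a": [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.60, 0.55, 0.90],
    "model_b": [0.50] * 10,  # gives every applicant the same score
}

results = {name: auc(s, labels) for name, s in candidate_scores.items()}
for name, value in results.items():
    print(f"{name}: AUC = {value:.4f}")
# prints:
# model_a: AUC = 0.9375
# model_b: AUC = 0.5000
```

A model that cannot separate the two classes scores 0.5 regardless of how rare defaulters are, which makes a rank-based measure like this easier to compare across portfolios with different default rates.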
While at the conference, I would be very glad to meet you and discuss this or any of the other topics I have blogged about. If you see me at SAS Global Forum, feel free to say hello. I look forward to seeing you there!