A Gentle Introduction to Data Science for Credit Risk Modeling — Part 1

We will try to beat the loan grades assigned by Lending Club (LC) by building a new machine learning model from scratch.

Lending Club Data: An Outlook

Lending Club was one of the first companies to create an online marketplace for P2P lending, back in 2006. It gained substantial traction in the wake of the 2008 crisis, partly due to changes in traditional banks’ capability and willingness to lend at the time.

In this post, we will use data science and exploratory data analysis to take a peek at Lending Club’s loan data from 2007 to 2015, focusing on the following questions regarding this period:

- Loan Absolute Variables Distribution: What do the distributions of loan amount, amount funded by the lender and total committed by investors look like?
- Applications Volume: How many loan applications were received?
- Defaults Volume: How many loans defaulted?
- Average Interest Rates: What was the average interest rate?
- Loan Purpose: What were the most frequent loan purposes?
- Loan Grades: How creditworthy are the loans?
- Delinquency Breakdown: How many loans were charged off?

By analyzing these aspects, we will be able to understand our data better and also get to know a bit of Lending Club’s story.

The dataset contains 887K loan applications from 2007 through 2015, and it can be downloaded from Kaggle.

Loan Absolute Variables Distribution

In the dataset we have three absolute variables relating to the loans: loan amount, amount funded and total committed by investors. These variables are similarly distributed, which suggests an adequate balance between funding and credit.

Applications Volume

Below we can see the year-over-year (YoY) evolution of loan applications issued from 2007 to 2015.

We can see that loan applications rose steadily from mid 2007 until 2014. The graph shows two distinct patterns, one before 2014 and one after. From early 2014 until mid 2014, we can see a boom in loan application volume, followed by a violent drop, with the pattern then repeating itself.

What could be the reasons for that? IPOs are the holy grail for every startup (or at least, for its investors)..

From this period until the end of 2014, the growth pace of defaults pretty much followed loan application volume. By late 2015, the volume of defaulted loans had reached 2013 levels.

In the plot below we can see a comparison between loan applications and loan defaults on a log scale. Take a look at the period between mid 2014 and early 2015. As of 2015, it looks like Lending Club was going through a perfect storm: loan applications were rising and loan defaults were diminishing.

Loan Purpose

What were the most frequent loan purposes? Debt consolidation stands out as the clear winner, with more than 500K loans, or 58% of the total. Other highlights include:

- Credit Card: more than 200K (~20%)
- Home Improvement: more than 50K (~8%)
- Other Purposes: less than 50K (~3%)

Average Interest Rates

What was Lending Club’s average interest rate between 2007 and 2015? Loan interest rates have followed an interesting pattern over these years..

Which is a good sign for Lending Club. But are these the right grades? Let’s zoom into the loan subgrades for delinquent loans and find out.

Delinquency Breakdown

From the entire loan population, we have 67K delinquent loans (~7.5%). Let’s zoom in a bit on the delinquent loans by analyzing their grades and subgrades.

Looking at the 67K delinquent loans, we have the following highlights:

- 2.5K (~4%) loans with a Grade A were charged off
- 9.5K (~14%) loans with a Grade B were charged off
- 12.8K (~20%) loans with a Grade C were charged off

Intuitively, we would expect loans graded worse than C to be worse payers than those graded A, B and C, something that doesn’t quite happen here..
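
For readers who want to reproduce the purpose and grade breakdowns discussed above, here is a minimal sketch using only the Python standard library. It is not the code behind the figures in this post. The column names (`purpose`, `grade`, `loan_status`) and the set of statuses counted as delinquent are assumptions and should be checked against the actual Kaggle file. The rows are made-up toy data, not Lending Club records.

```python
from collections import Counter

# Assumed status labels treated as delinquent; adjust them to the values
# actually present in the dataset's status column.
DELINQUENT_STATUSES = {"Charged Off", "Default"}


def share_by(rows, key):
    """Return {value: (count, share_of_total)} for one column, most frequent first."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return {value: (n, n / total) for value, n in counts.most_common()}


def delinquent_by_grade(rows, status_key="loan_status", grade_key="grade"):
    """Keep only delinquent loans, then break them down by grade."""
    delinquent = [r for r in rows if r[status_key] in DELINQUENT_STATUSES]
    return share_by(delinquent, grade_key)


# Toy rows standing in for the real file (invented values for illustration).
toy_loans = [
    {"purpose": "debt_consolidation", "grade": "A", "loan_status": "Fully Paid"},
    {"purpose": "debt_consolidation", "grade": "C", "loan_status": "Charged Off"},
    {"purpose": "debt_consolidation", "grade": "B", "loan_status": "Current"},
    {"purpose": "credit_card", "grade": "C", "loan_status": "Charged Off"},
    {"purpose": "credit_card", "grade": "B", "loan_status": "Charged Off"},
    {"purpose": "home_improvement", "grade": "D", "loan_status": "Fully Paid"},
]

purpose_shares = share_by(toy_loans, "purpose")
by_grade = delinquent_by_grade(toy_loans)

for purpose, (count, share) in purpose_shares.items():
    print(f"{purpose}: {count} loans ({share:.0%})")
for grade, (count, share) in by_grade.items():
    print(f"Grade {grade}: {count} delinquent loans ({share:.0%})")
```

To run this on the real data, the toy list could be replaced with rows read through `csv.DictReader`, which yields one dictionary per loan. Comparing each grade's share of delinquent loans with its share of all loans is the more telling view, since a grade with many loans will also have many charge-offs in absolute terms.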
