Monday, September 14, 2015

Predictive Lead Scoring for a Bank

Introduction
I recently participated in a hackathon hosted by www.analyticsvidhya.com and finished 6th on the private leaderboard amongst a very competitive crowd. The problem is an excellent example of “Predictive Lead Scoring”, a problem faced by businesses in many sectors, including Banking, Insurance, Financial Services, Retail, Manufacturing and FMCG. Hence I decided to describe my approach to this business problem in this post.

Problem Description

Customer Bank is a mid-sized private bank which deals in all kinds of loans, having branches across all major cities in the country.
The digital arms of banks today face challenges with lead conversion: they source leads through mediums like search, display, email campaigns and affiliate partners. Customer Bank faces the same challenge of a low conversion ratio.
The challenge is to identify the customer segments with a higher propensity to opt for a specific loan product. To maximize ROI from marketing spend, the bank needs to target only those customers who are most likely to take up a loan product.

Data

The data consists of customer details based on the last 3 months' transactions. We were tasked with identifying the segment of customers with a higher disbursal rate in the next 30 days.

Input variables:

ID - Unique ID
Gender - Sex
City - Current City
Monthly_Income - Monthly Income in rupees
DOB - Date of Birth
Lead_Creation_Date - Lead Created on date
Loan_Amount_Applied - Loan Amount Requested
Loan_Tenure_Applied - Loan Tenure Requested (in years)
Existing_EMI - EMI of Existing Loans
Employer_Name - Employer Name
Salary_Account - Salary account with Bank
Mobile_Verified - Mobile Verified (Y/N)
Var5 - Continuous classified variable
Var1 - Categorical variable with multiple levels
Loan_Amount_Submitted - Loan Amount Revised and Selected after seeing Eligibility
Loan_Tenure_Submitted - Loan Tenure Revised and Selected after seeing Eligibility (in years)
Interest_Rate - Interest Rate of Submitted Loan Amount
Processing_Fee - Processing Fee of Submitted Loan Amount
EMI_Loan_Submitted - EMI of Submitted Loan Amount
Filled_Form - Filled Application form post quote
Device_Type - Device from which application was made (Browser/Mobile)
Var2 - Categorical variable with multiple levels
Source - Categorical variable with multiple levels
Var4 - Categorical variable with multiple levels

Target Variables:

LoggedIn - Application Logged
Disbursed - Loan Disbursed

My Approach:

I applied the “Predictive Lead Scoring” methodology discussed in my earlier blog posts to this problem, using R, the lingua franca of “Predictive Analytics” applications.
Step 1: Feature Engineering
  • converted DOB into age in years
  • converted character categorical variables into numeric levels
  • imputed missing values and NAs with defaults
  • grouped “outliers” into a single category so that they won't skew the predictions
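The feature engineering steps above can be sketched as follows. The original work was done in R; this is a minimal Python sketch using only the standard library, and the function names, the “Unknown” default, the rare-category threshold and the sample cities are hypothetical. It treats “outliers” as rarely seen categorical levels, which is one plausible reading of that step.

```python
from datetime import date


def age_in_years(dob: date, as_of: date) -> int:
    """Whole years between dob and as_of (one less if the birthday hasn't come yet)."""
    years = as_of.year - dob.year
    if (as_of.month, as_of.day) < (dob.month, dob.day):
        years -= 1
    return years


def impute(values, default):
    """Replace missing entries (None) with a default value."""
    return [default if v is None else v for v in values]


def group_rare(values, min_count, other="Other"):
    """Collapse categories seen fewer than min_count times into one bucket."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return [v if counts[v] >= min_count else other for v in values]


def encode_levels(values):
    """Map each distinct category string to an integer level (alphabetical order)."""
    levels = {v: i for i, v in enumerate(sorted(set(values)))}
    return [levels[v] for v in values], levels


# Hypothetical example: impute first, then group rare levels, then encode.
cities = impute(["Delhi", "Mumbai", None, "Delhi", "Agra"], "Unknown")
grouped = group_rare(cities, min_count=2)  # ['Delhi', 'Other', 'Other', 'Delhi', 'Other']
codes, levels = encode_levels(grouped)     # [0, 1, 1, 0, 1], {'Delhi': 0, 'Other': 1}
```

One reasonable choice is to compute age as of each lead's Lead_Creation_Date rather than today, so that leads created at different times are comparable.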
Step 2: Exploratory Data Analysis
  • identified variables having high correlation with the target variable “Disbursed”.
  • These variables would be included in the Predictive Modeling process.
  • dropped the variables having no correlation with the target variable “Disbursed”.
  • normalized the variables so that all of them are on a similar numerical scale.
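A minimal Python sketch of this exploratory step (the author used R): it scores each variable by its Pearson correlation with the 0/1 “Disbursed” target, keeps those above a cut-off, and z-score normalizes the survivors. The 0.3 threshold and the toy columns are hypothetical. Note that correlation only captures linear association, so a variable dropped here could still help a tree-based model.

```python
import math


def pearson(x, y):
    """Pearson correlation; with a 0/1 target this is the point-biserial correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no signal
    return cov / (sx * sy)


def select_features(columns, target, threshold):
    """Keep columns whose absolute correlation with the target is at least threshold."""
    return {name: col for name, col in columns.items()
            if abs(pearson(col, target)) >= threshold}


def zscore(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [0.0 if sd == 0 else (v - mean) / sd for v in values]


# Hypothetical toy data: income tracks disbursal, "noise" does not.
disbursed = [0, 0, 1, 1]
columns = {"income": [10, 20, 30, 40], "noise": [1, 2, 2, 1]}
kept = select_features(columns, disbursed, threshold=0.3)  # keeps only "income"
scaled = {name: zscore(col) for name, col in kept.items()}
```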
Step 3: Predictive Modeling
  • I started with “generalized linear models”, achieving a decent conversion rate of 75%.
  • This helped in identifying the relative importance of variables in the prediction process
  • I then applied the insights gained to advanced modeling techniques like “random forest” and “boosted trees”.
  • With a fair amount of parameter tuning and ensembling techniques I reached a conversion rate of ~85%.
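To illustrate the modeling step, here is a minimal, dependency-free Python sketch of logistic regression, the GLM commonly used for a 0/1 target such as “Disbursed”, plus a simple averaging ensemble. It is not the author's R code: the learning rate, epoch count and toy data are hypothetical, and it fits by batch gradient descent rather than the iteratively reweighted least squares used by R's glm(). Random forest or boosted tree models would plug into ensemble_proba as additional scoring functions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Logistic regression (a GLM with a logit link) fit by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss: (predicted probability - label) * feature.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def ensemble_proba(scorers, x):
    """Simple averaging ensemble: mean of each model's predicted probability."""
    return sum(score(x) for score in scorers) / len(scorers)


# Hypothetical toy data with one feature.
X, y = [[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1]
glm_model = fit_logistic(X, y)
scorers = [lambda x: predict_proba(glm_model, x)]  # append other models' scorers here
print(ensemble_proba(scorers, [1.5]))
```

Averaging probabilities is the simplest form of ensembling; weighted averages or stacking are common refinements.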

Conclusion:

This model can be used to predict customers’ propensity to opt for a specific product or service. Marketing efforts can then target only the “top” 10-15% of customers ranked by propensity.
Thus a business can simultaneously lower its marketing spend and increase lead conversion rates, leading to a higher ROI on marketing spend.
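The targeting rule in the conclusion can be sketched as: rank leads by predicted propensity, keep the top fraction, and compare that slice's conversion rate with the overall rate (the lift). This is a minimal Python sketch; the function names and the 10% cut are illustrative, and computing lift assumes the true outcomes are known, e.g. on a held-out validation set.

```python
def top_fraction(leads, scores, fraction):
    """Return the leads whose propensity score falls in the top `fraction`."""
    k = max(1, round(len(leads) * fraction))
    ranked = sorted(zip(scores, leads), key=lambda p: p[0], reverse=True)
    return [lead for _, lead in ranked[:k]]


def lift(outcomes, scores, fraction):
    """Conversion rate in the targeted top fraction divided by the overall rate."""
    k = max(1, round(len(outcomes) * fraction))
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    top_rate = sum(o for _, o in ranked[:k]) / k
    base_rate = sum(outcomes) / len(outcomes)
    return top_rate / base_rate
```

A lift of 3, for example, would mean the targeted slice converts at three times the average rate, which is where the savings in marketing spend come from.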

Monday, August 24, 2015

Predictive Lead Scoring using R (Part 2/2)

Introduction:

In part 1 of this blog series we saw what Predictive Lead Scoring is, and we introduced R, the most popular free software for statistical computing and graphical visualization, including Predictive Modeling. In this concluding part 2 of the series we shall look at the “Predictive Lead Scoring Life Cycle”, and see how we can harness the rich collection of libraries in R to optimize the Predictive Lead Scoring process.


Predictive Lead Scoring Life Cycle

The following infographic illustrates the “Predictive Lead Scoring Life Cycle” as we understand it:

Phase 1:

The various disparate sources of customer and leads data such as ERP, CRM, etc. are identified.

Phase 2:

Marketing Automation tools extract leads data from the ERP/CRM systems and enrich it with “Market Intelligence”, viz. inputs from websites and social media.

Phase 3:

Data Integration tools are used to standardize and integrate data from disparate sources like ERP, CRM and Marketing Automation tools.
The integrated lead/customer data includes attributes such as:
Demographic/Explicit
  • Geographical location (country, politics, economy, etc.)
  • Industry
  • Company financials
  • Job title
Implicit
  • website visits
  • content downloads
  • webinar attendance
  • form completions, etc.
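Phase 3 can be illustrated with a minimal Python sketch that joins explicit CRM attributes with counts of implicit activities, keyed on a standardized e-mail address. The field names, the choice of e-mail as the join key and the record shapes are all hypothetical; real Data Integration tools also handle de-duplication and conflicting values, which this sketch ignores.

```python
from collections import Counter


def normalize_email(email):
    """Standardize the join key: trim whitespace and lower-case."""
    return email.strip().lower()


def integrate(crm_records, activity_events):
    """Combine explicit CRM attributes with counts of implicit activities per lead.

    crm_records: dicts like {"email": ..., "industry": ..., "job_title": ...}
    activity_events: dicts like {"email": ..., "activity": "website_visit"}
    (both shapes are hypothetical)
    """
    leads = {}
    for rec in crm_records:
        profile = leads.setdefault(normalize_email(rec["email"]), {})
        profile.update({k: v for k, v in rec.items() if k != "email"})
    # Count each (lead, activity type) pair across all events.
    activity = Counter((normalize_email(e["email"]), e["activity"]) for e in activity_events)
    for (key, kind), count in activity.items():
        leads.setdefault(key, {})[kind] = count
    return leads
```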

Phase 4:

Predictive Lead Scoring applications use various machine learning algorithms, like logistic regression, recursive partitioning trees, neural networks and random forests, to first identify the predictive attributes of the leads and then assign each lead a propensity/probability of converting into a Prospect.
Here we would like to highlight two distinct advantages of “Predictive” Lead Scoring:
  • Predictive Lead Scores are based on the statistical relationship between numerous attributes (customers’ behaviour) and outcomes (Very Hot, Hot, Warm or Cold lead).
  • Predictive analytic algorithms discover data associations which may not be immediately obvious even to experienced salespeople.
This improves the accuracy of the lead scores. The Sales team can now concentrate its efforts and resources on the “Hot Leads” rather than the warm/cold leads, and accurate scores lead to higher conversion rates.
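Turning a predicted probability into the Very Hot/Hot/Warm/Cold labels mentioned above can be as simple as fixed cut-offs. A minimal Python sketch; the cut-off values are hypothetical and would in practice be chosen to match how many leads the Sales team can actually follow up.

```python
# Hypothetical cut-offs, checked from hottest to coldest.
BANDS = [(0.75, "Very Hot"), (0.50, "Hot"), (0.25, "Warm"), (0.0, "Cold")]


def band(probability, bands=BANDS):
    """Map a predicted conversion probability to a lead temperature band."""
    for cutoff, label in bands:
        if probability >= cutoff:
            return label
    return bands[-1][1]  # fallback for out-of-range inputs


def prioritize(lead_probs):
    """Sort (lead_id, probability) pairs so the hottest leads come first."""
    return sorted(lead_probs, key=lambda p: p[1], reverse=True)
```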

Phase 5:

Insights gained are presented to top decision makers in the form of BI dashboards.
Moreover, the insights gained are also ploughed back into the CRM system, further enriching the customer profiles.
This feedback loop results in increasingly precise lead score predictions over time and across sales cycles, in a cost-efficient manner.
The R statistical programming language provides a rich set of libraries for Predictive Analytics, including the following, used in Predictive Lead Scoring:
  1. glm (with family = binomial) for Logistic Regression, used for basic predictive lead scoring.
  2. rpart for recursive partitioning trees, which can be converted into rules.
  3. randomForest to identify the predictive attributes of leads via variable importance (importance(), varImpPlot()).
  4. The party package offers conditional inference trees, unbiased random forest variable importance and model-based trees.
  5. neuralnet for Neural Networks.
  6. bnlearn for Bayesian Networks, to understand causal relationships between scoring parameters.
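randomForest's variable importance includes a “mean decrease in accuracy” measure, obtained by permuting a variable's values and measuring how much accuracy falls. The idea can be sketched model-agnostically in Python as below. This is a simplified illustration, not randomForest's implementation (which permutes out-of-bag samples tree by tree); the 0.5 decision threshold, the seed and the repeat count are assumptions.

```python
import random


def accuracy(model, X, y):
    """Share of rows where thresholding the model's score at 0.5 matches the label."""
    return sum((model(x) >= 0.5) == bool(t) for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero, while a feature the predictions depend on shows a positive drop.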
If you would like to harness the power of “Predictive Lead Scoring using R” to improve your sales lead conversion rates, please feel free to get in touch with me.

Tuesday, January 27, 2015

Predictive Lead Scoring using R (Part 1/2)


Introduction:
In order to understand Predictive Lead Scoring, one must first be familiar with the term ‘Marketing Automation’, referred to by some as ‘the new kid on the block’. In the good old days, Marketing was synonymous with ‘Advertising’ and ‘Expensive Marketing Campaigns’, i.e. ‘Outbound Marketing’, and the Sales team was left cold-calling leads, which resulted in very low conversion rates. Then somebody high up in the marketing field thought: what if we could get customers to come to us (‘Inbound Marketing’) rather than us reaching out to them (‘Outbound Marketing’)? And ‘Marketing Automation’ was born.

What is Marketing Automation and Lead Scoring?

According to Wikipedia, “Marketing automation” refers to software platforms and technologies designed for marketing departments and organizations to more effectively market on multiple channels online (such as email, social media, websites, etc.) and automate repetitive tasks.
The following infographic summarizes typical Marketing Automation tasks and platforms.





Marketing automation focuses on converting leads at the top of the marketing funnel by nurturing them into sales-ready ‘Prospects’, up to the point of handshake where the Sales process begins. Lead Scoring is a method used to assess the probability of Leads converting into Prospects. Traditionally, salespeople assigned points to actions performed by Leads, e.g. clicking on a website link or sending an email, and added them up to arrive at a ‘Lead Score’.
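The traditional points-based scoring described above amounts to a lookup table and a sum. A minimal Python sketch; the action names and point values are hypothetical, chosen only to show the hand-set weights that Predictive Lead Scoring replaces with weights learned from data.

```python
# Hypothetical point values a sales team might assign by hand.
ACTION_POINTS = {
    "website_click": 1,
    "content_download": 5,
    "email_reply": 10,
    "webinar_attendance": 15,
}


def rule_based_score(actions, points=ACTION_POINTS):
    """Traditional lead score: sum the fixed points for each action a lead performed.

    Actions not in the table score zero.
    """
    return sum(points.get(action, 0) for action in actions)
```

For example, a lead with two website clicks and one email reply scores 1 + 1 + 10 = 12, regardless of whether those actions actually predict conversion.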

Predictive Lead Scoring takes ‘Lead Scoring’ to the next level by capitalizing on the huge data deluge (Big Data) generated by customers/leads and applying statistical algorithms to this data, as illustrated in the following infographic. R is the most popular open source language used in Predictive Analytics applications. Joseph Sirosh, Corporate Vice President, Machine Learning, Microsoft, recently announced that Microsoft has reached an agreement to acquire Revolution Analytics, the leading commercial provider of software and services for R, the world’s most widely used programming language for statistical computing and predictive analytics.

 

Thank you for reading this post. In the next post I shall present a case study of Predictive Lead Scoring using R.