Date posted: 15-Apr-2017
Category: Data & Analytics
Uploaded by: varsha-holennavar
Assignment-3, Group-10
1. Pooja Goyal
2. Shashwat Mehra
3. Varsha Holennavar
Lending Club Data Analysis
This analysis uses Lending Club (LC) data. LC is a peer-to-peer online lending platform. It is the world’s largest marketplace connecting borrowers and investors, where consumers and small business owners lower the cost of their credit and enjoy a better experience than traditional bank lending, and investors earn attractive risk-adjusted returns.
Project Objective
• Predict whether borrowers will default on their loans
• Predict the interest rate to be charged on the loan amount
• Predict whether a loan will be approved at an interest rate of 10% or below
End Users: Borrowers and Lenders
Data Exploration
For each loan, over 100 characteristics are recorded in the table.
We explored the data dictionary from the Lending Club website, which describes the features in the dataset. We then explored the dataset in R and Tableau to understand the features and find correlations between them.
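The correlation analysis was done in R and Tableau. The same idea can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the team's code. The column names (`loan_amnt`, `installment`, `annual_inc`) mimic Lending Club's data dictionary, the values are toy numbers, and the 0.9 cut-off for flagging redundant pairs is an arbitrary choice.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold.

    One member of each such pair is a candidate for removal, because it adds
    little information beyond its partner.
    """
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Toy values for illustration only; column names mimic Lending Club's (assumed).
columns = {
    "loan_amnt": [5000, 10000, 15000, 20000, 25000],
    "installment": [160, 330, 490, 655, 820],
    "annual_inc": [40000, 85000, 52000, 61000, 47000],
}
print(highly_correlated_pairs(columns))
```

In the toy data, `installment` and `loan_amnt` move almost in lockstep and are the only flagged pair. Keeping just one of them is one way to shrink a wide table.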
Data Pre-Processing
We selected 31 of the 115 available columns based on the data exploration and feature-correlation analysis, then:
• Removed NAs
• Removed wildcards
• Removed outliers
• Created calculated fields: FICO mean, indicator, monthly income
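The pre-processing steps can be sketched as a small pipeline. Everything here is an assumption for illustration:

* The column names (`annual_inc`, `int_rate`, `fico_range_low`, `fico_range_high`) follow Lending Club's public data dictionary.
* "Wildcards" is read as stray symbols such as the `%` sign in rate fields.
* Outliers are removed with Tukey's 1.5 × IQR rule on annual income.
* "Indicator" is guessed to flag loans priced at 10% or below.

```python
from statistics import quantiles

# Placeholder values treated as missing (assumption).
MISSING = {None, "", "n/a", "NA"}
# Hypothetical subset of the selected columns, named after Lending Club fields.
COLUMNS = ["annual_inc", "int_rate", "fico_range_low", "fico_range_high"]


def drop_missing(rows):
    """Removing NAs: keep rows where every selected column has a value."""
    return [r for r in rows if all(r[c] not in MISSING for c in COLUMNS)]


def strip_wildcards(row):
    """Removing wildcards (assumed to mean '%' and whitespace), then cast to float."""
    return {c: float(str(v).replace("%", "").strip()) for c, v in row.items()}


def drop_outliers(rows, col, k=1.5):
    """Removing outliers: keep rows inside Q1 - k*IQR .. Q3 + k*IQR for `col`."""
    q1, _, q3 = quantiles([r[col] for r in rows], n=4, method="inclusive")
    iqr = q3 - q1
    return [r for r in rows if q1 - k * iqr <= r[col] <= q3 + k * iqr]


def add_calculated_fields(row):
    """Creating calculated fields: FICO mean, monthly income, indicator."""
    out = dict(row)
    out["fico_mean"] = (row["fico_range_low"] + row["fico_range_high"]) / 2
    out["monthly_income"] = row["annual_inc"] / 12
    # Hypothetical meaning of "Indicator": 1 if the rate is 10% or below.
    out["indicator"] = 1 if row["int_rate"] <= 10 else 0
    return out


def preprocess(raw_rows):
    rows = [{c: r.get(c) for c in COLUMNS} for r in raw_rows]  # column selection
    rows = drop_missing(rows)
    rows = [strip_wildcards(r) for r in rows]
    rows = drop_outliers(rows, "annual_inc")
    return [add_calculated_fields(r) for r in rows]


# Toy rows for illustration only.
raw = [
    {"annual_inc": "60000", "int_rate": "9.5%", "fico_range_low": "700", "fico_range_high": "704"},
    {"annual_inc": "48000", "int_rate": "13.2%", "fico_range_low": "680", "fico_range_high": "684"},
    {"annual_inc": None, "int_rate": "11.0%", "fico_range_low": "720", "fico_range_high": "724"},
    {"annual_inc": "72000", "int_rate": "7.9%", "fico_range_low": "740", "fico_range_high": "744"},
    {"annual_inc": "55000", "int_rate": "10.0%", "fico_range_low": "710", "fico_range_high": "714"},
    {"annual_inc": "2400000", "int_rate": "12.1%", "fico_range_low": "690", "fico_range_high": "694"},
]
for row in preprocess(raw):
    print(row)
```

On the toy rows, `preprocess` drops the row with a missing income and the 2,400,000 income outlier. Four rows remain, each with the calculated fields attached.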
Models:
Loan Status
• Logistic Regression
• Neural Network
• Random Forest
Loan Approval
• Logistic Regression
• Neural Network
• Random Forest
Interest Rate
• Linear Regression
• Neural Network
• Boosted Decision Tree
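The slides list the models but not how they were fit. For the simplest classifier, logistic regression, a minimal sketch trained by batch gradient descent in plain Python follows. It is an illustration, not the team's implementation. The single standardized feature and the labels (1 = default) are toy values. A real project would more likely call a library routine such as R's `glm(..., family = binomial)`.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, X, threshold=0.5):
    """Return 1 (predicted default) when P(default) >= threshold, else 0."""
    return [
        int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
        for xi in X
    ]


# Toy data: one hypothetical standardized risk feature; 1 = default.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(w, b, predict(w, b, X))
```

The `threshold` argument matters for the loan-status use case. Lowering it flags more loans as likely defaults, which raises the true positive rate at the cost of more false alarms.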
Ex: Loan Status Model
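Model Evaluation for Loan Status
• We compared overall accuracy, recall, precision, the ROC curve and the confusion matrix.
• If this model is to help lenders avoid bad loans, its true positive rate (the share of bad loans it actually flags) must be high. Overall accuracy alone can look good while many defaults are still missed.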
Metric      Neural Network   Logistic Regression   Random Forest
Accuracy    0.914629         0.910                 0.9006
Precision   0.914629         0.935                 0.9006
Recall      0.914629         0.957                 0.9006
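In the table, the neural network and random forest report identical accuracy, precision and recall. The slides do not say how precision and recall were averaged. One possible explanation is micro-averaging over both classes, which for single-label classification always equals accuracy.

The sketch below computes the metrics from a confusion matrix both ways. Labels are assumed to be 1 = defaulted (bad) loan and 0 = fully paid, and the predictions are toy values.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision/recall for the positive class only."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
    }


def micro_metrics(y_true, y_pred, classes=(0, 1)):
    """Micro-average: pool TP/FP/FN over every class before dividing."""
    tp = fp = fn = 0
    for c in classes:
        c_tp, c_fp, c_fn, _ = confusion_counts(y_true, y_pred, positive=c)
        tp, fp, fn = tp + c_tp, fp + c_fp, fn + c_fn
    return {"precision": tp / (tp + fp), "recall": tp / (tp + fn)}


# Toy labels (1 = default) and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 1, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
print(micro_metrics(y_true, y_pred))
```

With these toy predictions, the binary precision (0.4) and recall (about 0.67) differ. Both micro-averaged values, however, equal the accuracy (0.6). Binary recall on the default class is the true positive rate that matters for helping lenders avoid bad loans.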
Model Evaluation for Interest Rate
Metric                              Neural Network   Linear Regression   Boosted Decision Tree
RMSE                                1.50             1.79                1.20
Coefficient of determination (R²)   0.83             0.76                0.89
• We compared overall RMSE and coefficient of determination (R²) across the three models.
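RMSE and the coefficient of determination can be computed directly from predicted and actual rates. A minimal sketch with toy values follows; it is not the team's code. RMSE is in the target's own units. If the interest rate is expressed in percentage points, the boosted tree's RMSE of 1.20 means its predictions are off by roughly 1.2 points in a root-mean-square sense.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy interest rates (percent), for illustration only.
actual = [8.0, 10.0, 12.0, 14.0]
predicted = [9.0, 10.0, 11.0, 14.0]
print(rmse(actual, predicted), r_squared(actual, predicted))
```

A higher R² and a lower RMSE both point the same way here. That is why the boosted decision tree (R² 0.89, RMSE 1.20) ranks first on both metrics in the table.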
Model Evaluation for Loan Approval
• We compared overall accuracy, recall, precision, the ROC curve and the confusion matrix.
Metric      Neural Network   Logistic Regression   Random Forest
Accuracy    0.8194           0.8410                0.80
Precision   0.8194           0.8410                0.780
Recall      0.8194           0.8410                0.822
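Both evaluation slides cite the ROC curve, but the tables report only accuracy, precision and recall. The sketch below shows how an ROC curve and its area (AUC) can be computed from a model's predicted probabilities by sweeping the decision threshold. The labels (1 = approved at 10% or below, assumed) and scores are toy values. This is an illustration, not the team's evaluation code.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the threshold over every distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy labels and predicted probabilities, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, probs)
print(curve)
print(auc_trapezoid(curve))
```

The AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative. Here 8 of the 9 positive/negative pairs are ordered correctly, giving 8/9. Unlike accuracy, it does not depend on a single 0.5 cut-off.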
Approach for Deployment
Tableau
Sentiment Analysis
• We collected tweets about Lending Club from Twitter.
• We applied our research project to detect the sentiment of each tweet.
• We used Tableau to visualize the results.
• We incorporated the visualizations into the front end.
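The slides do not describe how the research project detects sentiment. Purely as a stand-in, a simple lexicon-based scorer is sketched below. This is a different and much cruder technique than a trained classifier. The word lists are hypothetical and the tweets are invented examples. The per-label counts are the kind of summary that could be exported to Tableau for visualization.

```python
import re

# Hypothetical mini-lexicon; a real system would use a trained model or a
# published sentiment lexicon instead of these hand-picked words.
POSITIVE = {"great", "easy", "fast", "approved", "love", "good"}
NEGATIVE = {"denied", "slow", "bad", "scam", "hate", "fees"}


def tweet_sentiment(text):
    """Label a tweet 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_counts(tweets):
    """Aggregate labels into counts, e.g. as input for a bar chart."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for t in tweets:
        counts[tweet_sentiment(t)] += 1
    return counts


# Invented example tweets, for illustration only.
tweets = [
    "Got approved on Lending Club, so easy!",
    "Loan denied again, terrible and slow",
    "Checking my Lending Club account",
]
print(sentiment_counts(tweets))
```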
Demo
Team Assessment
Contribution: Pooja Goyal, Shashwat Mehra, Varsha Holennavar
Thank You