
Credit risk predictive analytics

Date post: 09-Aug-2015
Upload: datasciencesociety
Page 1: Credit risk predictive analytics

Challenges for Credit Risk Predictive Analytics in Bulgaria

Vladimir Labov, FRM

Page 2: Credit risk predictive analytics

Agenda
• The idea behind
• Solutions to practical problems in credit risk analytics:
  - Outliers
  - Missing values
  - Logical coefficient signs
  - Binning (grouping)
  - Categorical variables
  - Multicollinearity
  - Applicants with high income and high indebtedness
  - Unofficial income
  - Current place of residence verification

11.02.2015 Vladimir Labov, Data Science Society 2

Page 3: Credit risk predictive analytics

The Idea Behind
• Predictive analytics for credit risk tries to answer the question: is a borrower going to give the money back?
• Why is it important? It makes sure most of the money in a bank goes to the right people.
• Since this problem boils down to predicting whether the customer is good or bad, statistical classification algorithms are the natural solution.
• In practice, credit risk is quantified by estimating the expected loss from each borrower.
• This is why methods that produce a probability of default between 0 and 1, such as logistic regression, are preferred.
• There are two types of predictive models for credit risk: application and behavioural scorecards. Application scorecards are more important, because the actual lending decision depends on them.
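
The step from expected loss to probability of default can be written out. The decomposition below is the standard textbook one and is an assumption here: the slide names only the expected loss and the probability of default (PD), while LGD (loss given default) and EAD (exposure at default) do not appear in the original.

```latex
\mathrm{EL} = \mathrm{PD} \times \mathrm{LGD} \times \mathrm{EAD}
```

Under this reading, a model that outputs a probability rather than a bare good/bad label feeds directly into the loss estimate.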

Page 4: Credit risk predictive analytics

The Problem of Outliers
• Problem – outliers distort estimates of regression coefficients.
• Classical solution – trim them, or use robust regression or quantile regression.
• Elegant solution – transform the variables to weight of evidence (WoE) or default rates.
• Weight of evidence calculation:

$$\mathrm{WoE} = \ln\left(\frac{\text{goods in group} / \text{all goods}}{\text{bads in group} / \text{all bads}}\right) \times 100$$

• Interpretation:
  - positive values: share of goods > share of bads
  - negative values: share of goods < share of bads
  - zero: share of goods = share of bads
• Advantage: no distortion of estimates; extreme values in both the estimation sample and the holdout sample fall into the marginal WoE groups.
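
The weight of evidence calculation on this slide can be sketched in a few lines of Python. The group names and the counts of goods and bads are hypothetical, chosen only to illustrate the sign convention; `weight_of_evidence` is an illustrative helper, not a function from any library.

```python
import math


def weight_of_evidence(goods_in_group, bads_in_group, all_goods, all_bads):
    """WoE of one group, scaled by 100 as on the slide."""
    share_goods = goods_in_group / all_goods
    share_bads = bads_in_group / all_bads
    return math.log(share_goods / share_bads) * 100


# Hypothetical (goods, bads) counts per income group.
groups = {"low": (200, 60), "medium": (500, 30), "high": (300, 10)}
all_goods = sum(goods for goods, _ in groups.values())
all_bads = sum(bads for _, bads in groups.values())

woe = {
    name: weight_of_evidence(goods, bads, all_goods, all_bads)
    for name, (goods, bads) in groups.items()
}

for name, value in woe.items():
    print(f"{name}: {value:.2f}")
```

An applicant with an extreme income simply lands in the "high" group and receives that group's WoE, which is why the transformation is insensitive to outliers.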

Page 5: Credit risk predictive analytics

The Problem of Missing Values
• Problem – missing values for a variable make the whole observation useless.
• Classical solution – trim them, use the mean value or multiple imputation.
• Elegant solutions:
  - missing age or gender can be inferred from the ID number (ЕГН in Bulgaria)
  - if missing values are few: assign them to the group with the closest default rate, or to the most logical group (the lowest income group, the lowest years of employment history group, etc.)
  - transform the variables to weight of evidence or default rates
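
Assigning missing values to the group with the closest default rate could look like the sketch below. The default rates are hypothetical and `closest_group` is an illustrative helper name.

```python
def closest_group(missing_rate, group_rates):
    """Return the name of the group whose default rate is nearest to missing_rate."""
    return min(group_rates, key=lambda name: abs(group_rates[name] - missing_rate))


# Hypothetical default rates per income group, and for applicants with missing income.
group_rates = {"low": 0.12, "medium": 0.06, "high": 0.03}
missing_default_rate = 0.10

target = closest_group(missing_default_rate, group_rates)
print(target)
```

Here the observations with missing income would be merged into the "low" group, whose default rate is nearest.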

Page 6: Credit risk predictive analytics

The Problem of Logical Coefficient Signs

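• Problem – certain variables may have a significant p-value, but with a coefficient sign that defies economic logic.
• Solutions:
  - Estimate univariate regressions on each variable to get a feel for what the logical sign should be.
  - The no-brainer: just use weight of evidence. All variables should then have a negative coefficient sign. An example of why:

$$Z = a + \sum_{i=1}^{n} b_i x_i = a + \sum_{i=1}^{n} \mathrm{coeff}_i \cdot \mathrm{WoE}_i, \qquad pd = \frac{1}{1 + \exp(-Z)}$$

Example with a = 0 and a single variable:

coeff   WoE value    Z     pd
 -1        -1        1    0.73
 -1        -2        2    0.88
 +1        -1       -1    0.27
 +1        -2       -2    0.12

With a negative coefficient, the group with the lower WoE (relatively more bads) gets the higher pd, as it should. With a positive coefficient the worse group would get the lower pd.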
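
The pd column of the example can be recomputed with a short sketch, assuming a zero intercept and a single WoE variable so that Z = coeff × WoE. Values are rounded to two decimals.

```python
import math


def probability_of_default(z):
    """Logistic transform: pd = 1 / (1 + exp(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))


# (coeff, WoE value) pairs from the example, with intercept a = 0.
rows = [(-1, -1), (-1, -2), (+1, -1), (+1, -2)]

table = []
for coeff, woe_value in rows:
    z = coeff * woe_value
    table.append((coeff, woe_value, z, round(probability_of_default(z), 2)))

for coeff, woe_value, z, pd_value in table:
    print(f"{coeff:+d} {woe_value:+d} {z:+d} {pd_value:.2f}")
```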

Page 7: Credit risk predictive analytics

The Problem of Binning
• Problem – how to determine the optimal groups for WoE transformation of numerical variables?
• Solution:
  - split every variable into deciles
  - observe whether the average default rate for the deciles changes in a logical fashion (e.g. monotonically, if this is the expected relationship)
  - combine groups in which the average default rate is close enough
  - adjust the cut-off points for the groups whose default rate is out of line with the adjacent groups
  - if in doubt, use the Information Value (IV) criterion to compare two binnings: the binning with the higher Information Value differentiates better between the distribution of goods and the distribution of bads, the ultimate goal in scorecard development
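
Comparing two binnings by Information Value can be sketched as follows. The slide does not spell out the IV formula; the sketch assumes the usual definition, the sum over groups of (share of goods − share of bads) × ln(share of goods / share of bads), with natural-log WoE and no ×100 scaling. The counts are hypothetical.

```python
import math


def information_value(bins):
    """IV for a binning given as a list of (goods, bads) counts per group."""
    all_goods = sum(goods for goods, _ in bins)
    all_bads = sum(bads for _, bads in bins)
    iv = 0.0
    for goods, bads in bins:
        share_goods = goods / all_goods
        share_bads = bads / all_bads
        iv += (share_goods - share_bads) * math.log(share_goods / share_bads)
    return iv


# Hypothetical binnings of the same variable: three groups vs. the first two merged.
three_groups = [(200, 60), (500, 30), (300, 10)]
two_groups = [(700, 90), (300, 10)]

iv_three = information_value(three_groups)
iv_two = information_value(two_groups)
print(f"three groups: {iv_three:.4f}, two groups: {iv_two:.4f}")
```

Merging groups never increases IV under this definition, so the criterion is most informative when the binnings being compared have a similar number of groups.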

Page 8: Credit risk predictive analytics

The Problem of Categorical Variables
• Problem – how to represent categorical variables?
• Classical solution: use dummy variables, but:
  - often some categories turn out insignificant
  - it is difficult to interpret the overall significance of a variable split into 5 dummy regressors
• Elegant solution:
  - again, assign a WoE value to each category

Page 9: Credit risk predictive analytics

The Problem of Multicollinearity
• Problem – correlated variables distort the coefficient signs or make individually significant variables insignificant in a multivariable context.
• Classical solution: drop the variable with the wrong sign.
• Elegant solution:
  - combine the correlated variables into a new variable
  - example: income & source of income
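
The slide does not say how the two variables are combined. One possible reading, shown here purely as an assumption, is to cross the income band with the source of income into a single categorical variable, which can then receive one WoE value per combined category. The field names and categories are hypothetical.

```python
def combine(income_band, income_source):
    """Cross two correlated categorical variables into one combined category."""
    return f"{income_band}|{income_source}"


# Hypothetical applicants with a binned income and a source of income.
applicants = [
    {"income_band": "high", "income_source": "salary"},
    {"income_band": "high", "income_source": "self-employed"},
    {"income_band": "low", "income_source": "pension"},
]

combined = [combine(a["income_band"], a["income_source"]) for a in applicants]
print(combined)
```

The model then carries one regressor instead of two correlated ones.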

Page 10: Credit risk predictive analytics

Applicants with High Income and High Indebtedness

• Problem – high income is an indicator of lower risk, but at the same time individuals with high income may face difficulties if their debt level is high as well

• Solution: take the income net of debt payments (disposable income)
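
A small sketch of the disposable income idea, with hypothetical monthly amounts, shows how a high earner with heavy debt can end up with less left over than a modest earner with little debt.

```python
def disposable_income(monthly_income, monthly_debt_payments):
    """Income net of debt payments."""
    return monthly_income - monthly_debt_payments


# Hypothetical applicants (monthly amounts in the same currency).
high_earner = disposable_income(3000, 2500)
modest_earner = disposable_income(1200, 100)
print(high_earner, modest_earner)
```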

Page 11: Credit risk predictive analytics

The Problem of Unofficial Income
• Problem – some applicants are paid a salary higher than the officially declared one, and the difference cannot be verified by an NSSI (National Social Security Institute) check.
• Solution:
  - request a declaration from the employer for the real salary
  - take max(income verified by the NSSI check; income verified by the declaration)
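
The max rule can be sketched as below. The function name and the amounts are hypothetical, and the sketch assumes both figures are available as numbers.

```python
def income_for_scoring(nssi_income, declared_income):
    """Use the larger of the two verified income figures."""
    return max(nssi_income, declared_income)


# Hypothetical applicant: official salary per NSSI vs. employer declaration.
scoring_income = income_for_scoring(nssi_income=800, declared_income=1500)
print(scoring_income)
```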

Page 12: Credit risk predictive analytics

The Problem of Current Residence
• Problem – a lot of people in Bulgaria do not update their current and/or permanent address, while the place of residence is a somewhat important demographic factor.
• Solution:
  - take the branch where the application was submitted as the current place of residence

Page 13: Credit risk predictive analytics

An Astrological Detour
• Some people believe you can tell a lot about a person's character from their astrological sign. So can you predict from it whether they are reliable borrowers?
• A regression of default on the astrological sign in our consumer loan database had a Gini coefficient of 10.25% (AUROC of 0.55), lower than most of the variables that made it into the final model. So sorry, ladies, but astrology can't tell you everything about a person.
• For you astrology aficionados who still want to believe: people born under Leo are the most risky, while people born under Capricorn are the most reliable payers.
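
The two figures quoted for the astrological sign are consistent under the usual relation between the Gini coefficient and the AUROC. The slide does not state this relation, so it is supplied here as an assumption.

```latex
\mathrm{Gini} = 2 \cdot \mathrm{AUROC} - 1
\quad\Longrightarrow\quad
\mathrm{AUROC} = \frac{1 + 0.1025}{2} = 0.55125 \approx 0.55
```

An AUROC of 0.5 corresponds to random guessing, so 0.55 is only marginally better than chance.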

Page 14: Credit risk predictive analytics

THANK YOU! QUESTIONS TIME!
