  • Credit Risk Modeling

    Arnar Ingi Einarsson

    Kongens Lyngby 2008

    IMM-PHD-2008-100

  • Technical University of Denmark

    Informatics and Mathematical Modelling

    Building 321, DK-2800 Kongens Lyngby, Denmark

    Phone +45 45253351, Fax +45 45882673

    [email protected]

    www.imm.dtu.dk

    IMM-PHD: ISSN 0909-3192

  • Summary

    The credit assessment made by corporate banks has been evolving in recent years: from being the subjective judgment of the banks' credit experts, credit assessments have become more mathematically involved. Banks are increasingly opening their eyes to the pressing need for comprehensive modeling of credit risk, and the financial crisis of 2008 is certain to further the need for good modeling procedures. In this thesis a modeling framework for credit assessment models is constructed. Different modeling procedures are tried, leading to the conclusion that logistic regression is the most suitable framework for credit rating models. Analyzing the performance of different link functions for the logistic regression leads to the conclusion that the complementary log-log link is most suitable for modeling the default event.
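A minimal sketch of the two response functions compared here, written in plain NumPy for illustration (the thesis itself uses R; nothing below is code from the thesis):

```python
import numpy as np

# Inverse link functions map a linear predictor eta to a probability p.
def logit_inverse(eta):
    """Symmetric logistic response, p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + np.exp(-eta))

def cloglog_inverse(eta):
    """Complementary log-log response, p = 1 - exp(-exp(eta))."""
    return 1.0 - np.exp(-np.exp(eta))

eta = np.linspace(-3.0, 3.0, 7)
p_logit = logit_inverse(eta)
p_cloglog = cloglog_inverse(eta)
# Unlike the logit, the cloglog response is asymmetric around eta = 0:
# cloglog_inverse(0) = 1 - exp(-1), roughly 0.632, while logit_inverse(0) = 0.5.
```

This asymmetry is one reason the complementary log-log link is often considered for rare binary events such as default.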

    Validation of credit rating models lacks a single numeric measure that summarizes model performance. A solution to this problem is suggested: using principal component representatives of a few discriminatory power indicators. With a single measure of model performance, model development becomes a much more efficient process; the same goes for variable selection. The data used in the modeling process are not as extensive as would be the case for many banks. A resampling process is introduced that is useful for obtaining stable estimates of model performance from a relatively small dataset.
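The principal component idea can be sketched as follows. The indicator matrix below is invented for illustration (real values would come from discriminatory power measures such as those in Chapter 6), and the code is an illustrative Python sketch, not the thesis's R code:

```python
import numpy as np

# Hypothetical matrix: rows are candidate models, columns are
# discriminatory power indicators (e.g. AUC, accuracy ratio, KS statistic).
scores = np.array([
    [0.78, 0.56, 0.41],
    [0.82, 0.64, 0.47],
    [0.75, 0.50, 0.38],
    [0.80, 0.60, 0.45],
])

# Standardize each indicator, then project onto the first principal
# component to obtain a single composite performance measure per model.
z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
cov = np.cov(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
pc1 = eigvecs[:, np.argmax(eigvals)]          # loadings of first component
composite = z @ pc1                           # one number per model
# Sign-align the component so that higher composite means better.
best = int(np.argmax(composite * np.sign(pc1.sum())))
```

Because the indicators are strongly positively correlated, the first component carries nearly all of their joint variation, so ranking models by `composite` is a defensible single-number summary.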


  • Preface

    This thesis was prepared at Informatics and Mathematical Modelling, the Technical University of Denmark, in partial fulfillment of the requirements for acquiring the Master of Science in Engineering.

    The project was carried out in the period from October 1st 2007 to October 1st 2008.

    The subject of the thesis is the statistical aspect of credit risk modeling.

    Lyngby, October 2008

    Arnar Ingi Einarsson


  • Acknowledgements

    I thank my supervisors, Professor Henrik Madsen and Jesper Colliander Kristensen, for their guidance throughout this project.

    I would also like to thank my family: my girlfriend Hrund for her moral support, my older son Halli for his patience, and my new-born son Almar for his inspiration and for allowing me some sleep.


  • Contents

    Summary

    Preface

    Acknowledgements

    1 Introduction
    1.1 Background
    1.2 Aim of Thesis
    1.3 Outline of Thesis

    2 Credit Modeling Framework
    2.1 Definition of Credit Concepts
    2.2 Subprime Mortgage Crisis
    2.3 Development Process of Credit Rating Models

    3 Commonly Used Credit Assessment Models
    3.1 Heuristic Models
    3.2 Statistical Models
    3.3 Causal Models
    3.4 Hybrid Form Models
    3.5 Performance of Credit Risk Models

    4 Data Resources
    4.1 Data dimensions
    4.2 Quantitative key figures
    4.3 Qualitative figures
    4.4 Customer factors
    4.5 Other factors and figures
    4.6 Exploratory data analysis

    5 The Modeling Toolbox
    5.1 General Linear Models
    5.2 Generalized Linear Models
    5.3 Discriminant Analysis
    5.4 k-Nearest Neighbors
    5.5 CART, a tree-based Method
    5.6 Principal Component Analysis

    6 Validation Methods
    6.1 Discriminatory Power
    6.2 Relative frequencies and Cumulative frequencies
    6.3 ROC curves
    6.4 Measures of Discriminatory Power
    6.5 Discussion

    7 Modeling Results
    7.1 General Results
    7.2 Principal Component Analysis
    7.3 Resampling Iterations
    7.4 Performance of Individual Variables
    7.5 Performance of Multivariate Models
    7.6 Addition of Variables
    7.7 Discriminant Analysis
    7.8 Link functions

    8 Conclusion
    8.1 Summary of Results
    8.2 Further work

    A Credit Pricing Modeling
    A.1 Modeling of Loss Distribution

    B Additional Modeling Results
    B.1 Detailed Performance of Multivariate Models
    B.2 Additional Principal Component Analysis
    B.3 Unsuccessful Modeling

    C Programming
    C.1 The R Language
    C.2 R code

  • Chapter 1

    Introduction

    1.1 Background

    Banking is built on the idea of profiting by loaning money to those who are in need of it. Banks then collect interest on the payments which the borrower makes in order to pay back the money borrowed. The likely event that some borrowers will default on their loans, that is fail to make their payments, results in a financial loss for the bank.

    In the application process for new loans, banks assess the potential borrower's creditworthiness. As a measure of creditworthiness, an assessment is made of the probability of default for the potential borrower. The risk that the credit assessment of the borrower is too modest is called credit risk. Credit risk modeling is quite an active research field. Before the milestone of Altman [2], credit risk on corporate loans was based on the subjective analysis of credit experts at financial institutions.

    Probability of default is a key figure in the daily operation of any credit institution, as it is used as a measure of credit risk in both internal and external reporting.

    The credit risk assessments made by banks are commonly referred to as credit rating models. In this thesis various statistical methods are used as modeling procedures for credit rating models.

    1.2 Aim of Thesis

    This thesis is done in co-operation with a corporate bank, which supplied the necessary data resources. The aim of the thesis is to see whether logistic regression can outperform the current heuristic credit rating model used in the co-operating corporate bank. The current model is called Rating Model Corporate (RMC) and is described in more detail in Section 4.5.1. This was the only clear aim in the beginning, but further goals were acquired as the thesis proceeded.

    First, some variables that were not used in RMC but were still available are tested. Then an attempt was made to model credit default with different mathematical procedures, and an effort was made to combine some of those methods with logistic regression. Since discriminant analysis has seen extensive use in credit modeling, the performance of discriminant analysis was documented for comparison.

    Validation of credit ratings is hard compared to regular modeling, since there is no true or observed rating that can be compared with the predicted credit rating to measure the prediction error. Some validation methods are available, but no single measure can be used to make a clear-cut decision on whether one model is better than another. It is thus necessary to consider numerous measures simultaneously to draw a conclusion on model performance. This has a clear disadvantage, as it might be debatable whether one model is better than another. To address this problem, an attempt was made to combine the available measures into a single measure.

    As missing values frequently appear in many of the modeling variables, some thoughts are given on how that particular problem could be solved. The problem of a relatively small data sample is also dealt with.

    The general purpose of this thesis is to inform the reader on how it is possible to construct credit rating models. Special emphasis is placed on the practical methods that a bank in the corporate banking sector could make use of in the development process of a new credit rating model.


    1.3 Outline of Thesis

    Credit risk modeling is a wide field. In this thesis an attempt is made to shed light on its many different subjects. Chapters 2 and 6 provide the fundamental understanding of credit risk modeling.

    The structure of the thesis is as follows.

    Chapter 2: Credit Modeling Framework. Introduces the basic concepts of credit risk modeling. Furthermore, a discussion of the ongoing financial crisis is given. Finally, a detailed description of the modeling process is given.

    Chapter 3: Commonly Used Credit Assessment Models. Gives a brief introduction to the different types and performance of commonly used credit assessment models.

    Chapter 4: Data Resources. Gives a quite detailed description of the data used in the analysis. The data were supplied by a co-operating corporate bank.

    Chapter 5: The Modeling Toolbox. Gives a full discussion of the mathematical procedures that were used in the model development.

    Chapter 6: Validation Methods. Introduces the large selection of validation methods, as validation is a fundamental part of credit risk modeling.

    Chapter 7: Modeling Results. The main findings are presented. The performance of the different mathematical procedures is listed. Furthermore, the performance of variables is discussed.

    Chapter 8: Conclusion. Concludes the thesis and includes a section about further work.

    Appendix A: Credit Pricing Models. Introduces a practical method to estimate the loss distribution. The estimated loss distribution can be used to extend a credit rating model to a credit pricing model.

    Appendix B: Additional Modeling Results. Modeling results that were considered less important are presented.

    Appendix C: Programming. Includes an introduction to R, the programming language used.


  • Chapter 2

    Credit Modeling Framework

    In order to get a better feel for the credit modeling framework, there are some important concepts and measures worth considering. It is also worth considering the need for credit modeling and the important role of international legislation on banking supervision, called Basel II.

    In Section 2.1 the most important concepts of the credit modeling framework are defined. The definitions are partly adapted from the detailed discussions in Ong [26] and Alexander and Sheedy [1]. Section 2.2 discusses the ongoing financial crisis, which is partly due to poor credit ratings, and finally the model development process is introduced in Section 2.3.

    2.1 Definition of Credit Concepts

    The major activity of most banks¹ is to raise principal by loaning money to those who are in need of it. They then collect interest on the payments made by the borrower in order to pay back the principal borrowed. As some borrowers fail to make their payments, they are said to have defaulted on their promise of repayment. A more formal definition of default is obtained from the Basel II legislation [6]: a firm² is defined as a defaulted firm if either or both of the following scenarios have taken place.

    ¹By the term bank we also refer to any financial institution giving credit.

    I - The credit institution considers that the obligor is unlikely to pay its credit obligations to the credit institution in full, without recourse by the credit institution to actions such as realizing security (if held).

    II - The obligor is past due more than 90 days on any material credit obligation to the banking group. Overdrafts will be considered as being past due once the customer has breached an advised limit or been advised of a limit smaller than current outstandings.

    Considering the first of the two rather formal definitions, it states that the borrower has defaulted if the bank believes it will not receive its debt in full without demanding ownership of the collateral³ taken. The second scenario is simpler: if the borrower has not made a promised payment which was due 90 days ago, the borrower is considered to have defaulted on its payment. The sentence regarding overdrafts⁴ can be interpreted to mean that the borrower has made a transaction breaking the advised limit, or is struggling to stay within a lowered limit, making the bank fear that it will not receive its payment.

    It is important to note the difference between three terms: insolvency, bankruptcy and default. The three terms are frequently used interchangeably in the literature; to avoid confusion they are explained here. The term insolvency refers to a borrower that is unable to pay its debt, whereas a borrower that has defaulted on its debt is either unwilling or unable to pay. To complicate matters further, insolvency is often taken to mean the situation where liabilities exceed assets, but such firms might still be profitable and thus able to pay all their debts. Bankruptcy is a legal finding that results in court supervision over the financial affairs of a borrower that is either insolvent or in default. It is important to note that a borrower that has defaulted can come back from default by settling the debt; that might be done by adding collateral or by obtaining alternative funding. Furthermore, as will be seen later when considering loss given default, the event of a default does not necessarily result in a financial loss for the bank.

    ²A firm is any business entity such as a corporation, partnership or sole trader.

    ³Collateral is an asset of the borrower that becomes the lender's if the borrower defaults on the loan.

    ⁴An overdraft is a type of loan meant to cover a firm's short-term cash needs. It generally has an upper bound, and interest is paid on the outstanding balance of the overdraft loan.

    When potential borrowers apply for a loan at a bank, the bank evaluates the creditworthiness of the potential borrower. This assessment concerns whether the borrower can pay the principal and interest when due. The risk that arises from the uncertainty of the credit assessment, especially that it is too modest, is called credit risk. According to the Basel Handbook [26], credit risk is the major risk to which banks are exposed, since making loans is the primary activity of most banks. A formal definition of credit risk is given by Zenios [35] as

    The risk of an unkept payment promise due to default of an obligor (counter-party, issuer or borrower) or due to adverse price movements of an asset caused by an upgrading or downgrading of the credit quality of an obligor that brings into question their ability to make future payments.

    The creditworthiness may decline over time due to bad management or external factors such as rising inflation⁵, weaker exchange rates⁶, increased competition or volatility in asset value.

    The credit risk can be generalized with the following equation

    Credit Risk = max{Actual Loss − Expected Loss, 0}

    where the actual loss is the observed financial loss. Credit risk is thus the risk that the actual loss is larger than the expected loss. The expected loss is an estimate, and credit risk can be considered the risk that the actual loss is considerably larger than the expected loss. The expected loss can be divided into further components as follows

    Expected Loss = Probability of Default × Exposure at Default × Loss Given Default

    An explanation of each of these components is adapted from Ong [26].

    Probability of Default (PD) is the expected probability that a borrower will default on the debt before its maturity⁷. PD is generally estimated by reviewing the historical default record of other loans with similar characteristics, and it is generally defined as the default probability of a borrower over a one-year period. As PDs are generally small numbers, they are usually transformed to a risk grade or risk rating to make them more readable.
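As a toy illustration of the paragraph above, a PD can be estimated as a historical default frequency among similar loans and then mapped to a coarser grade. The counts and grade boundaries below are invented, not taken from the thesis or from Basel II:

```python
# Estimate PD as the one-year historical default frequency of a peer
# group, then map the small probability to a more readable risk grade.

def estimate_pd(defaults: int, borrowers: int) -> float:
    """Historical one-year default frequency for a peer group."""
    return defaults / borrowers

def to_grade(pd: float) -> str:
    """Map a PD to a rating grade; boundaries are illustrative only."""
    bounds = [(0.0025, "A"), (0.01, "B"), (0.05, "C")]
    for upper, grade in bounds:
        if pd < upper:
            return grade
    return "D"

pd_hat = estimate_pd(defaults=3, borrowers=400)
grade = to_grade(pd_hat)
```

With 3 defaults among 400 comparable borrowers, the point estimate is 0.75%, which the illustrative scale would report as grade "B".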

    ⁵Inflation is an economic term for a general increase in the price level of goods and services.

    ⁶An exchange rate describes the relation between two currencies, specifying how much one currency is worth in terms of the other.

    ⁷Maturity refers to the final payment date of a loan, at which point all remaining interest and principal is due to be paid.

    Exposure at Default (EAD) is the amount that the borrower legally owes the bank. It may not be the entire amount of the funds the bank has granted the borrower. For instance, a borrower with an overdraft, under which outstandings go up and down depending on the borrower's cash-flow needs, could fail at a point when not all of the funds have been drawn down. EAD is simply the exact amount the borrower owes at the time of default and can easily be estimated at any time as the current exposure. The current exposure is the current outstanding debt minus a discounted value of the collateral, where the discounted value is meant to represent the actual value of the collateral.

    Loss Given Default (LGD) is the percentage of EAD that the bank actually loses. Banks like to protect themselves and frequently do so by taking collateral or by holding credit derivatives⁸ as securitization. Borrowers may even have a guarantor who will adopt the debt if the borrower defaults; in that case the LGD takes the value zero. The mirror image of LGD, the recovery rate given default, is frequently used in the literature; together the loss and the recovery account for the full amount owed by the borrower at the time of default, the EAD. Loss given default is simply the expected percentage of loss on the funds provided to the borrower. Altman et al. [4] report empirical evidence that observed default rates and LGDs are positively correlated. From this observation it is possible to conclude that banks are successful in protecting themselves when default rates are moderate, but fail to do so when high default rates are observed.

    Expected Loss (EL) can be seen as the average of historically observed losses. EL can also be estimated using estimates of the three components in equation (2.1).

    EL = PD × EAD × LGD (2.1)

    EL estimates are partly decisive for the bank's capital requirement. The capital requirement, that is the amount of money the bank has to keep available, is determined by financial authorities and is based on common capital ratios⁹. The capital requirement is usually substantially higher than EL, though, as it has to cover all types of risk that the bank is exposed to, such as market, liquidity, systemic and operational risks¹⁰, or simply all risks that might result in a solvency crisis for the bank. Un-expected Loss (UEL) is defined in Alexander and Sheedy [1] with respect to a certain Value at Risk (VaR) quantile and the probability distribution of the portfolio's loss. The VaR quantile can be seen as an estimate of the maximum loss. It is defined mathematically as Pr[Loss ≤ VaRα] = α, where α is generally chosen as a high quantile, 99%-99.9%. For a given VaR quantile the UEL can be defined as

    UEL = VaR − EL

    The name un-expected loss is somewhat confusing, as the value rather states how much incremental loss could be expected in a worst-case scenario. Further discussion on how to obtain estimates of EL, VaR and UEL can be found in Appendix A.

    ⁸Credit derivatives are bilateral contracts between a buyer and seller, under which the seller sells protection against the credit risk of an underlying bond, loan or other financial asset.

    ⁹Tier I, Tier II, leverage ratio, common stockholders' equity.

    ¹⁰Market risk is the risk of unexpected changes in prices or interest or exchange rates. Liquidity risk is the risk that the costs of adjusting financial positions will increase substantially or that a firm will lose access to financing. Systemic risk is the risk of a breakdown in market-wide liquidity or chain-reaction default. Operational risk is the risk of fraud, systems failures, trading errors, and many other internal organizational risks.
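The relationships EL = PD × EAD × LGD and UEL = VaR − EL can be put together in a short numerical sketch. All portfolio figures, and the assumptions of a homogeneous portfolio with independent defaults, are invented for the example and are not drawn from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# One loan: expected loss per equation (2.1), EL = PD * EAD * LGD.
pd_, ead, lgd = 0.02, 1_000_000.0, 0.40
el = pd_ * ead * lgd                      # expected loss of a single loan

# Homogeneous portfolio of 1,000 such loans: simulate the number of
# defaults per scenario (independent defaults), then read off the
# 99.9% VaR quantile of the loss distribution and UEL = VaR - EL.
n_loans, n_sims = 1000, 100_000
n_defaults = rng.binomial(n_loans, pd_, size=n_sims)
losses = n_defaults * ead * lgd

el_portfolio = n_loans * el
var_999 = np.quantile(losses, 0.999)
uel = var_999 - el_portfolio              # incremental worst-case loss
```

Even in this deliberately simple setting, the 99.9% VaR sits well above the expected loss, which is exactly why the capital requirement exceeds EL.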

    One of the primary objectives of this thesis is to consider how to obtain the best possible estimate of the probability of default of specific borrowers. It is therefore worth considering the purpose of acquiring the best possible PD estimates. The PDs are reported as a measure of risk both to the bank's executive board and to financial supervisory authorities. The duty of a financial supervisory authority is to monitor a bank's financial undertakings and to ensure that banks have reliable banking procedures. Financial supervisory authorities determine banks' capital requirements. As banks like to minimize their capital requirements, it is of great value to show that credit risk is successfully modeled.

    Expected loss and capital requirements, along with the PDs, are the main factors in deciding the interest rate for each borrower. As most borrowers will look for the best offer on the market, it is vital to have a good rating model. In a competitive market, banks will lend at increasingly lower interest rates. Thus some of them might default, and as banks lend to other banks, that might cause a chain reaction.

    Banking legislation

    If a chain of banks or a major bank were to default, it would have catastrophic consequences for any economic system. As banks lend to each other, the operations of banks are highly integrated. Strong commercial banks are the driving force in the economic growth of any country, as they make funds available for investors. Realizing this, the central bank governors of the G10 nations¹¹ founded the Basel Committee on Banking Supervision in 1974. The aim of this committee is, according to their website [8],

    The Basel Committee on Banking Supervision provides a forum for regular cooperation on banking supervisory matters. Its objective is to enhance understanding of key supervisory issues and improve the quality of banking supervision worldwide. It seeks to do so by exchanging information on national supervisory issues, approaches and techniques, with a view to promoting common understanding. At times, the Committee uses this common understanding to develop guidelines and supervisory standards in areas where they are considered desirable. In this regard, the Committee is best known for its international standards on capital adequacy; the Core Principles for Effective Banking Supervision; and the Concordat on cross-border banking supervision.

    ¹¹The twelve member states of the G10 are: Belgium, Netherlands, Canada, Sweden, France, Switzerland, Germany, United Kingdom, Italy, United States, Japan and Luxembourg.

    The Basel Committee published an accord called Basel II in 2004, which is meant to create international standards that banking regulators can use when creating regulations about how much capital banks need to hold in order to guard against credit and operational risks and remain solvent.

    More specifically, the aim of the Basel II regulations is, according to Ong [26], to quantify and separate operational risk from credit risk and to ensure that capital allocation is more risk sensitive. In other words, Basel II sets guidelines for how banks' in-house estimation of the loss parameters (probability of default (PD), loss given default (LGD) and exposure at default (EAD)) should be carried out. As banks need regulators' approval, these guidelines ensure that banks hold sufficient capital to cover the risk that each bank exposes itself to through its lending and investment practices. These international standards should protect the international financial system from problems that might arise should a major bank or a series of banks collapse.

    Credit Modeling

    The Basel II accord introduces good practices for internal rating systems as an alternative to using ratings obtained from credit rating agencies. Credit rating agencies rate firms, countries and financial instruments based on their credit risk; the largest and most cited agencies are Moody's, Standard & Poor's and Fitch Ratings. Internal rating systems have the advantage over the rating agencies that there is additional information available inside the bank, such as credit history and credit experts' valuations. Internal ratings can be obtained for all borrowers, whereas agency ratings might be missing for some potential borrowers. Furthermore, rating agencies only publicly report the risk grades of larger firms, whereas there is a price to view their ratings for small and medium sized firms.

    There are two different types of credit models that should not be confused: credit rating models and credit pricing models. There is a fundamental difference between the two, as credit rating models are used to model PDs, while credit pricing models consider combinations of PDs, EADs and LGDs to model the EL. A graphical representation of the two models can be seen in Figure 2.1.

    Figure 2.1: Systematic overview of Credit Assessment Models.

    In this thesis credit rating models are the main concern, as they are of more practical use and can also be used to obtain estimates of EL; by estimating the EL, the same result as for credit pricing models is obtained. Reconsider the relationship between the risk components in equation (2.1).

    The PDs are obtained from the credit rating model, and the EAD is easily estimated as the current exposure. An estimate of LGD can be found by collecting historical LGD data; an example of an LGD distribution can be seen in Figure 2.2. The average, which lies around 40%, does not represent the distribution well. A more sophisticated procedure would be to model the event of loss or no loss with some classification procedure, e.g. logistic regression, and then use the left part of the empirical distribution to model those classified as no loss and the right part for those classified as loss. The averages of each side of the distribution could be used. It would be better still to treat LGD as a stochastic variable, considered independent of PD. It is common practice to assume LGDs independent of PDs, as Altman et al. [4] point out that the commercial credit pricing models¹² use LGD either as a constant or as a stochastic variable independent of PD. When estimates of PDs, EADs and LGDs have been obtained, they can be used to estimate the EL. A practical procedure to estimate the expected loss is introduced in Appendix A.

    [Figure 2.2: Example of an empirical distribution of Loss Given Default (LGD). Histogram of LGD; horizontal axis LGD [%] from 0 to 100, vertical axis relative frequency from 0.00 to 0.05.]

    ¹²These value-at-risk (VaR) models include J.P. Morgan's CreditMetrics®, McKinsey's CreditPortfolioView®, Credit Suisse Financial Products' CreditRisk+®, KMV's PortfolioManager®, and Kamakura's Risk Manager®.
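The two-part procedure suggested above can be sketched as follows. The LGD sample is invented, and a fixed cut-off stands in for the classification step (the text suggests e.g. logistic regression); this is an illustration of the averaging idea only:

```python
import numpy as np

# Hypothetical realized LGDs in percent: a bimodal sample with a
# "no loss" cluster near 0 and a "loss" cluster near 80.
lgd_sample = np.array([0.0, 2.0, 5.0, 3.0, 75.0, 80.0, 90.0, 85.0])

# Step 1: classify each exposure as "loss" or "no loss". A fixed
# cut-off stands in for a fitted classifier such as logistic regression.
is_loss = lgd_sample > 50.0

# Step 2: use the average of each side of the empirical distribution
# as the LGD estimate for the corresponding class.
lgd_no_loss = lgd_sample[~is_loss].mean()   # estimate for "no loss" class
lgd_loss = lgd_sample[is_loss].mean()       # estimate for "loss" class
```

Compared with the single overall mean (about 42% here), the two conditional averages describe the bimodal empirical distribution far better.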


    2.2 Subprime Mortgage Crisis

    It is important to recognize the influence of macro-economics¹³ on observed default frequencies. By comparing the average default rates reported by Altman et al. [4] with reports of recent recessions¹⁴, a clear and simple relationship can be seen. Wikipedia [33] reports a recession in the early 1990s and in the early 2000s, and Altman et al. [4] report default rates higher than 10% in 1990, 1991, 2001 and 2002, whereas commonly observed default rates are between 1% and 2%. The relationship is that high default rates are observed at and after recession times.

    In their 2006 paper, Altman et al. [4] argue that a type of credit bubble was rising, causing seemingly highly distressed firms to remain non-bankrupt when, in more normal periods, many of these firms would have defaulted. Their words can be understood to mean that too much credit has been given to distressed firms, which would result in greater losses when that credit bubble collapsed. With the financial crisis of 2008 that credit bubble is certain to have burst. This might result in high default rates and significant losses for corporate banks in the next year or two; only time will tell.

    The financial crisis of 2008 is directly related to the subprime mortgage crisis, while high oil and commodity prices have increased inflation, which has induced further crisis situations. A brief discussion of the subprime mortgage crisis and its causes, adapted from Maslakovic [22], follows.

    The subprime mortgage crisis is an ongoing worldwide economic problem resulting in liquidity issues in the global banking system. The crisis began with the bursting of the U.S. housing bubble in late 2006, resulting in high default rates on subprime and other adjustable rate mortgages (ARM). The term subprime refers to higher-risk borrowers, that is borrowers with lower income or a lesser credit history than prime borrowers. Subprime lending has been a major contributor to the increases in home ownership in the U.S. in recent years. Easily obtained mortgages, combined with the assumption that a long-term trend of rising housing prices would continue, encouraged subprime borrowers to take mortgage loans. As interest rates went up, and once housing prices started to drop moderately in 2006 and 2007 in many parts of the U.S., defaults and foreclosure activity increased dramatically.

    13Macroeconomics is the field of economics that considers the performance and behavior of a national or regional economy as a whole. Macroeconomists try to model the structure of national income/output, consumption, inflation, interest rates and unemployment rates, amongst others. Macro- refers to large scale whereas micro- refers to small scale.

    14A recession is a contraction phase of the business cycle. A recession is generally defined as negative growth in real gross domestic product (GDP) for two or more consecutive quarters. A sustained recession is referred to as a depression.


    The mortgage lenders were the first to be affected, as borrowers defaulted, but major banks and other financial institutions around the world were hurt as well. The reason was a financial engineering tool called securitization, where rights to the mortgage payments are passed on via mortgage-backed securities (MBS) and collateralized debt obligations (CDO). Corporate, individual and institutional investors holding MBS or CDO faced significant losses, as the value of the underlying mortgage assets declined. The stock prices of those firms reporting great losses caused by their involvement in MBS or CDO fell drastically.

    The widespread dispersion of credit risk through CDOs and MBSs and the unclear effect on financial institutions caused lenders to reduce lending activity or to make loans at higher interest rates. Similarly, the ability of corporations to obtain funds through the issuance of commercial paper was affected. This aspect of the crisis is consistent with a credit crisis term called a credit crunch. The general crisis caused stock markets to decline significantly in many countries. The liquidity concerns drove central banks around the world to take action to provide funds to member banks, to encourage the lending of funds to worthy borrowers and to re-invigorate the commercial paper markets.

    The credit crunch has cooled the world economic system, as fewer and more expensive loans decrease the investments of businesses and consumers. The major contributors to the subprime mortgage crisis were poor lending practices and mispricing of credit risk. Credit rating agencies have been criticized for giving CDOs and MBSs based on subprime mortgage loans much higher ratings than they should have, thus encouraging investors to buy into these securities. Critics claim that conflicts of interest were involved, as rating agencies are paid by the firms that organize and sell the debt to investors, such as investment banks. The market for mortgages had previously been dominated by government sponsored agencies with stricter rating criteria.

    In the financial crisis, which has been especially hard on financial institutions around the world, the words of the prominent Cambridge economist John Maynard Keynes have never been more appropriate, as he observed in 1931 during the Great Depression:

    A sound banker, alas, is not one who foresees danger and avoids it, but one who, when he is ruined, is ruined in a conventional way along with his fellows, so that no one can really blame him.


    2.3 Development Process of Credit Rating Models

    In this section the development process of credit rating models is introduced. Figure 2.3 shows a systematic overview of the credit modeling process. The rectangular boxes in Figure 2.3 represent processes, whereas the boxes with the sloped sides represent numerical information. As can be seen from Figure 2.3 there are quite a few processes inside the credit rating modeling process. The figure shows the journey from the original data to the model performance information.

    Figure 2.3: Systematic overview of the Credit Rating Modeling Process.

    The data used are recordings from the co-operating bank's database, and they are the same data as used in Rating Model Corporate (RMC). The data, which are given a full discussion in Chapter 4, can be categorized as shown at the top of Figure 2.3.

    The data go through a certain cleaning process. If a firm is not observed in two successive years, it is either a new customer or a retiring one, and it is thus removed from the dataset. Observations with missing values are also removed from the dataset.


    When the data have been cleansed they will be referred to as complete, and they are then split into training and validation sets. The total data will be split approximately as follows: 50% will be used as a training set, 25% as a validation set and 25% as a test set:

    Training | Validation | Test

    The training set is used to fit the model and the validation set is used to estimate the prediction error for model selection. In order to account for the small sample of data, that is, of bad cases, the process of splitting, fitting, transformation and validation is performed recursively.

    The test set is then used to assess the generalization error of the final model chosen. The training and validation sets, together called the modeling sets, are randomly chosen sets from the 2005, 2006 and 2007 dataset, whereas the test set is the 2008 dataset. The recursive splitting of the modeling sets is done by choosing a random sample without replacement such that the training set is 2/3 and the validation set is 1/3 of the modeling set.

    In the early stages of the modeling process it was observed that different seedings into training and validation sets resulted in considerably different results. In order to accommodate this problem a resampling process is performed and the average performance over N samples is considered for variable selection. In order to ensure that the same N samples are used in the resampling process the following procedure is performed:

    - First a random number, called the seed, is selected e.g. 2345.

    - From the seed a set of random numbers, called a seeding pool, is generated. The modeling sample is then split into the training and validation sets using an identity from the seeding pool.

    - After the splitting into the training and validation sets, the default rates of the two sets are calculated. If the difference in default rates is more than 10%, that particular split is rejected and a new split is tried with a new identity from the seeding pool, recursively, until appropriate training and validation sets are obtained.
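The seeding and split-rejection procedure above can be sketched as follows. This is a minimal illustration, not the bank's actual implementation: the function name is ours, and the 10% threshold is interpreted here as a relative difference in default rates, which the text does not state explicitly.

```python
import random

def balanced_splits(defaults, seed=2345, n_splits=50, rel_tol=0.10):
    """Draw n_splits (training, validation) index splits of a modeling set.

    Each split puts 2/3 of the observations in the training set and 1/3 in
    the validation set, sampled without replacement.  A split is rejected
    and redrawn with the next identity from the seeding pool whenever the
    default rates of the two sets differ by more than rel_tol (interpreted
    here as a relative difference, since observed default rates are small).
    """
    master = random.Random(seed)          # the single master seed, e.g. 2345
    n = len(defaults)
    splits = []
    while len(splits) < n_splits:
        # Identity drawn from the seeding pool generated by the master seed.
        sub = random.Random(master.randrange(10**9))
        idx = list(range(n))
        sub.shuffle(idx)
        cut = (2 * n) // 3
        train, valid = idx[:cut], idx[cut:]
        rate = lambda ix: sum(defaults[i] for i in ix) / len(ix)
        rt, rv = rate(train), rate(valid)
        if abs(rt - rv) <= rel_tol * max(rt, rv, 1e-12):
            splits.append((train, valid))
    return splits
```

Because every accepted split is derived from the same master seed, rerunning the procedure reproduces the exact same N samples, which is what makes the averaged performance comparable across candidate models.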

    An example of the different performances for different splits for RMC and a logistic regression model can be seen in Figure 2.4. The figure shows the clear need for the resampling process. This can be seen by considering the different splits in iterations 1 and 50, respectively. For iteration 1 the RMC would have been preferred to the LR model; the opposite conclusion would have been reached if the split of iteration 50 had been considered.

    [Figure 2.4 plot: "Performance Comparison"; PCA.stat against iteration, with the LR model plotted as 1 and RMC as 2.]

    Figure 2.4: Comparison of the performance of a logistic regression model and RMC. The performances have been ordered in such a way that the performance of the LR model is in increasing order.

    The datasets consist of creditworthiness data and a variable recording whether the firm has defaulted a year later. The default variable is given the value one if the firm has defaulted and the value zero otherwise.

    When the training and validation sets have been properly constructed, the modeling is performed. The modeling refers to the process of constructing a model that can predict whether a borrower will default on their loan, using previous information on similar firms. The proposed model is fitted using the data of the training set and then a prediction is made for the validation set. If logistic regression15 is used as a modeling method then the predicted values will lie on the interval [0,1] and can be interpreted as probabilities of default (PD). Generally, when one is modeling some event or non-event, the predicted values are rounded to one for the event and to zero for the non-event. There is a problem with this, as the fitted values depend largely on the ratio of zeros and ones in the training sample. That is, when there are a lot of zeros compared to ones in the training set, which is the case for credit default data, the predicted values will be small. These probabilities can be interpreted as the probability of default of an individual firm. An example of computed probabilities can be seen in Figure 2.5.

    [Figure 2.5 plot: "Histogram of Probability of Default"; x-axis Prob. Default (0.00 to 0.30), y-axis Frequency.]

    Figure 2.5: Example of an empirical distribution of probabilities of default (PD).

    From Figure 2.5 it is apparent that the largest PD is considerably below 0.5 and thus all the fitted values would get the value zero if they were rounded to binary numbers. This is the main reason why ordinary classification and validation methods do not work on credit default data. The observed probabilities of default are small numbers and thus not easily interpreted. Hence, to enhance their readability, the default probabilities are transformed to risk ratings. Rating Model Corporate has 12 possible ratings and the same transformation to the risk rating scale was used for the proposed models, in order to ensure comparability. The transformation from PDs to risk ratings is summarized in Table 2.1.

    15Logistic regression is a modeling procedure that is specialized for modeling when the dependent variable is either one or zero. Logistic regression is introduced in Section 3.2.2 and a more detailed discussion can be seen in Section 5.2.2.

    PD-interval          Rating
    [ 0.00%;  0.11% [      12
    [ 0.11%;  0.17% [      11
    [ 0.17%;  0.26% [      10
    [ 0.26%;  0.41% [       9
    [ 0.41%;  0.64% [       8
    [ 0.64%;  0.99% [       7
    [ 0.99%;  1.54% [       6
    [ 1.54%;  2.40% [       5
    [ 2.40%;  3.73% [       4
    [ 3.73%;  5.80% [       3
    [ 5.80%;  9.01% [       2
    [ 9.01%; 100.0% ]       1

    Table 2.1: Probabilities of Default (PD) are transformed to the relative risk rating.
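The mapping in Table 2.1 can be implemented with a sorted list of interval bounds; the sketch below is ours, not the thesis code, with hypothetical names.

```python
import bisect

# Right endpoints of the half-open PD-intervals of Table 2.1, from the
# best rating (12) down to rating 2; a PD of 9.01% or more maps to 1.
PD_BOUNDS = [0.0011, 0.0017, 0.0026, 0.0041, 0.0064, 0.0099,
             0.0154, 0.0240, 0.0373, 0.0580, 0.0901]

def pd_to_rating(pd):
    """Transform a probability of default into the 12-grade risk rating."""
    # bisect_right counts how many interval bounds lie at or below pd;
    # crossing no bound gives rating 12 (best), crossing all 11 gives 1.
    return 12 - bisect.bisect_right(PD_BOUNDS, pd)
```

For example, a PD of 2% falls in [1.54%; 2.40%[ and is assigned rating 5.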

    It is apparent from Table 2.1 that the PD-intervals are very different in size. It is also apparent that low PDs, representing a good borrower, are transformed to a high risk rating. An example of a risk rating distribution can be seen in Figure 2.6. When the ratings have been observed it is possible to validate the results; that is done by computing the discriminatory power16 of the observed ratings. The discriminatory power indicators are then compared to the indicators calculated for RMC on the specific validation set. The model performance is concluded from the discriminatory power indicators. Numerous discriminatory power methods are presented in Section 6.4. Important information can be drawn from visual representations of the model performance, such as the relative and cumulative frequencies of the good and bad cases respectively and the respective ROC curve, which are all introduced in Sections 6.2 and 6.3. Visual comparison is not made when the modeling is performed on numerous modeling sets, that is, when the resampling process is used.

    16The term discriminatory power refers to the fundamental ability to differentiate between good and bad cases and is introduced in Section 6.1.


    [Figure 2.6 plot: "Histogram of Predicted Ratings"; x-axis Rating Class (2 to 12), y-axis Relative Frequency (0.00 to 0.15).]

    Figure 2.6: Example of a risk rating distribution, when the PDs have been transformed to risk ratings.

    From the model performance it is possible to assess different variables and modeling procedures. The results can be seen in Section 7.

    Chapter 3

    Commonly Used Credit Assessment Models

    In this chapter, credit assessment models commonly used in practice are presented. First their general functionality and application is introduced, followed by a light discussion of current research in the field. The credit assessment models are used to rate borrowers based on their creditworthiness and they can be grouped as seen in Figure 3.1. The three main groups are heuristic, statistical and causal models. In practice, combinations of heuristic and either of the other two methods are frequently used and referred to as hybrid models. The discussion here is adapted from Datschetzky et al. [13]1, which should be viewed for a more detailed discussion.

    Heuristic models are discussed in Section 3.1, and a brief introduction to statistical models is given in Section 3.2 with a more detailed discussion in Chapter 5. In Section 3.3 models based on option pricing theory and cash flow simulation are introduced, and finally hybrid form models are introduced in Section 3.4.

    1Chapter 3


    Figure 3.1: Systematic overview of Credit Assessment Models.

    3.1 Heuristic Models

    Heuristic models attempt to use past experience to evaluate the future creditworthiness of a potential borrower. Credit experts choose relevant creditworthiness factors and their weights based on their experience. The significance of the factors is not necessarily estimated and their weights are not necessarily optimized.

    3.1.1 Classic Rating Questionnaires

    In classic rating questionnaires the credit institution's credit experts define clearly answerable questions regarding factors relevant to creditworthiness and assign a fixed number of points to specific answers. Generally, the higher the point score, the better the credit rating will be. This type of model is frequently observed in the public sector, and is then filled out by a representative of the credit institution. Examples of questions for a public sector customer might be sex, age, marital status and income.


    3.1.2 Qualitative Systems

    In qualitative systems the information categories relevant to creditworthiness are defined by credit experts, but in contrast to questionnaires, qualitative systems do not assign a fixed value to each factor. Instead, a representative of the credit institution evaluates the applicant on each factor. This might be done with grades, and the final assessment would then be a weighted or simple average of all grades. The grading system needs to be well documented in order to get similar ratings from different credit institution representatives.

    In practice, credit institutions have used these procedures frequently, especially in the corporate customer segment. Improvements in data availability along with advances in statistics have reduced the use of qualitative systems.

    3.1.3 Expert Systems

    Expert systems are software solutions which aim to recreate human problem solving abilities. The system uses data and rules selected by credit experts in order to produce its expert evaluation.

    Altman and Saunders [3] report that bankers tend to be overly pessimistic about credit risk and that multivariate credit-scoring systems tend to outperform such expert systems.

    3.1.4 Fuzzy Logic Systems

    Fuzzy logic systems can be seen as a special case of expert systems with the additional ability of fuzzy logic. In a fuzzy logic system, specific values entered for creditworthiness criteria are not allocated to a single categorical term, e.g. high or low; rather they are assigned multiple values. As an example consider an expert system that rates firms with a return on equity of 15% or more as good and a return on equity of less than 15% as poor. It is not in line with human decision-making behavior to have such sharp decision boundaries, as it is not sensible to rate a firm with a return on equity of 14.9% as poor and a firm with a return on equity of 15% as good. By introducing a linguistic variable as seen in Figure 3.2, a firm having a return on equity of 5% would be considered 100% poor and a firm having a return on equity of 25% would be considered 100% good. A firm with a return on equity of 15% would be considered 50% poor and 50% good. These linguistic variables are used in a computer based evaluation based


    [Figure 3.2 plot: membership grades (0 to 1) for the terms Poor and Good against return on equity (0-30%).]

    Figure 3.2: Example of a Linguistic Variable.

    on the experience of credit experts. The Deutsche Bundesbank uses discriminant analysis as its main modeling procedure with an error rate of 18.7%; after introducing a fuzzy logic system the error rate dropped to 16%.
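A linguistic variable like the one in Figure 3.2 can be sketched as a pair of piecewise-linear membership functions. The 5% and 25% breakpoints are read off the figure and the function name is ours.

```python
def roe_membership(roe, lo=5.0, hi=25.0):
    """Fuzzy membership grades for the terms 'poor' and 'good'.

    Below lo the firm is 100% poor, above hi it is 100% good, and in
    between the grades change linearly, so a return on equity of 15%
    is graded 50% poor and 50% good.
    """
    good = min(max((roe - lo) / (hi - lo), 0.0), 1.0)
    return {"poor": 1.0 - good, "good": good}
```

A firm is thus never forced across a sharp 15% boundary; its grade degrades smoothly as return on equity falls.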

    3.2 Statistical Models

    Statistical models rely on empirical data suggested by credit experts as predictors of creditworthiness, while heuristic models rely purely on the subjective experience of credit experts. In order to get good predictions from statistical models, large empirical datasets are required. The traditional methods of discriminant analysis and logistic regression are discussed in Sections 3.2.1 and 3.2.2, respectively. Then more advanced methods for modeling credit risk are discussed in Section 3.2.3.


    3.2.1 Discriminant Analysis

    In 1968, Altman [2] introduced his Z-score formula for predicting bankruptcy; this was the first attempt to predict bankruptcy using financial ratios. To form the Z-score formula, Altman used linear multivariate discriminant analysis, with an original data sample consisting of 66 firms, half of which had filed for bankruptcy.

    Altman proposed the following Z-score formula

    Z = 0.012X1 + 0.014X2 + 0.033X3 + 0.006X4 + 0.999X5    (3.1)

    where

    X1 = Working Capital / Total Assets. Measures net liquid assets in relation to the size of the company.

    X2 = Retained Earnings / Total Assets. Measures profitability in a way that reflects the company's age.

    X3 = Earnings Before Interest and Taxes / Total Assets. Measures operating efficiency apart from tax and leveraging factors.

    X4 = Market Value of Equity / Book Value of Total Debt. Measures how much the firm's market value can decline before it becomes insolvent.

    X5 = Sales / Total Assets. A standard measure of turnover, which varies greatly from industry to industry.

    All the values except the Market Value of Equity, in X4, can be found directly in a firm's financial statements. The weights of the original Z-score were based on data from publicly held manufacturers with assets greater than $1 million, but the formula has since been modified for private manufacturing, non-manufacturing and service companies. The discrimination of the Z-score model can be summarized as follows:

    Z-score > 2.99            Firms having low probability of default
    1.81 ≤ Z-score ≤ 2.99     Firms having intermediate probability of default
    Z-score < 1.81            Firms having high probability of default
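Equation (3.1) and the discrimination zones can be sketched as follows. The convention that X1-X4 are entered as percentages and X5 as a plain ratio matches the scale Altman's published weights assume; the function names are illustrative.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_debt):
    """Altman (1968) Z-score, Eq. (3.1).

    X1-X4 are expressed as percentages (e.g. 20.0 for 20%) and X5 as a
    plain ratio, matching the scale the original weights assume.
    """
    x1 = 100.0 * working_capital / total_assets
    x2 = 100.0 * retained_earnings / total_assets
    x3 = 100.0 * ebit / total_assets
    x4 = 100.0 * market_value_equity / total_debt
    x5 = sales / total_assets
    return 0.012 * x1 + 0.014 * x2 + 0.033 * x3 + 0.006 * x4 + 0.999 * x5

def default_zone(z):
    """Map a Z-score to the three discrimination zones above."""
    if z > 2.99:
        return "low"
    if z >= 1.81:
        return "intermediate"
    return "high"
```

For a firm with working capital 50, retained earnings 100, EBIT 80, market value of equity 400, sales 600, total assets 500 and total debt 200, the score is about 3.33, placing it in the low-probability zone.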

    Advances in computing capacity have made discriminant analysis (DA) a popular tool for credit assessment. The general objective of multivariate discriminant analysis is to distinguish between default and non-default borrowers with the help of several independent creditworthiness figures. Linear discriminant functions are frequently used in practice and can be given a simple explanation as a weighted linear combination of indicators. The discriminant score is

    D = w0 + w1X1 + w2X2 + . . . + wkXk    (3.2)

    The main advantage of DA compared to other classification procedures is that the individual weights show the contribution of each explanatory variable. The result of the linear function is also easy to interpret: a low discriminant score represents a poor loan applicant.

    The downside to DA is that it requires the explanatory variables to be normally distributed. Another prerequisite is that the explanatory variables are required to have the same variance for the groups to be discriminated. In practice this is, however, often thought to be less significant and thus often disregarded.

    Discriminant analysis is given a more detailed mathematical discussion in Section 5.3.

    3.2.2 Logistic Regression

    Another popular tool for credit assessment is logistic regression. Logistic regression uses as a dependent variable a binary variable that takes the value one if a borrower defaulted in the observation period and zero otherwise. The independent variables are all parameters potentially relevant to credit risk. Logistic regression is discussed further and in more detail in Section 5.2.2. A logistic regression is often represented using the logit link function as

    p(X) = 1 / (1 + exp[-(β0 + β1X1 + β2X2 + ... + βkXk)])    (3.3)

    where p(X) is the probability of default given the k input variables X. Logistic regression has several advantages over DA. It does not require normally distributed input variables, and thus qualitative creditworthiness characteristics can be taken into account. Secondly, the results of logistic regression can be interpreted directly as the probability of default. According to Datschetzky et al. [13], logistic regression has seen more widespread use both in academic research and in practice in recent years. This can be attributed to its flexibility in data handling and more readable results compared to discriminant analysis.
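A minimal sketch of Eq. (3.3) and of fitting it follows, assuming a plain gradient-ascent fitter in place of the iteratively reweighted least squares a statistics package would use; all names are illustrative.

```python
import math

def logistic_pd(x, beta):
    """Probability of default from Eq. (3.3); beta[0] is the intercept."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Maximize the Bernoulli log-likelihood by gradient ascent."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = yi - logistic_pd(xi, beta)   # residual on the PD scale
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        beta = [b + lr * g / len(y) for b, g in zip(beta, grad)]
    return beta
```

The fitted values always lie in the interval [0,1], which is what allows them to be read directly as probabilities of default.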


    3.2.3 Other Statistical and Machine Learning Methods

    In this section a short introduction is given to other methods which can be grouped under the heading of statistical and machine learning methods. As computer programming advanced, new methods were tried as credit assessment methods, including

    - Recursive Partitioning Algorithm (RPA)

    - k-Nearest Neighbor Algorithm (kNN)

    - Support Vector Machine (SVM)

    - Neural Networks (NN)

    A brief introduction of those methods follows.

    Recursive Partitioning Algorithm (RPA)

    One of these methods, the Recursive Partitioning Algorithm (RPA), is a data mining method that employs decision trees and can be used for a variety of business and scientific applications. In a study by Frydman et al. [16], RPA was found to outperform discriminant analysis in most original sample and holdout comparisons. Interestingly, it was also observed that additional information was derived by using both RPA and discriminant analysis results.

    This method is also known as classification and regression trees (CART) and is given a more detailed introduction under that name in Section 5.5.

    k-Nearest Neighbor Algorithm (kNN)

    The k-Nearest Neighbor Algorithm is a non-parametric method that considers the average of the dependent variable of the k observations that are most similar to a new observation, and is introduced in Section 5.4.
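The idea can be sketched in a few lines, assuming Euclidean distance over (suitably standardized) creditworthiness figures; the function name is ours.

```python
def knn_pd(x_new, X, y, k=5):
    """Estimate PD as the average default flag of the k nearest neighbors."""
    # Rank past observations by squared Euclidean distance to x_new.
    nearest = sorted(range(len(X)),
                     key=lambda i: sum((a - b) ** 2
                                       for a, b in zip(X[i], x_new)))[:k]
    return sum(y[i] for i in nearest) / k
```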

    Support Vector Machine (SVM)

    The Support Vector Machine is a method closely related to discriminant analysis where an optimal nonlinear boundary is constructed. This rather complex method is given a brief introduction in Section 5.3.3.

    Neural Networks (NN)

    Neural networks use information technology in an attempt to simulate the complicated way in which the human brain processes information. Without going into too much detail on how the human brain works, neural networks can be thought of as multi-stage information processing. In each stage hidden correlations among the explanatory variables are identified, making the processing a black box model2. Neural networks can process any form of information, which makes them especially well suited to forming good rating models. Combining black box modeling and a large set of information, NN generally show high levels of discriminatory power. However, the black box nature of NN results in great acceptance problems. Altman et al. [5] concluded that the neural network approach did not materially improve upon the linear discriminant structure.

    3.2.4 Hazard Regression

    Hazard regression3 considers time until failure, default in the case of credit modeling. Lando [21] refers to hazard regression as the most natural statistical framework to analyze survival data, but as Altman and Saunders [3] point out, a financial institution would need a portfolio of some 20,000-30,000 firms to develop very stable estimates of default probabilities. Very few financial institutions worldwide come even remotely close to having this number of potential borrowers. The Robert Morris Associates, Philadelphia, PA, USA, have though initiated a project to develop a shared national database, among larger banks, of historic mortality loss rates on loans. Rating agencies have adopted and modified the mortality approach and utilize it in their structured financial instrument analysis, according to Altman and Saunders [3].

    3.3 Causal Models

    Causal models in credit assessment procedures use the analytics of financial theory to estimate creditworthiness. These kinds of models differ from statistical models in that they do not rely on empirical data sets.

    2A black box model is a model where the internal structure of the model is not viewable.
    3Hazard regression is also called survival analysis in the literature.


    3.3.1 Option Pricing Models

    The revolutionary work of Black and Scholes (1973) and Merton (1974) formed the basis of option pricing theory. The theory, originally used to price options4, can also be used to evaluate default risk on the basis of individual transactions. Option pricing models can be constructed without using a comprehensive default history; however, they require data on the economic value of assets, debt and equity, and especially volatilities. The main idea behind the option pricing model is that credit default occurs when the economic value of the borrower's assets falls below the economic value of the debt.
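That idea can be sketched with a Merton-style distance-to-default calculation, assuming asset values follow a geometric Brownian motion; the parameter choices and function name are illustrative, not the thesis's model.

```python
import math

def merton_pd(assets, debt, drift, vol, horizon=1.0):
    """P(asset value < debt at the horizon) under geometric Brownian motion."""
    d = (math.log(assets / debt) + (drift - 0.5 * vol ** 2) * horizon) \
        / (vol * math.sqrt(horizon))
    # Standard normal CDF via erf: PD = Phi(-d).
    return 0.5 * (1.0 + math.erf(-d / math.sqrt(2.0)))
```

The closer the economic value of the assets is to the debt level, the higher the resulting PD.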

    The data required make it impossible to use option pricing models in the public sector, and acquiring the data needed for the corporate sector is not without problems; it is, for example, difficult in many cases to assess the economic value of assets.

    3.3.2 Cash Flow Models

    Cash flow models are simulation models of future cash flow arising from the assets being financed, and are thus especially well suited for credit assessment in specialized lending transactions. Here the transaction itself is rated, not the potential borrower, and the result would thus be referred to as a transaction rating. Cash flow models can be viewed as a variation of the option pricing model where the economic value of the firm is calculated on the basis of expected future cash flow.

    3.3.3 Fixed Income Portfolio Analysis

    Since the pioneering work of Markowitz (1959), portfolio theory has been applied to common stock data. The theory could just as well be applied to the fixed income area, involving corporate and government bonds and even banks' portfolios of loans. Even though portfolio theory could be a useful tool for financial institutions, widespread use of the theory has not been seen, according to Altman and Saunders [3]. Portfolio theory lays out how rational investors will use diversification to optimize their portfolios. The traditional objective of portfolio theory is to maximize return for a given level of risk, and it can also be used for guidance on how to price risky assets. Portfolio theory could be applied to a bank's portfolio to price new loan applicants, by determining interest rates, after calculating their probability of default (PD) as their risk measure.

    4A financial instrument that gives the right, but not the obligation, to engage in a future transaction on some underlying security.

    3.4 Hybrid Form Models

    The models discussed in previous sections are rarely used in their pure form. Heuristic models are often used in combination with statistical or causal models. Even though statistical and causal models are generally seen as better rating procedures, the inclusion of credit experts' knowledge generally improves ratings. In addition, not all statistical models are capable of processing qualitative information, e.g. discriminant analysis, or they require a large data set to produce significant results.

    The use of credit experts' knowledge also improves user acceptance.

    There are four main architectures for combining qualitative data with quantitative data.

    - Horizontal linking of model types. Both qualitative and quantitative data are used as input to the rating machine.

    - Overrides. Here the rating obtained from either a statistical or a causal model is altered by the credit expert. This should only be done for a few firms and only if it is considered necessary. Excessive use of overrides may indicate a lack of user acceptance or a lack of understanding of the rating model.

    - Knock-out criteria. Here the credit experts set some predefined rules which have to be fulfilled before a credit assessment is made. This could for example mean that some specific risky sectors are not considered as possible customers.

    - Special rules. Here the credit experts set some predefined rules. The rules can take almost any form and regard every aspect of the modeling procedure. An example of such a rule would be that start-up firms cannot get higher ratings than some predefined rating.

    All or some of these architectures could be observed in hybrid models.


    3.5 Performance of Credit Risk Models

    In order to summarize the general performance of the models in this chapter, the performance of some of the models can be seen in Table 3.1. Datschetzky et al. [13]5 report a list of Gini coefficient6 values obtained in practice for different types of rating models. As can be seen in Table 3.1, multivariate models

    Model                                      Gini coefficient

    Univariate models                          In general, good individual indicators can
                                               reach 30-40%. Special indicators may reach
                                               approx. 55% in selected samples.

    Classic rating questionnaires /            Frequently below 50%.
    qualitative systems

    Option pricing models                      Greater than 55% for exchange-listed
                                               companies.

    Multivariate models (discriminant          Practical models with quantitative
    analysis and logistic regression)          indicators reach approximately 60-70%.

    Multivariate models with quantitative      Practical models reach approximately
    and qualitative factors                    70-80%.

    Neural networks                            Up to 80% in heavily cleansed samples;
                                               however, in practice this value is hardly
                                               attainable.

    Table 3.1: Typical values obtained in practice for the Gini coefficient as a mea-sure of discriminatory power.

    generally outperform option pricing models by quite a margin. The importance of qualitative factors as modeling variables is also clear. Neural networks have also been shown to produce great performance, but the high complexity of the rating procedure makes neural networks a less attractive option.
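For reference, the Gini coefficient (accuracy ratio) reported in Table 3.1 can be computed from PDs and observed defaults as 2·AUC − 1; the pairwise implementation below is a sketch (O(n²), fine for illustration), with a name of our choosing.

```python
def gini_coefficient(pds, defaults):
    """Gini / accuracy ratio: 2*AUC - 1, where AUC is the probability that
    a randomly chosen defaulter was assigned a higher PD than a randomly
    chosen non-defaulter (ties count one half)."""
    bad = [p for p, d in zip(pds, defaults) if d == 1]
    good = [p for p, d in zip(pds, defaults) if d == 0]
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    return 2.0 * wins / (len(bad) * len(good)) - 1.0
```

A value of one means the model ranks every defaulter above every non-defaulter, while zero means the ranking is no better than chance.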

    In the study of Yu et al. [34], highly evolved neural networks were compared with logistic regression, a simple artificial neural network (ANN) and a support vector machine (SVM). The study also compared a fuzzy support vector machine (Fuzzy SVM). The study was performed on detailed information on 60 corporations, of which 30 were insolvent. The results reported in Table 3.27

    5pp. 109
    6The Gini coefficient ranges from zero to one, one being optimal. The Gini coefficient is introduced in Section 6.4.

Category   Model               Rule       Average Hit Rate (%)
Single     Log R                          70.77 [5.96]
Single     ANN                            73.63 [7.29]
Single     SVM                            77.84 [5.82]
Hybrid     Fuzzy SVM                      79.00 [5.65]
Ensemble   Voting-based        Majority   81.63 [7.33]
Ensemble   Reliability-based   Maximum    84.14 [5.69]
Ensemble   Reliability-based   Minimum    85.01 [5.73]
Ensemble   Reliability-based   Median     84.25 [5.86]
Ensemble   Reliability-based   Mean       85.09 [5.68]
Ensemble   Reliability-based   Product    85.87 [6.59]

Table 3.2: Results of a comprehensive study of Yu et al. [34], with emphasis on neural networks. The figures in brackets are the standard deviations.

show that logistic regression has the worst performance of all the single modeling procedures, whereas SVM performs best among them. By introducing fuzzy logic to the SVM the performance improves. The multi-stage reliability-based neural network ensemble learning models all show similar performance and significantly outperform the single and hybrid models.

Galindo and Tamayo [17] conducted extensive comparative research on different statistical and machine learning classification methods on a mortgage loan data set. Their findings for a training sample of 2,000 records are summarized in Table 3.3. The results show that CART decision-tree models

Model                Average Hit Rate (%)
CART                 91.69
Neural Networks      89.00
K-Nearest Neighbor   85.05
Probit               84.87

Table 3.3: Performance of different statistical and machine learning classification methods on a mortgage loan data set.

7) Total Hit Rate = number of correct classifications / number of evaluation samples.


provide the best estimation for default with an average 91.69% hit rate. Neural networks provided the second best results with an average hit rate of 89.00%. The K-Nearest Neighbor algorithm had an average hit rate of 85.05%. These results outperformed a logistic regression model using the probit link function, which attained an average hit rate of 84.87%. Although the results are for mortgage loan data, it is clear that logistic regression models can be outperformed.
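The hit-rate measure used in these comparisons, defined in footnote 7 as the share of correctly classified evaluation samples, can be sketched as follows (a minimal illustration, not the authors' code):

```python
def total_hit_rate(y_true, y_pred):
    """Total hit rate in percent: correct classifications / evaluation samples."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)

# Toy example: three of four cases classified correctly.
print(total_hit_rate([0, 1, 0, 1], [0, 1, 1, 1]))  # 75.0
```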

    Current studies

Credit crises in the 1970s and 1980s fueled research in the field, resulting in great improvements in observed default rates. High default rates in the early 1990s and at the beginning of the new millennium have ensured that credit risk modeling is still an active research field. In light of the financial crisis of 2008, research in the field is sure to continue. Most current research is highly evolved and well beyond the scope of this thesis, and is thus only given a brief discussion.

Even though it is not very practical for most financial institutions, much current research focuses on option pricing models. Lando [21] introduces intensity modeling as the most exciting research area in the field. Intensity models can, naively, be explained as a mixture of hazard regression and standard pricing machinery. The objective of intensity models is not to obtain the probability of default but to build better models for credit spreads and default intensities. The mathematics of intensity models is highly evolved and one should refer to Lando [21] for a complete discussion of the topic.

The subject of credit pricing has also been the subject of extensive research, especially as credit derivatives have come into more common use. The use of macroeconomic variables is seen as material for prospective studies.

The discussion here on credit assessment models is rather limited; for further interest one could view Altman and Saunders [3] and Altman et al. [4] for a discussion of the developments in credit modeling, and Datschetzky et al. [13] for a good overview of models used in practice. Lando [21] then gives a good overview of current research in the field, along with an extensive list of references.


Chapter 4

    Data Resources

The times we live in are sometimes referred to as the information age, as the technical breakthrough of commercial computers has made information recording an easier task. Along with increased information, it has also made computations more efficient, furthering advances in practical mathematical modeling.

In the development of statistical credit rating models, the quality of the data used in the model development is of great importance. Especially important is the information on the few firms that have defaulted on their liabilities.

In this chapter the data made available by the co-operating corporate bank are presented. This chapter is partly influenced by the co-operating bank's in-house paper Credit [11]. Section 4.1 introduces data dimensionality, and data processing is discussed. Introductions to quantitative and qualitative figures are given in Sections 4.2 and 4.3, respectively. Customer factors are introduced in Section 4.4, and other factors and figures are introduced in Section 4.5. Finally, some preliminary data analyses are performed in Section 4.6.


    4.1 Data dimensions

The data used in the modeling process are the data used in the co-operating corporate bank's current credit rating model, called Rating Model Corporate (RMC), which is introduced in Section 4.5.1. The available data can be grouped according to their identity into the following groups:

    - Quantitative

    - Qualitative

    - Customer factors

    - Other factors and figures

Rating Model Corporate is a heuristic model and was developed in 2004. Therefore, the first raw data are from 2004, as can be seen in Table 4.1. In order to validate the performance of the credit rating model the dependent variable, which is whether the firm has defaulted on its obligations a year after it was rated, is needed. In order to construct datasets that are admissible for validation, firms that are not observed in two successive years, thus being either new customers or retiring ones, are removed from the dataset. The first validation was done in 2005, and from Table 4.1 it can be seen that the observations of the constructed 2005 dataset are noticeably fewer than in the raw datasets of 2004 and 2005, due to the exclusion of new or retiring customers. The constructed datasets are the datasets that the co-operating bank would perform their validation on; they are, however, not admissible for use in modeling purposes, because there are missing values in the constructed datasets.

By removing missing values from the constructed dataset a complete dataset is obtained. It is complete in the sense that there are equally many observations for all variables. The problem with removing missing values is that a large proportion of the data are thrown away, as can be seen in Table 4.1. Some variables have more missing values than others, and excluding some of the variables with many missing values would result in a larger modeling dataset.
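The two cleaning steps above can be sketched as follows; the record structure (a `firm_id` key and `None` for missing values) is a hypothetical illustration, not the bank's actual data layout:

```python
# Sketch of the dataset construction described above: keep firms observed
# in two successive years (drops new and retiring customers), then drop
# rows with missing values to obtain the "complete" dataset.

def construct_dataset(rows_prev, rows_curr):
    """Keep only firms of the current year that were also rated last year."""
    ids_prev = {r["firm_id"] for r in rows_prev}
    return [r for r in rows_curr if r["firm_id"] in ids_prev]

def complete_cases(rows):
    """Complete-case filter: drop any row containing a missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]

prev = [{"firm_id": 1}, {"firm_id": 2}]
curr = [{"firm_id": 2, "x": 1.0}, {"firm_id": 3, "x": 2.0}]
print(complete_cases(construct_dataset(prev, curr)))
```

Note the trade-off described in the text: the complete-case filter throws away every row with any missing variable, so dropping variables with many missing values would enlarge the modeling dataset.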

When the data have been cleansed they are split into training and validation sets. The total data will be split approximately as follows: 50% will be used as a training set, 25% as a validation set and 25% as a test set.


Data Set           Rows   Columns
Raw Data
- 2008             4063   2
- 2007             4125   29
- 2006             4237   29
- 2005             4262   29
- 2004             4521   29
Constructed Data
- 2008             3600   29
- 2007             3599   29
- 2006             3586   29
- 2005             3788   29
Complete Data
- 2008             2365   29
- 2007             2751   29
- 2006             2728   29
- 2005             2717   29

Table 4.1: Summary of data dimensions and usable observations.

The training set is used to fit the model and the validation set is used to estimate the prediction error for model selection. In order to account for the small sample of data, that is, of bad cases, the process of splitting, fitting and validation is performed repeatedly. The average performance over the repeated evaluations is then considered in the model development.

The test set is then used to assess the generalization error of the final model chosen. The training and validation sets, together called the modeling sets, are randomly chosen from the 2005, 2006 and 2007 datasets, whereas the test set is the 2008 dataset. The repeated splitting of the modeling sets is done by choosing a random sample without replacement such that the training set is 2/3 and the validation set 1/3 of the modeling set.
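The repeated 2/3 : 1/3 sub-sampling scheme above can be sketched as follows; `fit` and `evaluate` are placeholder callables standing in for the model-fitting and performance-measurement steps, which the thesis specifies elsewhere:

```python
# Sketch of repeated random sub-sampling: split the modeling set 2/3
# training / 1/3 validation without replacement, fit and evaluate, and
# average the performance measure over the repetitions.
import random

def resampled_performance(modeling_set, fit, evaluate, n_splits=100, seed=1):
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = rng.sample(modeling_set, len(modeling_set))  # no replacement
        cut = (2 * len(shuffled)) // 3
        train, valid = shuffled[:cut], shuffled[cut:]
        model = fit(train)
        scores.append(evaluate(model, valid))
    return sum(scores) / len(scores)
```

Averaging over many splits stabilizes the performance estimate when, as here, the number of defaulted firms is small.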

To see how the co-operating bank's portfolio is distributed between sectors, the portfolio is split up into five main sectors:

    - Real estate

    - Trade

    - Production


    - Service

    - Transport

The portfolio is split according to an in-house procedure largely based on a Danish legislation called the Danish Industrial Classification 2003 (DB03), which is based on EU legislation. To view how the portfolio is divided between sectors, the total number of observations in the complete data set and the respective percentage of each sector can be seen in Table 4.2. Table 4.2 also shows the number of defaulted observations in each sector and the relative default rate.

Sector        Observations [%]   Default Observations [%]   Default Rate (%)
Real Estate   2295 [28.0]        21 [15.2]                  0.92
Trade         1153 [14.1]        11 [8.0]                   0.95
Production    3181 [38.8]        82 [59.4]                  2.58
Service       1348 [16.5]        21 [15.2]                  1.56
Transport     219 [2.7]          3 [2.2]                    1.37
All           8196 [100.0]       138 [100.0]                1.68

Table 4.2: Summary of the portfolio's distribution between sectors and sector-wise default rates.

By analyzing Table 4.2 it is apparent that the production sector is the largest and has the highest default rate. On the other hand, the trade and real estate sectors have rather low default rates.
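The sector-wise default rates in Table 4.2 follow directly from the observation counts; the short check below recomputes them:

```python
# Recompute the default rates of Table 4.2 as defaults / observations, in
# percent, rounded to two decimals as in the table.
sectors = {
    "Real Estate": (2295, 21),
    "Trade":       (1153, 11),
    "Production":  (3181, 82),
    "Service":     (1348, 21),
    "Transport":   (219, 3),
}
rates = {name: round(100.0 * d / n, 2) for name, (n, d) in sectors.items()}
total_n = sum(n for n, _ in sectors.values())
total_d = sum(d for _, d in sectors.values())
rates["All"] = round(100.0 * total_d / total_n, 2)
print(rates)  # production is highest at 2.58; overall rate 1.68
```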

It is difficult to generalize what default rate can be considered normal, but some assumptions can be made by considering the average default rates for the period 1982-2006 in the U.S. reported by Altman et al. [4]. There, most of the observations lie between one and two percent, which might be considered normal default rates. There are not as many observations between two and five percent, which can be considered high default rates, and rates above five percent can be considered very high.


    4.2 Quantitative key figures

As a quantitative measure of creditworthiness, financial ratios are used. A financial ratio is a ratio of selected values from a firm's financial statements. Financial ratios can be used to quantify many different aspects of a firm's financial performance and allow for comparison between firms in the same business sector. Furthermore, financial ratios can be used to compare a firm to its sector average and to consider its variation over time. Financial ratios can vary greatly between sectors and can be categorized by which aspect of business they describe. The categories are as follows.

- Liquidity ratios measure the firm's availability of cash to pay debt.

- Leverage ratios measure the firm's ability to repay long-term debt.

- Profitability ratios measure the firm's use of its assets and control of its expenses to generate an acceptable rate of return.

- Activity ratios measure how quickly a firm converts non-cash assets to cash assets.

- Market ratios measure investor response to owning a company's stock and also the cost of issuing stock.

Only the first four categories of these ratios are used to measure firms' creditworthiness, as the market ratios are mostly used in the financial markets. The discussion here and in the following sections on financial ratios is largely adapted from Credit [11] and Bodie et al. [9].

As the values used to calculate the financial ratios are obtained from firms' financial statements, it is only possible to calculate financial ratios when a firm has published its financial statements. This produces two kinds of problems: firstly, new firms do not have financial statements, and secondly, new data are only available once a year.

Mathematically, financial ratios will be referred to by the Greek letter alpha, α. Financial ratios are also referred to as key figures or key ratios, both in this work and in the literature. The summary statistics and figures are obtained using the complete datasets.

1) Financial statements are reports which provide an overview of a firm's financial condition in both the short and long term. Financial statements are usually reported annually and split into two main parts: the balance sheet and the income statement. The balance sheet reports current assets, liabilities and equity, while the income statement reports the income, expenses and the profit/loss of the reporting period.


    4.2.1 Liquidity Ratio

The liquidity ratio is a financial ratio that is used as a measure of liquidity. The term liquidity refers to how easily an asset can be converted to cash. The liquidity ratio in equation (4.1) consists of current assets divided by current liabilities and is thus often referred to as the current ratio. The liquidity ratio is considered to measure, to some degree, whether or not a firm has enough resources to pay its debts over the next 12 months.

liquidity = Current Assets / Current Liabilities    (4.1)

The liquidity ratio can also be seen as an indicator of the firm's ability to avoid insolvency in the short run and should thus be a good indicator of creditworthiness. By considering the components of equation (4.1), it can be seen that a large positive value of the current ratio can be seen as a positive indicator of creditworthiness. In the case that the current liabilities are zero, this is considered a positive indicator of creditworthiness, and the liquidity ratio is given the extreme value 1000. In Table 4.3 the summary statistics of the liquidity ratio can be seen for all sectors and for each individual sector.
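Equation (4.1) together with the zero-liabilities rule above can be sketched as a small function (an illustration of the rule as described, not the bank's implementation):

```python
def liquidity_ratio(current_assets, current_liabilities):
    """Current ratio of equation (4.1).

    Zero current liabilities is treated as a positive indicator of
    creditworthiness and assigned the extreme value 1000.
    """
    if current_liabilities == 0:
        return 1000.0
    return current_assets / current_liabilities

print(liquidity_ratio(120.0, 100.0))  # 1.2
```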

Statistics   All Sectors   Real Estate   Trade   Production   Service   Transport
Min.         0.65          0.09          0.01    0.65         0.00      0.00
1st Qu.      0.83          0.14          0.94    0.83         0.53      0.47
Median       1.11          0.62          1.19    1.11         0.97      0.69
Mean         1.26          2.31          1.53    1.26         1.57      0.86
3rd Qu.      1.46          1.58          1.60    1.46         1.48      0.99
Max.         25.64         275.50        37.21   25.64        91.80     10.54
ev(1000)     0.95%         2.48%         0.78%   0.22%        0.37%     0.0%

Table 4.3: Summary statistics of the liquidity ratio, without the 1000 values. The rate of observed extreme values, ev(1000), is also listed for each sector.

As can be seen in Table 4.3, by looking at the medians and first quartiles, the real estate sector has the lowest liquidity ratio. The transport sector also has low liquidity ratios. The liquidity ratio for all sectors and each individual sector can be seen in Figure 4.1.

2) Current assets are cash and other assets expected to be converted to cash, sold, or consumed within a year.

3) Current liabilities are liabilities reasonably expected to be liquidated within a year. They usually include, amongst others, wages, accounts, taxes, short-term debt and the proportion of long-term debt to be paid this year.


The liquidity ratio will simply be referred to as the liquidity, as it measures the firm's ability to liquidate its current assets by turning them into cash. It is, though, worth noting that it is just a measure of liquidity, as the book value of assets might be considerably different from their actual value. Mathematically, the liquidity will be referred to as l.

    4.2.2 Debt ratio

The Debt ratio is a key figure consisting of net interest bearing debt divided by the earnings before interest, taxes, depreciation and amortization (EBITDA). The Debt ratio can be calculated using equation (4.2), where the figures are obtainable from the firm's financial statement.

Debt/EBITDA = Net interest bearing debt / (Operating profit/loss + Depreciation/Amortization)    (4.2)

where the net interest bearing debt can be calculated from the firm's financial statement using equation (4.3):

Net interest bearing debt = Subordinated loan capital + Long-term liabilities
+ Current liabilities to mortgage banks + Current bank liabilities
+ Current liabilities to group + Current liabilities to owner, etc.
- Liquid funds - Securities - Group debt - Outstanding accounts from owner, etc.    (4.3)

The Debt ratio is a measure of the pay-back period, as it indicates how long it would take to pay back all liabilities with the current operating profit. The longer the pay-back period, the greater the risk, and thus small ratios indicate that the firm is in a good financial position. As both debt and EBITDA can be negative, some precautions have to be taken, as a negative ratio has two different meanings. In the case where the debt is negative this is a positive thing, and the ratio should thus be overwritten as zero or a negative number to indicate positive creditworthiness. In the case where the EBITDA is negative or zero, the ratio should be overwritten as a large number to indicate poor creditworthiness; in the original dataset these figures are -1000 and 1000, respectively. In the case when both values are negative they are assigned the resulting positive value, even though negative debt can be considered a much more positive thing.
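The overwrite rules above can be made concrete in a small sketch; the -1000 value for negative debt follows the "negative number" option described in the text, and all names are illustrative:

```python
def debt_to_ebitda(net_debt, ebitda):
    """Debt ratio of equation (4.2) with the overwrite rules from the text.

    Negative debt alone -> -1000 (positive creditworthiness);
    non-positive EBITDA alone -> 1000 (poor creditworthiness);
    both negative -> the resulting positive ratio is kept as-is.
    """
    if net_debt < 0 and ebitda < 0:
        return net_debt / ebitda      # positive value, kept as computed
    if net_debt < 0:
        return -1000.0
    if ebitda <= 0:
        return 1000.0
    return net_debt / ebitda

print(debt_to_ebitda(300.0, 100.0))  # 3.0, i.e. a three-year pay-back period
```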

    4Amortization is the write-off of intangible assets and depreciation is the wear and tear oftangible assets.


[Figure 4.1: Histogram of the liquidity ratio for all sectors and each individual sector (panels: All Sectors, Real Estate, Trade, Production, Service, Transport). The figure shows a refined scale of this key figure for the complete data set.]


The overwritten values have to be carefully selected in order to prevent the regression from becoming unstable. Histograms of the Debt ratio for all sectors and each individual sector can be seen in Figure 4.2. The ±1000 values make it hard to see the distribution of the other figures and are thus not shown. As can be seen in Figure 4.2, the Debt ratio differs between sectors, especially in the real estate sector, where the ratio is on average larger than for the other sectors. In order to get an even better view of this key figure, summary values for all sectors and each individual sector can be seen in Table 4.4.

Statistics   All Sectors   Real Estate   Trade    Production   Service   Transport
Min.         0.01          0.00          0.01     0.01         0.00      0.24
1st Qu.      1.64          4.95          2.18     1.64         2.18      1.95
Median       3.14          7.67          4.00     3.14         3.93      3.27
Mean         5.87          11.56         6.62     5.87         6.78      5.87
3rd Qu.      5.21          11.42         6.59     5.21         6.90      5.16
Max.         469.90        454.70        601.00   469.90       162.40    157.10
ev(1000)     6.73%         6.58%         6.50%    6.41%        8.61%     2.74%
ev(-1000)    5.17%         4.23%         4.16%    5.28%        7.79%     2.74%

Table 4.4: Summary of Debt/EBITDA for all sectors and each individual sector, without figures outside the ±1000 range. The rates of the extreme values, ev(1000) and ev(-1000), are also listed for each sector.

From Table 4.4 it is clear that the real estate sector has a considerably larger Debt ratio than the other sectors, which are all rather similar. The inconsistency between sectors has to be considered before modeling. Mathematically, the Debt ratio will be referred to as d.

    4.2.3 Return on Total Assets

The Return On total Assets (ROA) percentage shows how profitable a company's assets are in generating revenue. The total assets are approximated as the average of this year's total assets and last year's assets, which are the assets that formed the operating profit/loss. Return On total Assets is a measure of profitability and can be calculated using equation (4.4) and the relevant components from the firm's financial statements.

ROA = Operating profit/loss / ( 1/2 (Balance sheet_0 + Balance sheet_-1) )    (4.4)


[Figure 4.2: Histograms of Debt/EBITDA for all sectors and each individual sector (panels: All Sectors, Real Estate, Trade, Production, Service, Transport), in a refined scale. The ±1000 values are not shown.]


In equation (4.4) the balance sheets have the subscripts zero and minus one, which refer to the current and last year's assets, respectively. For firms that only have the current balance sheet, that value is used instead of the average of the current and last year's assets. Return on assets gives an indication of the capital intensity of the firm, which differs between sectors. Firms that have undergone large investments will generally have a lower return on assets. Start-up firms do not have a balance sheet and are thus given the poor creditworthiness value -100. By taking a look at the histograms of the ROA in Figure 4.3 it is clear that the transport sector, and especially the real estate sector, have quite a different distribution compared to the other sectors.
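Equation (4.4) and the special cases above can be sketched as follows. Expressing the result in percent is an assumption here, made to match the scale of the values in Table 4.5; the function and its arguments are illustrative:

```python
def return_on_assets(operating_profit, balance_sheet_now, balance_sheet_prev=None):
    """ROA of equation (4.4), in percent.

    Start-up firms without a balance sheet (balance_sheet_now is None) get
    the poor-creditworthiness value -100; firms with only the current
    balance sheet use that value instead of the two-year average.
    """
    if balance_sheet_now is None:
        return -100.0
    if balance_sheet_prev is None:
        avg_assets = balance_sheet_now
    else:
        avg_assets = 0.5 * (balance_sheet_now + balance_sheet_prev)
    return 100.0 * operating_profit / avg_assets

print(return_on_assets(10.0, 120.0, 80.0))  # 10.0
```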

Statistics   All Sectors   Real Estate   Trade     Production   Service   Transport
Min.         -104.10       -100.00       -100.00   -104.10      -100.00   -100.00
1st Qu.      3.17          3.13          3.93      3.17         2.69      3.78
Median       7.43          5.67          7.71      7.43         6.67      6.97
Mean         1.15          2.30          4.67      1.15         3.06      2.23
3rd Qu.      12.60         8.23          13.12     12.60        11.44     9.76
Max.         93.05         203.30        104.50    93.05        105.50    31.55
ev(-100)     6.49%         5.01%         5.90%     8.20%        5.86%     4.11%

Table 4.5: Summary of Return On total Assets for all sectors and each individual sector. The rate of the extreme value, ev(-100), is also listed.

As can be seen from Table 4.5, the ROA differs significantly between sectors. The mean values might be misleading, and it is better to consider the median and the first and third quartiles. It can be seen that the transport and real estate sectors do not have as high an ROA as the others, which can partly be explained by the large investments made by many real estate firms. It is also observable that the first quartile of the service sector is considerably lower than the others, indicating a heavier negative tail than in the other sectors.

    4.2.4 Solvency ratio

Solvency can also be described as the ability of a firm to meet its long-term fixed expenses and to accomplish long-term expansion and growth. The Solvency ratio, also often referred to as the equity ratio, consists of the shareholders' equity and the balance sheet, both obtainable from the firm's financial statement.

Solvency = Shareholders' equity / Balance sheet    (4.5)

5) Balance sheet = Total Assets = Total Liabilities + Shareholders' Equity.

6) Equity = Total Assets - Total Liabilities. Equity is defined in Section 4.5.


[Figure 4.3: Histograms of the Return On total Assets for all sectors and each individual sector (panels: All Sectors, Real Estate, Trade, Production, Service, Transport).]


The balance sheet can be considered as either the total assets or the sum of total liabilities and shareholders' equity. By considering the balance sheet to be the sum of total liabilities and shareholders' equity, the solvency ratio describes to what degree the shareholders' equity is funding the firm. The solvency ratio is a percentage and ideally lies in the interval [0%, 100%]. The higher the solvency ratio, the better the firm's financial position.
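Equation (4.5), together with the zero-balance-sheet rule for newly started firms described below, can be sketched as follows (expressing the ratio in percent, as the text states it is a percentage; the function is an illustration, not the bank's implementation):

```python
def solvency_ratio(shareholders_equity, balance_sheet):
    """Equity ratio of equation (4.5), in percent.

    A zero balance sheet, as for newly started firms, is assigned the
    extremely negative creditworthiness value -100.
    """
    if balance_sheet == 0:
        return -100.0
    return 100.0 * shareholders_equity / balance_sheet

print(solvency_ratio(40.0, 100.0))  # 40.0, i.e. equity funds 40% of the firm
```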

By viewing Table 4.6 it can be seen that the minimum values are large negative figures. This occurs when the valuation placed on assets does not exceed liabilities; negative equity then exists. In the case when the balance sheet is zero, as is the case for newly started firms, the Solvency ratio is given the extremely negative creditworthiness value of -100. To get a better view of t

