Sunday, January 23, 2011

Advantages of Bayesian Credit Scoring Models

At a high level, credit risk analysts have to solve three problems across both their banking and trading books to meet many of their goals.

The first is being able to calculate the amount of notional exposure, leverage and potential loss that may occur from positions clients take with the bank. These calculations are carried out at a trade and portfolio level referencing netting and non-netting offsets.  This is referred to as the Exposure at Default (EAD).

The second problem credit analysts need to solve is quantifying recovery rates from these exposures: total losses divided by the Exposure at Default give a Loss Given Default (LGD) value.  The LGD is also, in some respects, equivalent to 1 - (Recovery Value / Present Value of Future Cash Flows).
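
As a rough illustration of the two LGD expressions above, the sketch below computes LGD both as loss over EAD and as one minus the recovery ratio. The function names and the figures are hypothetical and chosen only so that the two routes agree; in practice the denominators may differ.

```python
def lgd_from_loss(total_loss: float, ead: float) -> float:
    """LGD as total loss divided by Exposure at Default."""
    if ead <= 0:
        raise ValueError("EAD must be positive")
    return total_loss / ead


def lgd_from_recovery(recovery_value: float, pv_future_cash_flows: float) -> float:
    """LGD as 1 - (recovery value / present value of future cash flows)."""
    if pv_future_cash_flows <= 0:
        raise ValueError("Present value must be positive")
    return 1.0 - recovery_value / pv_future_cash_flows


# Hypothetical figures: an exposure of 1,000,000 of which 400,000 is recovered.
ead = 1_000_000.0
recovered = 400_000.0

lgd_a = lgd_from_loss(ead - recovered, ead)
lgd_b = lgd_from_recovery(recovered, ead)
print(lgd_a, lgd_b)
```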

The third is estimating the Probability of Default, the likelihood that a counterparty will fail to meet its contractual obligations with the bank at some point in the future.  We are aware, of course, that these three variables are jointly distributed rather than independent: higher exposures can drive larger repayments, which in themselves may lead to an increase in Probability of Default, but that is an internal calculation.

Nearly all credit or credit-like initiatives rely on being able to dimension the space these three parameters operate in, and credit analysts must be able to work out how the three variables combine.  In fact, many functions in a bank rely on capturing these variables effectively, from portfolio duration / return to product pricing and Basel II regulatory capital.

After the initial shock of the credit crisis subsided, a crowd of economists, analysts and regulators swooped in to try to understand what drove this systemic problem.  Their investigations produced numerous findings, but they brought the models used by some rating agencies into the spotlight, and regulators generally classified lending to subprime clients with poor credit scores as ill-intentioned.

Now is a good time to fix credit scoring and rethink many credit risk methodologies.  In this article we are going to look at our third credit risk variable, the Probability of Default.

Quantifying Probability of Default
Probability of Default can be estimated in many ways.  In the investment banking domain, measures of Probability of Default are often taken from the grades assigned by rating agencies.  Credit default swap prices, which show a premium for risk, and credit bond spreads, which are the difference in yield between the issuer's debt and the risk free rate, are also popular measures of default risk.  The spread approach works by showing investors and lenders how much greater the return is on a specific corporate bond when it is compared against a treasury note issued by the government.  The latter should be risk free as governments shouldn't default; however, in this new world of multiple sovereign defaults, that assumption in itself needs review.
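
The spread approach can be sketched in a few lines. The example below is an illustration only: it uses the common "credit triangle" approximation (default intensity is roughly spread divided by one minus recovery), which is not something the article itself prescribes, and the yields and the 40% recovery rate are hypothetical inputs.

```python
import math


def credit_spread(corporate_yield: float, risk_free_yield: float) -> float:
    """Spread of a corporate bond yield over the government (risk free) yield."""
    return corporate_yield - risk_free_yield


def implied_default_intensity(spread: float, recovery_rate: float) -> float:
    """Credit triangle approximation: intensity ~ spread / (1 - recovery)."""
    if not 0.0 <= recovery_rate < 1.0:
        raise ValueError("recovery_rate must be in [0, 1)")
    return spread / (1.0 - recovery_rate)


def default_probability(intensity: float, years: float) -> float:
    """Cumulative default probability assuming a constant intensity."""
    return 1.0 - math.exp(-intensity * years)


# Hypothetical inputs: 6.5% corporate yield, 3.5% treasury yield, 40% recovery.
spread = credit_spread(0.065, 0.035)
intensity = implied_default_intensity(spread, 0.40)
pd_5y = default_probability(intensity, 5.0)
print(f"spread={spread:.4f} intensity={intensity:.4f} 5y PD={pd_5y:.4f}")
```

If the government yield is itself no longer risk free, the spread understates the issuer's default risk, which is the caveat raised above.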

In the world of corporate banking the credit spread approach is used less, because many companies may not be rated or have issued bonds.  In this case, analysis of specific ratios from the borrower's cash flow statement, income statement and balance sheet tends to feature much more in the credit analysis.

Additionally, in the corporate lending market other tests are carried out using credit scoring models.  These scoring models generally comprise specific ratios which are gathered from the balance sheet of a potential borrower and allow the bank to compare the characteristics of good and bad borrowers with those of a new borrower.  A ratio such as the market value of equity over the book value of total liabilities, expressed as a percentage, is compared for one borrower with the same ratio for companies on the bank's lending book.  The logic behind this is to draw conclusions such as: borrower A had this ratio value and defaulted, borrower B had this ratio value and defaulted, our new borrower also has this ratio value, so will it default?
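
The comparison logic above can be sketched as a lookup of historical default frequency by ratio band. Everything in this example is hypothetical: the lending book records, the band edges (0.2 and 0.5) and the new borrower's ratio are made up purely to show the mechanics.

```python
from collections import defaultdict

# Hypothetical lending book: (equity / total liabilities ratio, defaulted?)
history = [
    (0.10, True), (0.15, True), (0.18, False), (0.25, True),
    (0.35, False), (0.45, False), (0.60, False), (0.80, False),
    (0.12, True), (0.30, False),
]


def bucket(ratio, edges=(0.2, 0.5)):
    """Return the index of the band the ratio falls into."""
    for i, edge in enumerate(edges):
        if ratio < edge:
            return i
    return len(edges)


def default_rates(records):
    """Observed default frequency per band."""
    counts = defaultdict(lambda: [0, 0])  # band -> [defaults, total]
    for ratio, defaulted in records:
        band = bucket(ratio)
        counts[band][0] += int(defaulted)
        counts[band][1] += 1
    return {band: d / n for band, (d, n) in counts.items()}


rates = default_rates(history)
new_borrower_ratio = 0.17  # hypothetical new borrower
estimated_pd = rates[bucket(new_borrower_ratio)]
print(rates, estimated_pd)
```

With so few records per band the estimates are very noisy, which is one reason real scoring models combine several ratios rather than relying on one.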

Credit Scoring Goals
Using our example above: if the bank is able to say with a level of confidence that the chance of default for a new borrower is X when the value of equity falls below 20%, then the lender is able to achieve three things.

[1] Decide whether or not to lend to the borrower: is the risk too high and out of policy to begin with?

[2] Charge a premium for the loan. If it is higher risk it should have a higher beta factor, and higher repayments or collateral cover are due from the borrower. This makes sense as riskier deals deserve a better reward; however, there is an anti-cycle to this process, but we are not going to debate that here.

[3] Estimate how much the bank should potentially reserve for losses from a default by the new borrower, by incrementally adding it to a portfolio of existing loans.  This third value will eventually feature in the regulatory capital charge that is defined under the Basel accord.
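
To make the third goal concrete, here is a minimal sketch of the incremental expected loss from adding a new loan to an existing book, using the standard product of Probability of Default, Loss Given Default and Exposure at Default. The loans and their parameters are hypothetical, and the Basel regulatory capital calculation itself is more involved and is not shown.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    name: str
    pd: float   # Probability of Default over the horizon
    lgd: float  # Loss Given Default as a fraction of exposure
    ead: float  # Exposure at Default in currency units

    def expected_loss(self) -> float:
        return self.pd * self.lgd * self.ead


def portfolio_expected_loss(loans) -> float:
    """Expected loss is additive across loans."""
    return sum(loan.expected_loss() for loan in loans)


# Hypothetical existing book and new borrower.
portfolio = [
    Loan("Existing A", pd=0.02, lgd=0.45, ead=2_000_000.0),
    Loan("Existing B", pd=0.05, lgd=0.60, ead=500_000.0),
]
new_loan = Loan("New borrower", pd=0.10, lgd=0.50, ead=1_000_000.0)

before = portfolio_expected_loss(portfolio)
after = portfolio_expected_loss(portfolio + [new_loan])
incremental = after - before
print(before, after, incremental)
```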

In the end, the scoring model and its results become central to the credit decisioning process, and in retail banking it is the most prominent factor for deciding "to lend or not to lend" as the question goes, sorry for the pun.

Complexity of scoring models
Credit scoring models have many issues, as do most things from a complex world.

Firstly, they can be circuitous calculations and thus create a huge divide of understanding between the portfolio analytics team and the front office.  Secondly, they can suffer drift, which occurs when a specific ratio or factor stops being a good indicator of default.  Finally, they are used for many initiatives in the bank, as we have indicated.  Depending on the way the internal scoring calculation is carried out, they may be ideal for supporting regulatory capital calculations but become less useful for lending decisions. This is believable, as one requirement relies on a continuous probability function while the other is a binary decision.  Many credit departments simply end up building several models from the various datasets they capture, all in an effort to meet the diverse business problems they have to solve.

The most common credit scoring systems in use are those based around multivariate models, with Logistic Regression at one end of the scale and occasionally Stepwise Methods at the other.
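
A logistic scoring model can be sketched as follows, which also shows the gap between a continuous probability and a binary lending decision. The intercept, the weights, the two ratios and the 10% cutoff are all hypothetical; a real model would estimate the coefficients from the bank's own default history.

```python
import math

# Hypothetical coefficients; in practice these are fitted to historical data.
INTERCEPT = -1.0
WEIGHTS = {
    "equity_to_liabilities": -3.0,
    "debt_service_coverage": -0.5,
}


def probability_of_default(ratios: dict) -> float:
    """Logistic link: maps a weighted sum of ratios to a value in (0, 1)."""
    z = INTERCEPT + sum(WEIGHTS[name] * ratios[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def lend(ratios: dict, cutoff: float = 0.10) -> bool:
    """Binary decision derived from the continuous probability."""
    return probability_of_default(ratios) < cutoff


strong = {"equity_to_liabilities": 0.8, "debt_service_coverage": 2.0}
weak = {"equity_to_liabilities": 0.1, "debt_service_coverage": 0.5}

for label, borrower in (("strong", strong), ("weak", weak)):
    print(label, round(probability_of_default(borrower), 4), lend(borrower))
```

The same probability can feed a capital calculation directly, while the lending decision only ever sees which side of the cutoff it falls on.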

The Bayesian Technique
One method I have been researching of late is a Bayesian network approach for aggregating factors in credit scoring models, especially those models that are predominantly used by credit teams for corporate and retail lending.  Interestingly, Bayesian methods would seem on the surface well suited to scoring exercises, because they build up a network tree of factors that can be connected easily to illustrate a condition such as default.
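
To give a feel for what such a network looks like, here is a minimal sketch with two parent factors feeding a default node, queried by brute-force enumeration. The node names and every probability in the tables are hypothetical, and a production model would use a dedicated Bayesian network tool rather than hand-rolled enumeration.

```python
from itertools import product

# Hypothetical priors for the two parent nodes.
P_HIGH_LEVERAGE = 0.30
P_WEAK_CASH_FLOW = 0.20

# Hypothetical table: P(default | high_leverage, weak_cash_flow).
P_DEFAULT = {
    (True, True): 0.40,
    (True, False): 0.10,
    (False, True): 0.15,
    (False, False): 0.02,
}


def joint(high_leverage: bool, weak_cash_flow: bool, default: bool) -> float:
    """Joint probability of one full assignment of the three nodes."""
    p = P_HIGH_LEVERAGE if high_leverage else 1.0 - P_HIGH_LEVERAGE
    p *= P_WEAK_CASH_FLOW if weak_cash_flow else 1.0 - P_WEAK_CASH_FLOW
    p_d = P_DEFAULT[(high_leverage, weak_cash_flow)]
    return p * (p_d if default else 1.0 - p_d)


def prob_default(evidence=None) -> float:
    """P(default | evidence), summing the joint over unobserved nodes."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for hl, wc, d in product([True, False], repeat=3):
        state = {"high_leverage": hl, "weak_cash_flow": wc}
        if any(state[key] != value for key, value in evidence.items()):
            continue  # assignment contradicts what we observed
        p = joint(hl, wc, d)
        denominator += p
        if d:
            numerator += p
    return numerator / denominator


print(prob_default())
print(prob_default({"high_leverage": True}))
print(prob_default({"high_leverage": True, "weak_cash_flow": True}))
```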

Mind you, not many practitioners out there seem to be using Bayesian networks directly, or if they are, the networks are an annex to the main Probability of Default model.  I fear this is due to several hurdles that have to be overcome.  These modelling difficulties include how to turn a tree-like structure of factors into a continuous probability distribution function that is in line with Basel requirements, and how to know whether our nodal network suffers from deterministic illogic.

Deterministic illogic occurs when the analyst connects factor A to factor B in the belief that a hypothetical default may occur from the interaction of these specific variables, but the analyst might be entirely wrong while the output of the model is always the same. It could be factor C which is driving random default, and consequently the credit scoring system is faulty because factor C has been omitted.  Worse, the interaction between factors A and B may not be represented correctly in the model to begin with, because the implied causal relationship has drifted over time. Factor A may imply factor B in an additive or multiplicative way, but that relationship can change. There may actually be no causal relationship at all between these factors, and both may be independent. This is testable with a small number of factors, but as the number of variables begins to increase, so does the complexity of the problem.
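
For a pair of binary factors, one simple check of whether A and B are related at all is a chi-square test on their 2x2 contingency table. The sketch below is an assumption about how such a test might be run, not the author's method; the counts are hypothetical, and the test detects association only, it says nothing about causation or about an omitted factor C.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts.

    table[i][j] is the number of borrowers with factor A = i and factor B = j.
    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    return statistic


# Critical value of chi-square with 1 degree of freedom at the 5% level.
CRITICAL_5PCT_1DF = 3.841

independent_looking = [[25, 25], [25, 25]]  # hypothetical counts
dependent_looking = [[40, 10], [10, 40]]    # hypothetical counts

for name, table in (("independent", independent_looking),
                    ("dependent", dependent_looking)):
    stat = chi_square_2x2(table)
    print(name, stat, "reject independence" if stat > CRITICAL_5PCT_1DF else "no evidence")
```

With k factors there are k(k-1)/2 pairs to check before even considering higher-order interactions, which is why the problem grows quickly as variables are added.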

Overcoming Hurdles
Both these problems can of course be overcome. The use of statistical methods such as Markov Blankets allows a wider range of possibilities to be considered, so that a learning network is created instead of a deterministic network.  Approaches such as the "Normal Mixture Distributions" described by Martin Neil and Norman Fenton from Agena can be used to create distributions of outcomes rather than exact results.  From research carried out on the internet, another paper, from Fair Isaac, is worth taking a look at, especially as this company builds credit scoring systems for banks.  Interestingly, it also mentions these conundrums and investigates solutions to these Bayesian modelling difficulties.

Bayesian networks have many advantages even with these hurdles.  They are graphical in nature and show the front office and management visually why a specific score is important to capture and how it is used in the model.  They can also address the problem of drift, which reduces the need to re-weight the scoring factors in the model at least every year.  Setting scoring weights is a painful process fraught with error.

An introductory presentation on this subject has been attached to this blog, and over the coming months further research will be carried out on how to create a scoring model that takes advantage of the best features of Bayesian network principles.

Feedback and Update
Posted by Martin Davies on 27th January 2011 

Some bankers and risk analysts who have read this posting have emailed me various comments, both positive and describing the difficulties of implementing Bayesian networks for credit scoring systems.  I was quite impressed with the feedback, especially as it was generally encouraging.  Questions and specific comments varied to some degree, but I have summarized six key areas which seemed to be common.

I believe I am going to need another article just to address all of these, and I will plan that in the coming days or perhaps weeks.

Adding New Nodes
Scorecard models are not going to be static, and over time new factors will be identified which will need to be added to the model. How can this be achieved with a Bayesian network? What are the impacts on previous scores?

Calibration and Policy
How is policy captured in the scoring system? How are changes to policy tracked over time, and will this impact the model calculation?

What to score
What should actually be scored and how should the scores be represented in the system?  The article above states that specific variables are captured off the balance sheet of a firm. Can some examples be given, and is it possible to show how these are mixed in the Bayesian model?

Continuous Values
It appears that Bayesian networks are binary in nature. How are continuous factors captured and aggregated into the model? Surely some factors will simply return a specific number, so how do you score that?
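
One common way of handling a continuous factor, offered here only as a sketch and not as the answer the follow-up article will give, is to discretise it into a handful of states that a network node can take. The cut points and state names below are hypothetical.

```python
from bisect import bisect_right

# Hypothetical cut points for an equity / total liabilities ratio.
EDGES = [0.2, 0.5, 1.0]
STATES = ["very low", "low", "medium", "high"]


def discretise(value: float) -> str:
    """Map a continuous ratio to the discrete state of a network node."""
    return STATES[bisect_right(EDGES, value)]


for ratio in (0.1, 0.2, 0.75, 3.0):
    print(ratio, discretise(ratio))
```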

Unique Industry Types
Will unique industry types have different ranges for specific factors?  Manufacturing firms may score poorly on one factor which may be a default problem for this industry sector but not for another industry sector.

No Data
When the model is first set up it will have no data stored within it. How is this issue overcome?  How many variables need to be captured before the Bayesian model becomes useful?
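
A typical Bayesian response to the cold-start question, shown here as a hedged sketch rather than the author's answer, is to begin with a prior that encodes expert judgement and let observed outcomes gradually dominate it. The example uses a Beta prior on a default rate; the prior strength and the observed counts are hypothetical.

```python
def posterior_default_rate(prior_defaults: float, prior_non_defaults: float,
                           observed_defaults: int, observed_non_defaults: int) -> float:
    """Posterior mean of a default rate under a Beta prior and binomial data."""
    alpha = prior_defaults + observed_defaults
    beta = prior_non_defaults + observed_non_defaults
    return alpha / (alpha + beta)


# Hypothetical expert prior: worth 100 pseudo-observations at a 5% default rate.
PRIOR = (5.0, 95.0)

no_data = posterior_default_rate(*PRIOR, 0, 0)
some_data = posterior_default_rate(*PRIOR, 12, 88)
print(no_data, some_data)
```

With no observations the estimate is simply the prior; as loans season and outcomes accumulate, the estimate moves toward the observed frequency.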

Finally a special thanks to Eduardo from Madrid who recommended reading the following books:

  [] Bayesian Methods in Finance - Fabozzi
  [] Coherent Stress Testing - Rebonato
