This short story, or sequel, on Bayesian modelling is an extension of my first article on the subject, which can be found here. That first presentation generated a great deal of interest and questions from readers across the planet, and I wanted to address those queries by extending the article further. Feel free to download the sequel presentation. I promised myself it would not take more than a few hours to pull together, so in places it is a tad grainy, with the occasional handwritten formula. If you would like the PowerPoint presentation for clarity's sake, just touch base on LinkedIn and I will be more than happy to send it through to you by email.
So what variables should we capture to measure default in a borrower?
What to capture
It is all very well to discuss Bayesian methods or logistic regression for modelling probability of default, but one of the key points of interest is: what should we actually capture in the first place, well before we even consider how to model it?
We start to solve this problem by looking at DuPont analysis for economic measures of a firm. DuPont analysis distils health factors from a firm's balance sheet, cash flow and income statements into a set of ratios. These are tree-like in nature and ideal for modelling in a Bayesian manner. With a bit of additional work it is straightforward to extend beyond DuPont, and the list below contains fifty or so variables that might be useful indicators of default.
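The DuPont decomposition mentioned above can be sketched in a few lines. This is a minimal illustration, not the article's actual variable list; the field names (net_income, revenue, assets, equity) are illustrative stand-ins for figures taken from a firm's financial statements.

```python
def dupont_ratios(net_income, revenue, assets, equity):
    """Break return on equity into the three classic DuPont ratios."""
    net_margin = net_income / revenue         # profitability
    asset_turnover = revenue / assets         # operating efficiency
    equity_multiplier = assets / equity       # financial leverage
    roe = net_margin * asset_turnover * equity_multiplier
    return {"net_margin": net_margin,
            "asset_turnover": asset_turnover,
            "equity_multiplier": equity_multiplier,
            "roe": roe}

ratios = dupont_ratios(net_income=120.0, revenue=1500.0,
                       assets=2000.0, equity=800.0)
# ROE decomposes multiplicatively and equals net_income / equity (0.15 here)
print(ratios)
```

Note that every output is a ratio, which is exactly the point made in the list that follows: ratios are comparable across firms of very different sizes, where raw balance-sheet numbers are not.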
Some of these are naturally closer predictors of default than others, but the Bayesian model itself will take care of how the variables are mixed. In all cases it is important to capture ratios rather than single numbers, for several reasons:
- Does the size of a balance sheet really matter as a measure of default? In a world where "too big to fail" no longer holds, balance sheet size and market capitalisation are meaningless numbers on their own.
- Numbers that aren't ratios lack relationships and are difficult to compare, so they will not help us with the model. For example, 200 small debt obligations on a firm does not imply that it is going to default, while one single large line of debt may bring the entire business into liquidation.
Precursor to the model
Like any problem, if you think about it for long enough, even if that is only for a few hours, specific ideas and mechanisms come to mind that deserve further investigation. Understanding and then preventing default on a debt obligation is predicated on a banker being able to answer a single question: can the revenue line (cash inflow) support the repayment schedule for debt after other costs?
To address this question, three sub-models have been created: one dubbed Cash Flow Frequency Lambda, another, Volatility of Free Cash Flow for Firm, and a third, the J-Curve Slope. The model we have here is based on corporate banking, but these three measures of default could be converted into default factors for retail lending quite easily.
Cash Flow Frequency Lambda captures the number of revenue channels a firm supports over an investment horizon. It works on the belief that an entity with a small number of large cash flows, spread disparately and widely throughout an earnings period, is potentially high risk.
Volatility of Free Cash Flow for Firm measures how much free cash remains over a revenue period after payment commitments, including costs and debt obligations, have been met. The larger the pool of free cash left over after all costs are serviced, the greater a firm's capacity to absorb months of poor sales volumes, volatile costs, or long periods of rising interest rates.
We have to remember, however, that we need to express these numbers as measures of "intrinsic value" (a measure contained in the item itself) rather than as absolute numbers, otherwise we can't benchmark or model them.
For the Cash Flow Frequency, a quick solution was to fit the counts to a Poisson distribution, on the assumption (which can be tested against the output of the model) that each industry has a standard expected cash flow frequency and that a firm should sit within a band for its industry sector. The Poisson distribution was selected because it is parameterised by a single estimate of an expected count; in this context that lambda estimate translates to the expected occurrence of revenue.

For the FCFF spread, which is the amount of headroom of free cash between revenue, expenditure and debt, we also need an "intrinsic value". FCFF spread by itself is a continuous number that could be anything from zero to a billion or more, so in this case we model the variance (the standard deviation squared) of the FCFF headroom, which has been separated into specific bins of accumulation over time, to give us a single expression or metric for risk. The FCFF headroom in this sense has been translated into a proportion of holding today for servicing debt tomorrow.
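The two metrics can be sketched as follows. The monthly figures are made up for illustration; the only modelling facts relied on are that the maximum-likelihood estimate of a Poisson lambda is the sample mean of the counts, and that the headroom volatility is the variance of the binned headroom proportions.

```python
from statistics import mean, pvariance

# Number of distinct revenue cash flows received each month (illustrative).
monthly_cashflow_counts = [4, 6, 5, 7, 3, 5, 6, 4, 5, 6, 4, 5]

# Cash Flow Frequency Lambda: the Poisson MLE is simply the sample mean,
# i.e. the expected number of revenue occurrences per period.
cash_flow_lambda = mean(monthly_cashflow_counts)  # 5.0 for this sample

# Free cash headroom per month expressed as a proportion of revenue,
# (revenue - costs - debt service) / revenue, so that firms of any size
# sit on the same scale - the "intrinsic value" requirement above.
monthly_headroom_ratio = [0.12, 0.08, 0.15, 0.05, 0.11, 0.09,
                          0.14, 0.07, 0.10, 0.13, 0.06, 0.12]

# FCFF volatility: the variance (standard deviation squared) of the
# binned headroom proportions - a single risk metric per firm.
fcff_volatility = pvariance(monthly_headroom_ratio)

print(cash_flow_lambda, fcff_volatility)
```

A firm's lambda can then be compared with the band expected for its industry sector, and a high headroom variance flags a firm with little consistent buffer for servicing debt.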
A key point to note about these potential measures of default is that they are useful to a credit analyst whether that analyst is using a Bayesian or a Logistic Regression model for calculating probability of default.
What about Leverage
Another factor that should be included in the model is financial leverage. Firms with high levels of leverage are inherently riskier and consequently present higher rates of default.
One expression of leverage is the Debt to Equity ratio. Another measure of implied leverage is the steepness of the J-Curve, which shows the depth of borrowing an entity might take on. It indicates the breakeven point, where the firm becomes cash positive, and the danger zone through which the firm must continue to service its debt obligations.
Both the Debt to Equity ratio and the steepness of the J-Curve are possible proxies for default and both numbers can be expressed as an "intrinsic value" or metric which is suitable for modelling.
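Both proxies can be reduced to a single metric per firm. The sketch below is illustrative: the J-curve is represented as a cumulative net cash position by quarter (negative while the firm draws down debt, recovering after breakeven), and "steepness" is taken as the least-squares slope of the drawdown leg, which is one plausible reading of the measure, not necessarily the article's exact formulation.

```python
def debt_to_equity(total_debt, total_equity):
    """Classic leverage ratio: total debt divided by total equity."""
    return total_debt / total_equity

def j_curve_steepness(cumulative_cash):
    """Least-squares slope of the drawdown leg of the J-curve, i.e. from
    the start until the curve bottoms out. A steeper (more negative)
    slope implies deeper, faster borrowing."""
    trough = cumulative_cash.index(min(cumulative_cash))
    xs = list(range(trough + 1))
    ys = cumulative_cash[:trough + 1]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Illustrative cumulative net cash position per quarter.
j_curve = [0.0, -2.0, -4.0, -6.0, -5.0, -3.0, 0.0, 3.0]
print(debt_to_equity(600.0, 400.0))  # 1.5
print(j_curve_steepness(j_curve))    # -2.0 per quarter
```

Both outputs are dimensionless or per-period rates, so they satisfy the "intrinsic value" requirement and can be benchmarked across firms.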
Moving towards the model
A key hurdle highlighted to me by a couple of readers of the first Bayesian article is that Bayesian models may overfit the data. I looked at several Bayesian models used in the industry to solve this problem and then stumbled across one developed by a UK firm known as Quintessa. The Quintessa models are not applied to banking per se, but the company has an interesting approach to modelling that might be very useful for our probability of default problem. The Quintessa model uses a theory known as evidenced based logic to insert unknown randomness into the system.
In the realm of credit modelling, the viability of the model is tied inevitably to factors captured from a firm's financial statements. These factors may not always be transparent, especially when projections are drawn from inconsistent market data. Some firms may not have been in business long enough to generate a valid set of figures, and other firms' economic factors might be missing because they are not listed entities. The list of issues driving data paucity concerns is multitudinous, to say the least.
The Evidenced Based Logic approach solves this problem by allowing us to capture known knowns and leave randomness for known unknowns and, worst of all, unknown unknowns. Gosh, that is a mouthful. In addition, the approach allows for weighting of questions by setting completeness and quality of data parameters; the model was certainly elegant enough to be worthy of a write-up.
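To make the completeness-and-quality weighting concrete, here is a toy sketch of the general idea, and emphatically not Quintessa's actual formulation: each piece of evidence carries a belief score together with completeness and quality weights, and missing or poor evidence contributes an explicit "uncommitted" mass rather than a forced answer.

```python
def combine_evidence(items):
    """items: list of (belief, completeness, quality) tuples, all in [0, 1].
    Returns (weighted_belief, uncommitted), where uncommitted is the
    share of total possible weight lost to incomplete or poor data."""
    total_possible = len(items)  # each item could weigh at most 1.0
    weighted = sum(b * c * q for b, c, q in items)
    used = sum(c * q for _, c, q in items)
    belief = weighted / used if used else 0.0
    uncommitted = 1.0 - used / total_possible
    return belief, uncommitted

# Illustrative evidence items for a single firm.
evidence = [
    (0.9, 1.0, 1.0),   # audited statements: complete, high quality
    (0.6, 0.5, 0.8),   # market projections: patchy and inconsistent
    (0.4, 0.2, 0.5),   # unlisted entity: little reliable data
]
belief, uncommitted = combine_evidence(evidence)
print(round(belief, 3), round(uncommitted, 3))
```

The point of the sketch is the shape of the output: a belief plus an honest residue of uncertainty, which is what lets the known unknowns and unknown unknowns stay in the system as randomness rather than being silently filled in.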
In the presentation attached, we discuss all of these agendas: what factors to capture for modelling default, how these factors should be presented to the model, how an Evidenced Based model might actually work, and a top-level system architecture schematic that could be considered for bringing the entire service into operation.