## Saturday, April 7, 2012

When modelling enterprise risk outcomes, analysts need to consider the correlation between variables in their algorithms. If they don't, the potential loss estimates they generate from these calculations are likely to be extremely erroneous.

A recent LinkedIn discussion on the dependency, correlation, causality and mutuality of multiple risk factors has opened up an interesting debate on the subject and stimulated this blog post. Additionally, after speaking with several risk analysts about factor dependency, there appears to be genuine interest in putting into words how to model an aggregate level of risk that is sensitive to correlation.

In this article we review a very straightforward method for measuring correlation in risk variables and for propagating a final outcome. We also show why the process under CAPM is flawed.

Why is inference important
A fortnight ago, we investigated how Monte Carlo can be used to build a final loss probability distribution function. The output in that case was achieved by combining the potential frequency of a risk event with its associated magnitude; more information can be found here [ Link ].

A standard Monte Carlo process is fine for generating random outcomes, but it has a few problems. This is especially the case when the process ignores correlation among the model factors.

When risk factors are combined, the outcome needs to account for the correlation between them, because one risk factor might offset another and reduce the final potential loss. We call such offsetting a hedge. What's worse, the two factors in our example (n1, n2) might instead compound each other and increase the final exposure in the n3 loss function. Either way, over- or underestimating the result of an unwanted event at an analytic level seems to be a typical curse of risk modellers, and leaving correlation out of the calculation is often the culprit.
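The size of the effect can be sketched with the standard variance identity. This assumes, purely for illustration, that the combined loss n3 is the simple sum of n1 and n2, with standard deviations σ1, σ2 and correlation ρ (symbols chosen here, not taken from the original figure):

```latex
\operatorname{Var}(n_1 + n_2) = \sigma_1^2 + \sigma_2^2 + 2\,\rho\,\sigma_1\,\sigma_2
```

With σ1 = σ2 = 1, the variance of the total is 0 when ρ = -1 (a perfect hedge), 2 when ρ = 0, and 4 when ρ = +1. A model that silently assumes ρ = 0 can therefore miss in either direction.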

Correlating n1, n2
In our simple example above we have two random variables that may or may not be correlated. Their covariance is defined by the Cov(n1,n2) formula below, and their correlation is given by Corr(n1,n2) = ρ(n1,n2).
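For reference, the standard textbook definitions can be written out as follows. The means μ1 and μ2 are notation assumed for this sketch:

```latex
\operatorname{Cov}(n_1, n_2) = E\big[(n_1 - \mu_1)(n_2 - \mu_2)\big]

\rho(n_1, n_2) = \operatorname{Corr}(n_1, n_2)
  = \frac{\operatorname{Cov}(n_1, n_2)}{\sqrt{\operatorname{Var}(n_1)\,\operatorname{Var}(n_2)}}
```

The correlation is simply the covariance rescaled by the square root of the two variances, so it always lies between -1 and +1.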

Assuming a 73% correlation between the two random variables in our example is fine, but we also have to accept that there are some conditions which may nullify our simple calculation. I list three potential issues here, but there are many more.

[1] Correlation doesn't equal causality ~ Two factors may be correlated, but that doesn't imply one factor is driving the other. There could be an additional condition in play that has been omitted from the model, and we can bet our bottom dollar that this additional condition is also randomly distributed.

[2] Correlation can have a time lag ~ Correlation is rarely instantaneous, and there can be a delay before one factor aligns with another. You can probably guess that this time lag, too, is likely to be randomly distributed.

[3] Convoluted factors have multiple correlation dimensions ~ Correlation changes over time and between samples. Put concisely, correlation itself is not constant but varies, in effect randomly, throughout the sample of random variables; or, to put it differently, correlation is not a straight line as one moves through the ranges of n1 and n2. For low values of n1, for example, correlation may be negative and then become positive as n1 grows.

Surely the covariance / square root formula we listed above for measuring variance and correlation across convoluted factors has to work?

I hear a lot of people say that it has to work because it is used everywhere.

Yes, it does work, because an additional data point in either the n1 or n2 data series changes the variance over both samples, and the correlation factor changes as a consequence. But there is a major oversight that we take for granted when working out the impact of correlation in this way. Capturing the entire sample of data points in the correlation equation forces a single, averaged correlation across the ranges of n1 and n2. Averages carry a great deal of error, especially when a market is moving, yet CAPM is based around this formula and is consequently flawed [ CAPM LINK ].
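The averaging problem can be illustrated with a small sketch. The data are hypothetical and deliberately extreme (n2 is set to the absolute value of n1), and the helper name `pearson` is chosen for this example only:

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation: covariance over the square root of the
    product of the variances (the 1/(n-1) factors cancel, so sums suffice)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data (an assumption, not from the article): n2 falls as n1
# rises through negative values, then rises with n1 once n1 is positive.
n1 = [float(v) for v in range(-5, 6)]
n2 = [abs(v) for v in n1]

low = [(x, y) for x, y in zip(n1, n2) if x < 0]
high = [(x, y) for x, y in zip(n1, n2) if x > 0]

corr_all = pearson(n1, n2)
corr_low = pearson([x for x, _ in low], [y for _, y in low])
corr_high = pearson([x for x, _ in high], [y for _, y in high])

print(f"full sample: {corr_all:+.2f}")
print(f"low n1:      {corr_low:+.2f}")
print(f"high n1:     {corr_high:+.2f}")
```

Over the full sample the correlation comes out at essentially zero, even though the two factors move in perfect opposition for low n1 and in perfect step for high n1. Real data will be far less tidy, but the single averaged figure hides the structure in the same way.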

The philosophy behind the Capital Asset Pricing Model is actually appropriate, but the way the model is applied is flawed, and many institutions which use CAPM seem to have fallen into this trap.

Cholesky Decomposition
There is a solution to the multiple correlation dimension problem, which is to assume that correlation between factors is in fact a random variable, or a matrix of positions across and against the ranges of n1 and n2.

If the variance-covariance matrix is positive definite, then it can be decomposed and the correlation across the matrix identified in a straightforward manner. This decomposition of the correlation structure for multiple factors should also be calculated before extended calculations such as Monte Carlo simulation are entertained.
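To see why the decomposition comes before the simulation, here is a minimal sketch for two factors. It assumes standard normal shocks and uses the closed-form lower-triangular factor of a 2x2 correlation matrix; the function names, seed and trial count are illustrative choices, and only the 73% figure comes from the example above:

```python
import math
import random


def correlated_draws(rho, trials=20000, seed=7):
    """Simulate pairs (n1, n2) of standard normals with target correlation rho."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    draws = []
    for _ in range(trials):
        z1 = rng.gauss(0.0, 1.0)  # independent shocks
        z2 = rng.gauss(0.0, 1.0)
        # Rows of the 2x2 lower-triangular factor applied to (z1, z2):
        # n1 = z1, n2 = rho * z1 + sqrt(1 - rho^2) * z2.
        draws.append((z1, rho * z1 + scale * z2))
    return draws


def sample_correlation(draws):
    """Pearson correlation of the simulated pairs."""
    n = len(draws)
    mean_1 = sum(a for a, _ in draws) / n
    mean_2 = sum(b for _, b in draws) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in draws)
    var_1 = sum((a - mean_1) ** 2 for a, _ in draws)
    var_2 = sum((b - mean_2) ** 2 for _, b in draws)
    return cov / math.sqrt(var_1 * var_2)


draws = correlated_draws(0.73)
print(f"sample correlation: {sample_correlation(draws):.2f}")
```

The simulated pairs should recover a correlation close to the 0.73 target. Without the mixing step, the two series would be independent and every aggregate built from them would ignore the dependency.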

The covariance matrix contains the implied correlation structure and the volatility vector, and it can be decomposed into a lower triangle (L) and an upper triangle (U) that is simply its mirror: the transpose of L, with the zero and non-zero elements of the lower triangle reflected across the diagonal. Given the lower (L) and the upper (U), multiplying the two together returns the original covariance matrix, diagonal elements included.
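In symbols, and for the two-factor case only, the decomposition can be written out in full. The notation σ1, σ2 for the volatilities and ρ for the correlation is assumed here for illustration:

```latex
\Sigma = L\,L^{\top}, \qquad
\Sigma =
\begin{pmatrix}
\sigma_1^2 & \rho\,\sigma_1\sigma_2 \\
\rho\,\sigma_1\sigma_2 & \sigma_2^2
\end{pmatrix},
\qquad
L =
\begin{pmatrix}
\sigma_1 & 0 \\
\rho\,\sigma_2 & \sigma_2\sqrt{1-\rho^2}
\end{pmatrix}
```

Multiplying L by its transpose gives σ1² and ρ²σ2² + σ2²(1 - ρ²) = σ2² on the diagonal and ρσ1σ2 off the diagonal, which is the original matrix.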

Cholesky isn't so hard
The great news is that Cholesky decomposition is very easy to implement. It can be calculated in a spreadsheet using a VBA macro, or in MATLAB or R.

Here is an example Visual Basic for Applications macro in Excel:
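The same calculation is short in any language. As a hedged illustration, and not the macro referred to above, the following Python sketch implements the row-by-row form of the decomposition. The function names and the example volatilities (0.2 and 0.3) are assumptions made for this sketch; the 73% correlation is the figure used earlier in the article:

```python
import math


def cholesky(matrix):
    """Return the lower-triangular L with L * L^T equal to `matrix`.

    Only the lower triangle of `matrix` is read.
    Raises ValueError if the matrix is not positive definite.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Part of entry (i, j) already explained by earlier columns.
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def times_transpose(lower):
    """Multiply L by its transpose to rebuild the original matrix."""
    n = len(lower)
    return [[sum(lower[i][k] * lower[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Illustrative covariance matrix (assumed values): volatilities of 0.2 and
# 0.3 combined with a 73% correlation.
rho, s1, s2 = 0.73, 0.2, 0.3
cov = [[s1 * s1, rho * s1 * s2],
       [rho * s1 * s2, s2 * s2]]

L = cholesky(cov)
rebuilt = times_transpose(L)
for row in L:
    print([round(v, 4) for v in row])
```

Checking that `rebuilt` matches `cov` to within rounding error is a quick sanity test. In practice a library routine would normally be preferred over a hand-written loop.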

Going on from this point, Cholesky decomposition is just the tip of the iceberg when it comes to correlation, and there are many other statistical approaches worthy of consideration by a good risk analyst. Cholesky decomposition can itself be extended, of course, but either way risk analysts need to factor the impact of correlation into the risk factors they are modelling. That much is imperative.