
Friday, February 15, 2013

Two schools of thought for measuring op risk

I have been reading a lot of posts on various risk forums of late, and it appears that the Frequency x Magnitude approach to measuring operational risk is a stubbornly persistent failing that is going to be very hard to move on from.

How do we really measure operational risk?

Measuring Operational Risk
This Frequency x Magnitude method is tragically flawed because it contains no variance or uncertainty. Everything in an uncertain world varies; if something doesn't vary, one has to question whether there is any risk in it in the first place.

If we go by the ISO 31000 standard, risk is defined as the effect of uncertainty on objectives, and uncertainty can loosely be translated into variance.

These things are best explained with an example: the temperature averages 14 degrees Celsius today, which sounds quite comfortable, doesn't it? But if the temperature is -14 degrees at midnight and +42 degrees at midday, we might only be comfortable for a few hours of the day.

This variation at different times of the day presents different problems or risks for us. If we assume the temperature is simply 14 degrees Celsius, we grossly underestimate the hazards in our environment when we are dealing with the '+42' end of the spectrum. We have discussed this Frequency x Magnitude curse before on this blog [ LINK ] and today we need to extend that thinking, because there is little more to yield from it.
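To make the point concrete, here is a minimal Python sketch of the temperature example. The sinusoidal daily profile and the 10-25 degree comfort band are illustrative assumptions, not figures from the post:

```python
import numpy as np

# A day that averages 14 C but swings from -14 C at midnight to +42 C at midday.
hours = np.arange(24)
temp = 14 + 28 * np.sin(2 * np.pi * (hours - 6) / 24)

# An assumed comfort band of 10-25 C.
comfortable = (temp >= 10) & (temp <= 25)

print(round(temp.mean(), 1))   # 14.0 -- the average looks perfectly pleasant
print(comfortable.sum())       # 4 -- yet only four hours of the day are comfortable
```

The single summary number hides exactly the variance that creates the risk.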

So then, forget F x M and back to the original question: how do we really measure operational risk?

Figure 1 : The Direct Method | Causal Capital

As it happens, there are only two authentic ways of measuring operational risk. I know everyone is going to say, "come on, there are infinite ways to skin a cat", and you're right, but let me bring in a category definition just for the sake of this debate and then you will see where I am coming from.

Operational risk can only be measured DIRECTLY or INDIRECTLY, with the occasional crossover; these are the two pure worlds of op risk measurement. This may sound as obvious as saying there are 1,440 minutes in 24 hours, but lend me a second (no pun intended) and you will see there is more to this than meets the eye.

Let's talk about DIRECTLY measuring operational risk first, because it is the most obvious and logical way of estimating exposure. Doing anything directly is a perceivable way forward, yes?

In effect, you observe something: a system, a business unit, a month of project operation, whatever, and you capture loss event or incident information. This kind of real observation tells you quite a lot about what is going on; you can see how frequently something occurs and how large or small the financial outcome is for each individual incident of risk.

It follows that you will end up with two distributions of losses: a discrete probability distribution that describes how often an incident type occurs, and a continuous distribution that describes how large each occurrence may become. I must emphasize that these are probability distributions, not single point estimates, so they can't simply be multiplied together; they have to be convolved.
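The standard way to combine the two distributions is Monte Carlo simulation: draw a count of incidents from the frequency distribution, then draw that many severities and sum them. A minimal sketch, assuming an illustrative Poisson frequency and lognormal severity (the parameters here are invented for the example, not fitted to any real data):

```python
import numpy as np

rng = np.random.default_rng(42)

lam = 3.0              # assumed mean incidents per year (Poisson frequency)
mu, sigma = 10.0, 1.2  # assumed lognormal severity parameters

def simulate_annual_losses(n_years=20_000):
    """Compound the frequency and severity distributions by simulation."""
    counts = rng.poisson(lam, size=n_years)
    return np.array([rng.lognormal(mu, sigma, size=k).sum() for k in counts])

losses = simulate_annual_losses()
print(losses.mean())                # close to lam * exp(mu + sigma**2 / 2)
print(np.percentile(losses, 99.9))  # a tail quantile of the compound distribution
```

Note that the output is itself a full distribution of annual losses, not a single F x M number.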

If you follow the process described in Figure 1, you will be able to create a final probability distribution of potential outcomes, and you can also use concepts such as Extreme Value Theory to extend the measurement further and estimate tail threats.
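As a sketch of that EVT step, one common approach is peaks-over-threshold: fit a Generalized Pareto distribution to the losses above a high threshold and read tail quantiles from the fit. The data below are simulated stand-ins for observed losses, and every parameter is an assumption for the example:

```python
import numpy as np
from scipy.stats import genpareto, lognorm

rng = np.random.default_rng(7)

# Simulated stand-in for a book of observed loss severities.
losses = lognorm(s=1.2, scale=np.exp(10)).rvs(size=5000, random_state=rng)

# Peaks-over-threshold: model the excesses above a high threshold u.
u = np.percentile(losses, 95)
excesses = losses[losses > u] - u
shape, _, scale = genpareto.fit(excesses, floc=0)

# Translate a GPD quantile back into a quantile of the full loss distribution.
p_exceed = (losses > u).mean()   # roughly 5% of losses breach the threshold
q = 0.999
tail_q = u + genpareto.ppf(1 - (1 - q) / p_exceed, shape, loc=0, scale=scale)
print(tail_q)                    # estimated 99.9th percentile loss
```

The GPD smooths the sparse tail so that quantiles beyond the observed data can still be estimated.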

What seems easy is often not so
In the last three paragraphs I have described how to measure operational risk DIRECTLY, and it seems easy, doesn't it? Nevertheless, there are a few catches which trip us up in the world of directness.

Let me elaborate on three of them:

[1] What happens if we have only observed a few data points or loss incidents?
In that case, our frequency and magnitude distributions will be "chunky" and full of measurement error; in effect, our measurement will be incomplete.

[2] This direct method of measuring risk is backward looking: it assumes that what we quantified yesterday will be the expectation tomorrow. That is a stationarity assumption, and very few real-life risk situations are stationary; most evolve through time. We also have to accept that the world around us keeps changing and may even be affected by what we are measuring; this is what some call a feedback loop in the system.

[3] Most frustrating for a risk manager, this DIRECT measurement approach will not tell us why we have these risk incidents in the first place; we just know that we have them.
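The first of these catches, sparse data, is easy to demonstrate. A percentile bootstrap on a handful of observed severities produces a far wider confidence interval than the same procedure on a rich sample; the lognormal severity process below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def observe(n):
    """Stand-in for capturing n loss severities from the (unknown) true process."""
    return rng.lognormal(10.0, 1.2, size=n)

def bootstrap_mean_ci(sample, n_boot=2000):
    """95% percentile-bootstrap confidence interval for the mean severity."""
    means = np.array([rng.choice(sample, size=len(sample), replace=True).mean()
                      for _ in range(n_boot)])
    return np.percentile(means, [2.5, 97.5])

ci_small = bootstrap_mean_ci(observe(10))     # only ten incidents: "chunky"
ci_large = bootstrap_mean_ci(observe(1000))   # a rich history: much tighter
print(ci_small, ci_large)
```

With ten observations the interval is so wide that almost any capital number could be defended from it.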

Indirect ways of doing things
Alternatively, on the other side of the world, the dark side of operational risk measurement does things differently, or INDIRECTLY, and it is fascinating, perhaps more powerful than the purist approach of quantifying observations DIRECTLY. INDIRECT measurement of operational risk also employs a wide range of management structures and statistical techniques, each with its own benefits and disadvantages.

Figure 2 : The Indirect Method | Causal Capital

So operational risk can be measured DIRECTLY or INDIRECTLY, and the indirect side has two popular techniques that operational risk analysts commonly employ today: Latent Causal Modelling and Scenario Analysis.

[1] Direct Op Risk Measurement

     1.1 - Observation, event data capture, curve fitting, testing and confidence setting

     1.2 - Extreme value theory and tail smoothing

     1.3 - Tradable price signals

[2] Indirect Op Risk Measurement

     2.1 - Latent Causal Modelling
Understand how something we can observe relates to, and leads to, an outcome we cannot observe so easily.
Correlation and dependency are complex areas of statistics, taking in techniques such as Bayesian networks, random forests, logistic regression, multiple discriminant analysis, Partial Least Squares Path Modelling and many other models that allow a risk analyst to describe how a key risk indicator leads to a loss incident.
     2.2 - Scenario Analysis Modelling
Take what we do know and extend from it: extrapolate beyond the tail and show how events may correlate or cluster together.
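As a toy illustration of latent causal modelling with one of the techniques listed above, the sketch below fits a logistic regression (via Newton-Raphson iterations in plain NumPy) linking an observable key risk indicator to the probability of a loss incident. The KRI, the data-generating link and all parameters are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical KRI: weekly hours of system downtime (gamma-distributed).
n = 2000
kri = rng.gamma(shape=2.0, scale=2.0, size=n)

# Assumed latent mechanism: downtime drives incident probability.
p_true = 1 / (1 + np.exp(-(-3.0 + 0.6 * kri)))
incident = (rng.random(n) < p_true).astype(float)

# Fit incident ~ KRI by logistic regression (Newton-Raphson / IRLS).
X = np.column_stack([np.ones(n), kri])
w = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ w))
    H = X.T @ (X * (p * (1 - p))[:, None])  # Hessian of the negative log-likelihood
    w += np.linalg.solve(H, X.T @ (incident - p))

print(w)  # should land near the true coefficients (-3.0, 0.6)
```

The fitted slope is the quantified "KRI leads to loss incident" relationship; richer structures such as Bayesian networks generalise the same idea to whole webs of drivers.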
Figure 3 : The Hybrid Approach | Causal Capital

The superior operational risk measurement systems actually select aspects from both the DIRECT and INDIRECT risk measurement worlds. They are hybrid systems that can be coherent and forward looking. A hybrid scenario-loss data system, 'a combo', is fantastic and it is where we want to be, but it is also very difficult to design a model for. I believe the world of operational risk in general would benefit from more analysts researching this path of risk measurement.

Figure 4 : A simple Indirect Example | Causal Capital

On the other hand, when it comes to latent causal modelling, you will find that this is a fast-moving and evolving field of risk measurement, and some great work has been done over the years. Risk analysts can do well today by looking at other disciplines where causal statistics are applied heavily; fields such as engineering or genetics can be insightful. Additionally, software such as R-Project is making these complex statistical models very accessible to risk analysts.

In my opinion, and away from risk generalists who often suffer from model xenophobia, the operational risk analysts who are cutting it today are pursuing hybrid or latent causal techniques. Mind you, I believe it could be a while yet before research from this exciting end of risk measurement is adopted into the generalist perspective of mainstream mediocrity that many risk departments are trapped in.


  1. Very interesting Martin, thank you.
    I would definitely support the indirect method, although I would call it the causal method, or drivers method, as it focuses more on the loss-generating mechanisms (LGM) than on recorded data and possibly naive replication of the past. Combining mature and well-known statistical techniques with risk management expertise is, in my view as well, the way forward for better operational risk modelling.

  2. Ariane,

    You are right, this is a causal method or drivers method and when I wrote this article, I did have RDCA (Risk Drivers Control Approach) in mind.

    In the past, even ten years ago, concepts such as the RDCA (risk drivers) method were out of reach at an applied level for many risk practitioners, because software tools just weren't around to support this type of modelling. The concepts might have been understood, but they were difficult to investigate in a systematic manner.

    That is changing today and I am sure we are going to see some exciting new systems emerging in the practice of risk management in the years to come.

  3. Direct risk measurement is a method for understanding loss events: how frequently they occur and the loss values associated with them.

    Latent causal modelling is a method for understanding operational variables, by discovering relationships between causes and outcomes.
    Scenario analysis is a method for understanding what can go wrong and arriving at possible strategies and solutions.

    The one constant in the real world is change. Changes over a period lead to changes in the environment itself. How quickly the environment changes is a function of the rate of change; what that rate is and who is driving it requires the study of evolutionary science and is a subject for academic research. Let us leave that for GOD to take account of.

    What am I saying, when the current subject is how to measure operational risk?

    We can say there is higher risk when there is a high frequency of loss events and loss values are on an increasing trend. Loss values change with the operational environment: introducing new products, adding new procedures, retiring old procedures, and so on. Let us leave it to the risk manager to create categories of risk to suit its risk tolerance and defect-rate tolerance.