Causal Capital: Automation and Risk

We have been a little quiet on the blogging front at Causal Capital, and a lot of people have asked: where's the team, what's going on? ... It's been a busy, crazy year of growth for the company ~ we have tripled in size in less than twelve months and, sadly, we have stepped back from blogging to accommodate so much change, but rest assured this is only for a while.

Hopefully, we will be able to launch back into our publishing frequency for 2020 with various exciting articles, white papers, models and who knows, a few videos thrown in for fun ... certainly a new website is in order but more on that later.

Today we are going to explore automation and its relationship with operational risk.

Boeing Catastrophe

The interrelated Boeing catastrophes in Indonesia and then Ethiopia had a significant impact on me as a Risk Analyst, especially as I am traditionally a believer in the idea that automation standardises an operating environment and, as a consequence, should reduce the risk from error pathogens.

What happened at Boeing was quite the opposite, and a brief peruse of Redline on The Verge by Darryl Campbell makes for a sobering chronological account. The average person is left feeling that less human intervention, or more precisely less oversight, moves us away from safety towards an environment that is not only profoundly hazardous, but also capable of supporting unmanaged fatal scenarios that may be unbelievably common.

As such, this Boeing disaster is fueling the debate in various forums over whether automation has an untenable relationship with risk management, even if part of that battle is really about saving jobs from the jaws of the growing world of Artificial Intelligence.

Redline on the Verge | Darryl Campbell [LINK]

The lead-up to the Boeing fiasco is a complex longitudinal array of corporate dysfunction, no doubt. Still, it does not detract from the fact that the operating environment of the 737 Max cockpit was rendered unworkable for humans, and that leaves the airplane inherently 'unairworthy'.

The Boeing Max saga is not alone in its class of automation failures that may be cataclysmically ruinous. Another illustration of unabated complexity in the User Interface wreaking havoc on oversight, or specifically causing operator spatial disorientation, is the collision of a US guided-missile destroyer with an oil tanker in the Singapore Strait. This incident killed ten sailors and injured forty more, and again the control environment fell under scrutiny.

Tactile Controls are back in favor | USNI News [LINK]

Operator Spatial Disorientation is a failure of risk control that is increasingly being owned by those who design systems as much as those who operate them. Causal Capital is currently running a project on UX UI design effectiveness which should allow us to share some learning bites on this fascinating area of risk management in the near future ... but in my opinion, Spatial Disorientation is only one causal pathway that needs recognition.

There is a whole array of potential risk sources for catastrophe that seem to be founded in the design studio and that are worthy of ruinous red flagging. Here are my top five deeply concerning automation hazards that I believe need rudimentary control.

[1] Built In Redundancy - A major causal factor in the Boeing Max disaster can be found in the Maneuvering Characteristics Augmentation System, which automatically responded to its environment based on a data signal from ONLY ONE source. There was no redundancy, no secondary signal to compare against, nothing to test for False Positives, and no sound basis on which to calculate a Receiver Operating Characteristic.

It's not about backup here but about verifying and confirming False Positives, and developing a control environment that supports 'fail passive' functionality front to back. It's quite simply an automation mistake from the ground up not to have this verification feedback system in place. The 101 basics of successful design are to ensure redundancy and to test Receiver Operating Characteristics. Let it be known, and thanks to Boeing for proving it, that redundancy is vital.
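To make the Receiver Operating Characteristic idea a little more concrete, here is a minimal sketch of how false positive and true positive rates could be tabulated for an alarm threshold. The function name `roc_points`, the readings and the thresholds are all hypothetical illustrations, and the sketch assumes you have some independent ground truth telling you whether each reading was a real event, which is exactly what a single-source system lacks.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float],
    is_real_event: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, false_positive_rate, true_positive_rate) per threshold."""
    if len(scores) != len(is_real_event):
        raise ValueError("scores and labels must have the same length")
    positives = sum(is_real_event)
    negatives = len(is_real_event) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one real event and one non-event")

    points = []
    for threshold in thresholds:
        true_pos = false_pos = 0
        for score, real in zip(scores, is_real_event):
            if score >= threshold:  # the system would raise an alarm here
                if real:
                    true_pos += 1
                else:
                    false_pos += 1
        points.append((threshold, false_pos / negatives, true_pos / positives))
    return points


# Illustrative readings only: the 21.0 is a faulty spike during normal operation.
readings = [3.0, 5.0, 8.0, 21.0, 14.0, 17.0, 19.0]
labels = [False, False, False, False, True, True, True]

for threshold, fpr, tpr in roc_points(readings, labels, [10.0, 15.0, 25.0]):
    print(f"threshold={threshold:5.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

In this made-up data set, raising the threshold far enough to silence the faulty spike also silences every genuine alarm, which is the trade-off an ROC test is meant to expose before a system goes live.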

We can assume two pilots and one signal system for augmentation is a design flaw ~ there is a reason why almost every sighted animal on this planet, with odd exceptions such as a strange looking microscopic copepod [LINK], has at least two eyes, and that reason is stereo vision.
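As a rough illustration of the redundancy and 'fail passive' idea, the sketch below cross-checks several readings before any automated command is issued. It is not Boeing's logic; the function names, the tolerance, the threshold and the command strings are assumptions invented for this example.

```python
from statistics import median
from typing import Optional, Sequence


def cross_checked_reading(readings: Sequence[float], tolerance: float) -> Optional[float]:
    """Return a trusted value, or None when the sources cannot be verified."""
    if len(readings) < 2:
        return None  # a single source leaves nothing to compare against
    if max(readings) - min(readings) > tolerance:
        return None  # the sources disagree, so none of them is trusted
    return median(readings)


def augmentation_command(
    readings: Sequence[float],
    tolerance: float = 5.0,   # hypothetical allowed disagreement, in degrees
    threshold: float = 15.0,  # hypothetical angle that triggers intervention
) -> str:
    """Decide what the automation does; fail passive when data is unverified."""
    value = cross_checked_reading(readings, tolerance)
    if value is None:
        return "DISENGAGE_AND_ALERT"  # hand control back and tell the crew
    return "TRIM_NOSE_DOWN" if value > threshold else "NO_ACTION"


print(augmentation_command([74.5]))        # one source only
print(augmentation_command([74.5, 4.0]))   # two sources that disagree
print(augmentation_command([4.0, 4.6]))    # two sources that agree
```

The design choice worth noting is that disagreement does not pick a winner; it switches the automation off and alerts the operator, so a faulty sensor cannot drive the system on its own.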

Next up is a cooked result from those who have narrow minds !!!

[2] Confirmation Bias - A close-minded tendency of managers to be more willing to accept new information that is aligned with their existing belief framing, especially when the news supports their preconditioned opinions. Confirmation Bias is a dangerous behavioral trait that is difficult to eradicate, and it often leads to the creation of self-fulfilling prophecies in design mediums [LINK].

One of the best ways to disrupt this condition is to lift the diversity of ideas entering the design crew's collective headspace, and that is usually an outcome of a culturally mixed business team which originates these ideas in the first place. Sounds simple enough, just make the team diverse, but divergent people come bundled with plenty of management challenges and conflict, and you will find that most companies are homogeneous in their hiring strategies from the outset. Why is that? ... HR recruitment also tends to suffer from this affliction.

Confirmation Bias breeds Confirmation Bias !!!

[3] Positional Debasing & Situational Awareness - A human failure that brings operators into a position where they queue or prioritize activities in an order that leads to unwanted outcomes. This Mentour Pilot Terrain Escape Maneuver demonstrates how pilots can find themselves in such a dilemma and what they can do to survive [LINK]. The best response to Situational Awareness threats is of course a planned response, which means you need to accept the potential for the condition from the outset and run simulations.

[4] Stability & Disturbance Rejection - Perhaps the simplest of our automation errors to control but also potentially the most disastrous. In life, some processes must happen with a specific timing or sequence and are fragile to precision or calibration error. They can't be controlled with redundancy or a false positive test, and an error will nearly always result in an adverse effect. It is basically a "you have one chance to get it right" kind of scenario. These types of situations need careful detection, response planning, training and confirmation approval during treatment execution ~ they certainly make for a great news story, and when all goes well, everyone is a hero. Personally, I attempt to avoid optionless situations.

Given all the automation failures we have described here, I have left the worst for last. I personally see Race Conditions as the scariest; they are literally born in hell.

[5] Race Conditions - A race condition is relatively easy to explain, hard to detect and extremely confusing for operators to treat. When a Race Condition unexpectedly presents itself in mission-critical systems, it is often lethal.

At its simplest level it's a propagation delay or reversal, intermittent in nature and very hard to repeat, and sometimes it even requires operators to act in a counterintuitive manner or break the rules of a learned response to survive [LINK]. Mostly, operators run short of the time needed to comprehend what is going on before they can act in a way that will save themselves. Over the years, this risk source has created some terrible catastrophes for airlines, but then automation in complex systems can be loaded with Race Conditions; it's the king of all system bugs.

Out of our five killer threats, Race Conditions are probably an engineer's worst nightmare. They are a ghost in the system, if you prefer, and they can remain latent in a service for years until a specific set of conditions or factors align themselves to create that perfect moment for failure.
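For readers who want to see the mechanism rather than the consequences, here is a small software sketch of a race condition, assuming two tasks that each try to add one to a shared value. The names `replay`, `LockedCounter` and `run_threads` are hypothetical, and the first half replays a fixed ordering of steps instead of using real threads, precisely because real races are so hard to reproduce on demand.

```python
import threading
from typing import Dict, List


def replay(schedule: List[str]) -> int:
    """Replay read/write steps from tasks that each add 1 to a shared value."""
    shared = 0
    seen: Dict[str, int] = {}  # the value each task last read
    for step in schedule:
        task, action = step.split(":")
        if action == "read":
            seen[task] = shared
        elif action == "write":
            shared = seen[task] + 1  # may be based on a stale read
        else:
            raise ValueError(f"unknown action: {action}")
    return shared


# The same four steps, in two different orders.
in_turn = ["A:read", "A:write", "B:read", "B:write"]
interleaved = ["A:read", "B:read", "A:write", "B:write"]


class LockedCounter:
    """Counter whose read-modify-write step is protected by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:  # only one thread at a time may update the value
            self.value += 1


def run_threads(counter: LockedCounter, threads: int = 4, per_thread: int = 10_000) -> int:
    """Hammer the counter from several threads and return the final value."""
    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value
```

Here `replay(in_turn)` returns 2 while `replay(interleaved)` returns 1: nothing in either task is wrong on its own, and only the ordering loses an update. The lock in `LockedCounter` is one common treatment, since it forces the read and the write to happen as a single step.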

That's it, the top automation killers. One thing I am considering is writing up different historical catastrophes across the aviation or manufacturing sector and then aligning these incidents to each of these control failures. Also, there are specific tests that can be used to demonstrate when one of these scenarios is in play, and an additional publication on that might be handy for control engineers. Let's see.

Tuesday, November 19, 2019
