If you are feeling out of sorts, a bit down and out, and want to take it all the way to full-blown depression, have I got a book recommendation for you: “Normal Accidents” by Charles Perrow (1984). Perrow’s premise is that we have designed certain systems, nuclear reactors being his primary example, that are so complex that they are prone to the unanticipated interaction of multiple failures. He calls these “system accidents,” and his definition rests on two main characteristics: interactive complexity and tight coupling. Such systems can be either technological or organizational.
The first part of the definition, interactive complexity, means the system has too many combinations and permutations of configurations to model them all effectively in advance. How a failure at one step affects operations downstream can sometimes be modeled correctly, but add a third failure and how all three will interact often becomes unknowable in practice.
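A back-of-the-envelope calculation shows why this is so. Assuming a hypothetical system of 50 failure-prone components (the number is illustrative, not from Perrow), the count of distinct multi-failure interactions you would need to model grows combinatorially:

```python
from math import comb

# Hypothetical system with 50 components, each of which can fail.
n_components = 50

# Number of distinct k-way failure interactions to model in advance.
for k in (2, 3, 4):
    print(f"{k}-way interactions: {comb(n_components, k)}")
# 2-way: 1,225 -- plausibly analyzable.
# 3-way: 19,600 and 4-way: 230,300 -- far beyond practical modeling.
```

Pairwise failures are tractable; by the time three or four failures interact, exhaustive analysis is off the table, which is exactly the regime Perrow is describing.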
Tight coupling means that A-causes-B-causes-C-causes-D is a given: the system is deliberately designed to execute a certain chain of events automatically once the initial conditions at step "A" are met, and a different chain for a different set of initial conditions. There are no “loose” couplings where a human intervenes or where the system waits for confirmation from an independent source.
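The contrast can be sketched in a few lines of code. This is a minimal illustration, not anything from Perrow: the step functions, the trigger threshold, and the second source are all hypothetical stand-ins.

```python
def run_tightly_coupled(reading, steps):
    """Once the condition at step "A" is met, B, C, D fire automatically."""
    if reading > 100:  # initial condition at step "A" (arbitrary threshold)
        return [step(reading) for step in steps]
    return []

def run_loosely_coupled(reading, steps, confirm):
    """Same chain, but an independent source must agree before it proceeds."""
    if reading > 100 and confirm(reading):
        return [step(reading) for step in steps]
    return []

# Hypothetical downstream steps B, C, D.
steps = [lambda r: f"B({r})", lambda r: f"C({r})", lambda r: f"D({r})"]

# A faulty sensor reports 250; an independent second source disagrees.
second_source = lambda r: False

print(run_tightly_coupled(250, steps))                 # chain fires on the bad reading
print(run_loosely_coupled(250, steps, second_source))  # chain is held back
```

The loose coupling is simply the `confirm` gate: one point where the automatic chain pauses for an independent check before events cascade.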
Normal accidents are at the extreme end of the risk we all manage for our organizations as a matter of course. Robert Kaplan (of Balanced Scorecard fame) defines three major categories of risks. Category I risks are those known-known risks for which we get no benefit, such as fraud, theft, or embezzlement, where the objective is to minimize or eliminate the risk. The perpetrators of fraud are typically engaged in behaviors that readily surface through pattern recognition analytics, and if they don’t change their pattern quickly enough, they can and will be caught. Fraud is currently the number one search topic on SAS’ websites, as organizations look for ways to minimize the cost of these Category I risks.
Category II risks are those where the risk-reward equation comes into play: yes, the risks are there, but so are the benefits, so are the rewards, if the risks can be properly managed. These are the business or operational risks that I have addressed in some of my previous posts, such as “How Much, How Soon and How Certain”.
Category II risks have three components: level of risk, level of importance, and core competency. On the accompanying graphic, risk is represented by the vertical axis, importance by the darker colored bars (1, 3 and 4) and core competency by the solid fill (1, 2, 3 and 6), with 4 and 5 being two functions in which the organization does not feel it has a core competency.
Before we get back to what to do with this information, let’s dispense with Kaplan’s Category III risks: the unknown-unknowns, such as earthquakes and revolutions, which, like Category I, carry no associated benefits but are by their nature unpredictable. (I have dealt with these in earlier posts, such as “Black Swans” and “Plan V”, where scenario planning becomes the key risk mitigator.)
The whole point of investing for a return is to pursue high-risk opportunities that are both important and within your core competency. While a few of the component combinations might be straightforward, most will require analysis and deliberation before settling on the best approach. For the obvious areas where the risk is high and the competency low, the mitigation strategy will likely involve outsourcing and/or hedging. Outsourcing a low-importance function like office supplies can be a simple, direct buyer/seller arrangement with a vendor (no company ever went bankrupt over yellow highlighters and copy machine paper), while outsourcing something more important, like IT, will require a more complex, ongoing business arrangement.
In this example, function 5 might be a good candidate for outsourcing to a vendor: medium risk, no competency, and not terribly important. Function 4 is a more difficult situation: high risk, no competency, and highly important. Outsourcing might be a good first step, with the intent to acquire core competency and bring the function back in house later, perhaps swapping resources with function 6, which is less important but holds more risk. Functions 1, 2 and 6 will all generate interesting internal analysis and discussion of hedging strategies, warranted by the higher risks in 2 and 6 and the greater importance attached to 1.
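The screen described above can be expressed as a simple set of decision rules. This is a hedged sketch: the risk and importance scores below are hypothetical stand-ins for the graphic’s six functions (a 1–5 scale plus a core-competency flag), and the rules are my rough reading of the discussion, not Kaplan’s.

```python
# Hypothetical scores for the six functions on the graphic.
functions = {
    1: {"risk": 4, "important": True,  "competent": True},
    2: {"risk": 5, "important": False, "competent": True},
    3: {"risk": 2, "important": True,  "competent": True},
    4: {"risk": 5, "important": True,  "competent": False},
    5: {"risk": 3, "important": False, "competent": False},
    6: {"risk": 4, "important": False, "competent": True},
}

def suggest(f):
    """Rough decision rules mirroring the discussion in the text."""
    if not f["competent"]:
        # No competency: outsource; if the function is also important,
        # plan to acquire the competency and bring it back in house.
        return "outsource (rebuild in-house later)" if f["important"] else "outsource"
    if f["risk"] >= 4:
        return "consider hedging"
    return "manage internally"

for num, f in functions.items():
    print(num, "->", suggest(f))
```

Run against these scores, the rules reproduce the discussion: function 5 is a straight outsourcing candidate, function 4 is outsourced with intent to reacquire, and functions 1, 2 and 6 trigger the hedging conversation.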
One last lesson that can be learned from this deliberately ambiguous example is that, as a commercial enterprise, this organization is at great risk of becoming irrelevant in its chosen market. The analysis shows that the company has no function where it holds a core competency in an important, high-risk capability. It’s not tackling any hard problems and most likely not adding much value for its customers. No risk, no reward. Time to rethink strategy, I would say.
As valuable as this approach is, it still can’t tell you whether your business processes are prone to becoming a “normal accident”. Most organizations are sophisticated enough to conduct this type of point-by-point, function-by-function, capability-by-capability analysis and make decent business decisions, and most are in a position to model their basic processes for high-probability single points of failure. Assuming you’ve not made your processes more complex than absolutely necessary, Perrow offers only one other way out of normal accidents: loosely couple your processes.
Which brings me to The Man Who Saved the World, Stanislav Petrov. In my humble opinion, the Nobel Peace Prize committee can save itself a lot of time by just naming Petrov as next year’s recipient. You can Google “The man who saved the world” to check out the details, but the short version is that on September 26, 1983, Lieutenant Colonel Petrov of the Soviet Air Defense Forces correctly identified a missile attack warning as a false alarm, a decision that may have prevented an erroneous retaliatory nuclear attack on the United States and its Western allies. Investigation later confirmed that the satellite warning system had indeed malfunctioned: the false alarms had been created by a rare alignment of sunlight on high-altitude clouds and the satellites’ orbits, an error later corrected by cross-referencing a geostationary satellite. Talk about “system accidents” – that would have been a big one. Petrov was the loosely coupled component of this complex system, and he played his role perfectly.
As wonderful as it may seem to see your operational processes humming along smoothly, all under computer control (see last week’s post on the “neural network” of the economy), proper governance and regular human monitoring of your business processes are the best way to avoid your own version of the “normal accident”.