Systemic fragility. You won't see the risk until it's too late.


When the word ‘fragile’ comes to mind, the average person may think of things like glass vases, artwork, flowers, or maybe even emotions. But fragility takes on a whole new meaning when viewed on a broader scale–across natural, ecological, financial, economic, political and social systems. I would like to offer some context around the topic of systemic fragility as it applies to the world around us. The odds are it will affect you or someone you know in your lifetime, so it’s good to be informed–and even better to be prepared.

How is a fragile system defined?
A fragile system can be defined as one that is exposed to the risk of catastrophic failure with an unacceptable probability. ‘Catastrophic failure,’ in this case, means the sudden and complete failure of the system. The second part of this definition–unacceptable probability–requires assessing both the probability of a catastrophic failure and the total cost of such a failure. Three questions must be asked to assess true fragility. Does the system have one or more modes of potential total failure? Can the failure happen too fast for any remedial action to be of use? Is the probability of occurrence unacceptably high? If all three answers are yes, the system is fragile: it stands precariously, ready to fail with little warning.
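
To make the three-question test concrete, here is a minimal sketch in Python. The function name, inputs and threshold values are illustrative assumptions, not part of any formal standard.

```python
# A minimal sketch of the three-question fragility test described above.
# The names and example numbers are illustrative assumptions.

def is_fragile(has_total_failure_mode: bool,
               failure_outpaces_remediation: bool,
               failure_probability: float,
               acceptable_probability: float) -> bool:
    """Flag a system as fragile only if all three answers are 'yes'."""
    return (has_total_failure_mode
            and failure_outpaces_remediation
            and failure_probability > acceptable_probability)

# Example: a system with a known total failure mode, failure too fast to
# remediate, and an estimated 1% failure chance against a 0.1% tolerance.
print(is_fragile(True, True,
                 failure_probability=0.01,
                 acceptable_probability=0.001))  # True
```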

What are some real-world examples of fragile systems?
Several catastrophic events that took place over the last decade alone may help demonstrate the risk of systemic fragility.

  • New Orleans Levees. Until Hurricane Katrina hit in 2005, the residents of New Orleans were unaware that the extensive levee system intended to protect their lives was fragile and flawed. Since the system was continually monitored by the US Army Corps of Engineers, people assumed it was safe. Over 50 failures of the levees and flood walls protecting New Orleans spilled tens of billions of gallons of water into the area, flooding over 80% of the city's homes and businesses.
  • BP Oil Spill. With the evolution of sophisticated technology, deep-sea drillers have performed feats nothing short of stunning–reaching depths of tens of thousands of feet below sea level in search of oil and natural gas. But when disaster struck in April of 2010, it became evident that added layers of complex technology do not automatically translate into added layers of reliability. The Gulf of Mexico oil spill underscored how technical progress can mask system fragility.
  • Fukushima. The nuclear reactors at Fukushima were installed primarily during the 1970s, based on 1960s designs that were the most advanced of their day. Since they had functioned without issue for 40 years, it seemed reasonable to presume continued reliability. The inundation of the back-up diesel generators by an earthquake-induced tsunami was never considered as a possible failure mode. The 2011 disaster exposed the magnitude of the system's fragility.
  • Sandy. Residents of New York, New Jersey and Connecticut previously had no reason to doubt their transportation infrastructure. They had access to a complex system of bridges, tunnels, ferries, subways and trains. They were served by an assortment of power plants, and located near oil ports and refineries. However, the fragility of this complex system in the face of a serious power disruption was never thoroughly analyzed. Lack of transportation and fuel caused chaos during the weeks immediately following the October 2012 hurricane.
  • Mortgage Crisis. The catastrophic events mentioned above all had physical triggers–a hurricane, an earthquake or an equipment failure. But a system does not require a physical trigger to be fragile. As I’ll expand upon in an upcoming post, the US housing market in 2006 had evolved into one of the most fragile economic systems of our time. It appeared to function properly until, suddenly, it did not. Preventing its total collapse has required the infusion of trillions of dollars, with no hard end-date in sight.

All five examples have the following in common. First, system users had yielded control and ownership of the systems to some central authority, retaining no oversight of their own. Second, the uneventful passage of time led, in every case, to a growing sense of false security. Third, the growth of technical sophistication, efficiency and overall complexity led to a smug and misplaced sense of invulnerability.

Who’s working to identify fragile systems?
Engineers, for example, are taught about structural strength by studying the failure modes of materials. Take the load-bearing column or strut, for instance. When students in a structural engineering lab apply a gradually increasing load to a column, they notice that the column slowly compresses and decreases in length under the load. Ultimately, the column fails in one of two modes. If the column is short relative to its cross-sectional area and is made of a brittle or fibrous material, like concrete or wood, it starts to crack or splinter, and the damage worsens as the load is increased. On the other hand, if the column is slender and made of a highly elastic material, like steel, it suddenly buckles sideways like a spring. This buckling type of failure is very sudden–and catastrophic. An interesting fact often overlooked is that the catastrophic mode usually accompanies the better (more elastic and less brittle) material.
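
To put numbers on the two modes, here is a back-of-the-envelope sketch in Python using the standard Euler buckling formula. The column geometry and steel properties are illustrative assumptions, not design guidance.

```python
import math

# Compare the two column failure modes discussed above: gradual crushing
# vs. sudden Euler buckling. Textbook steel values; assumed geometry.

E = 200e9        # Young's modulus of steel, Pa
sigma_y = 250e6  # yield (crushing) strength of mild steel, Pa
r, L, K = 0.05, 3.0, 1.0  # radius (m), length (m), pinned-pinned end factor

A = math.pi * r**2      # cross-sectional area
I = math.pi * r**4 / 4  # second moment of area, circular section

P_crush = sigma_y * A                       # gradual, visible failure
P_buckle = math.pi**2 * E * I / (K * L)**2  # sudden, catastrophic failure

print(f"crushing load: {P_crush / 1e6:.2f} MN")   # ~1.96 MN
print(f"buckling load: {P_buckle / 1e6:.2f} MN")  # ~1.08 MN
print("governing mode:", "buckling" if P_buckle < P_crush else "crushing")
```

For this slender column, the sudden buckling load is well below the gradual crushing load, so the catastrophic mode governs.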

How is failure probability assessed?
Assessing the failure probability of a real-life system is a highly technical process. First, it requires analyzing the system in terms of the interdependency of its components. This usually results in representing the system as an interconnected network, which system analysts call a 'graph'. Second, each component of the graph is assigned a probability-of-failure value based on a thorough analysis and understanding of that component's behavior. Third, given the first two steps, assessing the failure probability of the whole system is a straightforward, albeit computationally intensive, task. A package such as the SAS software for network algorithms (SAS/OR) can be used for such analysis.
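
SAS/OR provides production-grade tools for this kind of network analysis. Purely as an illustration of the underlying idea, here is a minimal Monte Carlo sketch in Python; the components, failure probabilities and failure rule below are invented for illustration.

```python
import random

# Estimate whole-system failure probability from per-component
# probabilities. The layout assumes two redundant pumps in parallel,
# in series with a generator and a control unit (all invented numbers).

p_fail = {"pump_A": 0.02, "pump_B": 0.02, "generator": 0.01, "controls": 0.005}

def system_fails(down: set) -> bool:
    # System fails if controls or generator fail, or both pumps fail.
    return ("controls" in down or "generator" in down
            or {"pump_A", "pump_B"} <= down)

trials = 100_000
failures = sum(
    system_fails({c for c, p in p_fail.items() if random.random() < p})
    for _ in range(trials)
)
print(f"estimated system failure probability: {failures / trials:.4f}")
# Analytically: 1 - 0.995 * 0.99 * (1 - 0.02**2) ~= 0.0153
```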

Although the highly technical methodology mentioned above is needed, and should be used, by system analysts and central planners, it is out of reach for the average person, who still needs to assess the fragility of the systems surrounding daily life. Does this mean an average citizen should refrain from assessing systemic fragility? On the contrary: it is every individual's responsibility to assess the fragility of the systems they rely on for the proper functioning of their lives. A trick often employed by system analysts, and one that can be used by all of us, is to assume a probability of failure of 1.0 (certain failure) to obtain an upper bound on the cost of failure. In other words, you should plan your life assuming that some of the systems you take for granted are guaranteed to fail.
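
To see the trick in numbers, here is a tiny sketch; the probability and dollar figures are invented for illustration.

```python
# Plan as if failure is certain: setting p = 1.0 gives an upper bound
# on the expected cost of failure. All figures below are invented.

p_failure = 0.01         # best-guess chance of a week-long power outage
cost_of_failure = 5_000  # spoiled food, lost work, hotel nights ($)

expected_cost = p_failure * cost_of_failure  # what a planner would compute
upper_bound = 1.0 * cost_of_failure          # the worst-case bound

print(f"expected cost: ${expected_cost:,.0f}")  # $50
print(f"worst case:    ${upper_bound:,.0f}")    # $5,000
```

If you can absorb the worst case, you need not agonize over the exact probability; if you cannot, you should prepare regardless of how unlikely the failure seems.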

What are the precautionary steps to prevent systemic risk?

  1. Step one–assess the ‘total failure’ mode.
    A system is considered totally failed when it ceases to perform the task for which it was intended. However, the total failure of a component does not necessarily lead to the total failure of the system itself. Continuing my earlier example, if a system of columns is used to support a water tank, then the failure of any one column is considered catastrophic only if it leads to the subsequent failure of neighboring columns in a chain reaction that ultimately results in the collapse of the water tank (see the sketch after this list). Therefore, the study of a system's fragility starts with assessing its total failure modes.
  2. Step two–determine the ‘response time’ before a catastrophe occurs.
    A fragile system does not betray its true vulnerability. It appears to function normally until it ceases to function, with little warning. A natural avalanche is one example. The snow appears stable and static until it starts to roll down the mountainside with a vengeance. It is hard to notice the precarious nature of the slope by monitoring for signs of gradual failure, and once the failure starts, there is usually not enough time to evacuate. This type of failure is catastrophic for ski resorts, so ski slope operators rely on their understanding of the mechanics of snow slopes and initiate controlled avalanches to eliminate that failure mode and the subsequent risk. The failure of the engine on a single-engine plane is not catastrophic if you only fly over farmland and beaches and can land at will. On the other hand, if you decide to fly over the Rockies with 100 miles between you and the nearest airstrip, the outcome will be completely different. So the amount of time between the onset of failure and its completion is what determines whether the failure is catastrophic.
  3. Step three–evaluate acceptable probability. As I discussed in an earlier post, risk is subjective, so it's important to identify the risk level appropriate for the system under study. Assessing the cost of the catastrophic failure of a system requires taking into account all direct and indirect ramifications of such a failure. This is particularly important for systems whose failure can trigger a chain reaction in other, seemingly unrelated systems. Only when the probability of catastrophic failure exceeds the acceptable level do we consider the system fragile.
  4. Step four–monitor on a continuing basis. Identifying total failure modes, response times, the probability of occurrence and the level of risk the system can tolerate should be an ongoing process. As the examples above show, man-made systems tend to grow in complexity over time. Yet users of such systems often neglect to enforce ongoing re-evaluation.
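
To make step one's chain reaction concrete, here is a minimal Python sketch of the water-tank example; the loads and column capacities are invented for illustration.

```python
# Simulate load redistribution after one column fails: the survivors
# pick up the load, which may push them past capacity in turn.

def cascade(capacities, total_load, first_failure):
    """Return the set of failed columns once redistribution settles."""
    failed = {first_failure}
    while True:
        alive = [i for i in range(len(capacities)) if i not in failed]
        if not alive:
            return failed  # total collapse: the tank comes down
        share = total_load / len(alive)  # equal split over survivors
        newly = {i for i in alive if share > capacities[i]}
        if not newly:
            return failed  # survivors hold: a component failure only
        failed |= newly

# Four columns rated for 30 units each, carrying 100 units in total.
# Each holds 25; lose one and the survivors' share jumps to 33.3 > 30.
print(cascade([30, 30, 30, 30], total_load=100, first_failure=0))
# -> {0, 1, 2, 3}: one column's failure is catastrophic for this system
```

With stronger columns (say, rated for 40 units each), the same first failure stops at a single component, which is exactly the distinction step one asks you to draw.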

About Author

Maged Tawfik

Financial Risk Specialist

SAS Institute, Cary, NC. PhD, Cornell University.
