Risky Root Causes

I spoke recently at a workshop organized by the Alan Turing Institute in London to identify areas related to cyber security in which major research is needed. Though I focused on security analytics, I also talked about the need to develop more effective models for understanding and managing risk, citing the work we are doing in the EU-funded SPARKS project, especially our Threat and Risk Assessment Methodology document (D2.2) and our work with STAMP/STPA methodologies, as well as recent developments in operational risk management such as support for loss-event models in RSA Archer 6.0. On the flight back to Switzerland, as I reviewed what we had accomplished in the workshop, I saw a general consensus on the need for a research focus on security analytics. This included areas such as simplified ingestion of widely disparate data stores, ongoing development of algorithms for pattern discovery and anomaly detection, and improvements in visualization, context assembly, criticality evaluation, and other areas that contribute to an effective human augmentation model. But there was far less recognition of the equally large need for a research focus on risk.

This struck me even at the very beginning of the workshop, in a presentation about cyber security in the pharmaceutical industry. In that presentation, risk management was treated as a mature, established body of knowledge, in particular in the form of an asset-value methodology reflecting that industry's core concern with theft of intellectual property. This evaluation of risk was illustrated with a probability/impact graph used to prioritize which information assets to focus on. But it struck me immediately that focusing on asset value ignores other aspects of risk that should also be considered, in particular the disruption or destruction of production facilities, the integrity of operational data, the availability of essential services, and other concerns whose impact is related not to the value of an information asset but to the magnitude of a loss, especially of operational capabilities. In the SPARKS project, we have seen that risk related to the disruption and destruction of capability – not only in Smart Grid, but in energy in general, as well as in manufacturing, telecommunications, financial services, e-government, and many other areas – has to be a major concern. The recent DDoS attack on the thirteen internet root name servers that slowed network traffic worldwide is just the latest example of such attacks, which have included oil production in the Middle East, electric power in Pakistan, and steel manufacturing in Germany.

(Image from a YouTube video posted by Johnny Adams on the report of the German steel mill cyberattack)
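To make the contrast concrete, here is a minimal Python sketch with entirely hypothetical figures – the scenario names, probabilities, and values are my own illustration, not data from the presentation or from SPARKS. It scores the same scenarios two ways: by expected loss of information-asset value alone, and by expected loss including the magnitude of an operational disruption. The sabotage scenario ranks last under the first view and first under the second.

```python
# Illustrative sketch only: hypothetical figures, not from any real assessment.
# Contrasts a probability/impact ranking based on information-asset value with
# one based on the magnitude of the operational loss event.

from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    probability: float        # estimated annual likelihood (0..1)
    asset_value: float        # value of the information asset at risk
    operational_loss: float   # magnitude of loss if operations are disrupted

scenarios = [
    Scenario("IP theft via phishing",          0.30, 5_000_000,    500_000),
    Scenario("Ransomware on office IT",        0.20, 1_000_000,  2_000_000),
    Scenario("Sabotage of production control", 0.10,   200_000, 50_000_000),
]

def by_asset_value(s: Scenario) -> float:
    # Classic asset-value view: probability times value of the information asset.
    return s.probability * s.asset_value

def by_loss_magnitude(s: Scenario) -> float:
    # Loss-event view: probability times the full magnitude of the loss.
    return s.probability * (s.asset_value + s.operational_loss)

print("Ranked by asset value:")
for s in sorted(scenarios, key=by_asset_value, reverse=True):
    print(f"  {s.name:<35} {by_asset_value(s):>12,.0f}")

print("Ranked by total loss magnitude:")
for s in sorted(scenarios, key=by_loss_magnitude, reverse=True):
    print(f"  {s.name:<35} {by_loss_magnitude(s):>12,.0f}")
```

The point of the toy numbers is only that the prioritization changes once operational loss magnitude enters the picture; the real question is which impacts the methodology lets you see at all.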

Later in the day, another speaker touched briefly on risk, calling out the importance of identifying root causes. But he spoke only about technological issues such as vulnerabilities. Even if one looks for root causes in other areas – for example, analyzing attackers' strategy and motivation in terms of the value they gain from an attack, the difficulty of the attack, the risk of discovery, and social and psychological drivers – such an analysis tends to suffer from what Peter Senge identified twenty-five years ago as the problem of looking at events and actions linearly: "Reality is made up of circles but we see straight lines…. Our habitual ways of seeing the world produce fragmented views and counterproductive actions." (The Fifth Discipline, p. 73)

(Senge, The Fifth Discipline, p. 393)

It is one of the great strengths of loss-event methodologies, such as the STAMP/STPA methodology championed by Dr. Nancy Leveson of MIT, that they look for the broadest possible set of causes of risk in a particular loss-event scenario. In Engineering a Safer World, Dr. Leveson discusses a pharmaceutical example at length (drawing on an analysis by Matthieu Couturier): the financial and reputational loss that Merck incurred with the introduction and withdrawal of Vioxx (p. 239). The recall of Vioxx was not the result of technological issues or attacker manipulation of information or processes. Rather, the analysis showed how the interrelationship between drug safety control structures, system safety requirements and constraints, the events that occurred, and the system dynamics together resulted in the suppression of drug trial data, misleading marketing information, and vilification attacks on individuals that eventually led to the withdrawal of the drug.
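As a rough illustration of why this breadth matters, the sketch below encodes, in Python and with hypothetical names and text of my own, the kind of structure an STPA-style analysis reasons over: losses, hazards, a control structure of controllers and control actions with their feedback, and the four standard guidewords for identifying unsafe control actions. It is a toy, not the SPARKS D2.2 methodology and not Leveson's analysis of the Vioxx case, but it shows how the method pushes the analyst beyond individual component failures toward flaws in the control structure itself.

```python
# Deliberately simplified sketch of the structure an STPA-style analysis
# reasons over: losses, hazards, control actions, and the four standard
# ways a control action can be unsafe. Illustrative only.

from dataclasses import dataclass, field

UCA_GUIDEWORDS = (
    "not provided when needed",
    "provided when it causes a hazard",
    "provided too early, too late, or out of order",
    "stopped too soon or applied too long",
)

@dataclass
class ControlAction:
    controller: str
    controlled_process: str
    action: str
    feedback: str  # what the controller relies on to decide

@dataclass
class Analysis:
    losses: list[str]
    hazards: list[str]
    control_actions: list[ControlAction] = field(default_factory=list)

    def candidate_unsafe_control_actions(self):
        """Apply each guideword to each control action to generate candidate UCAs."""
        for ca in self.control_actions:
            for guideword in UCA_GUIDEWORDS:
                yield f"{ca.controller} -> {ca.controlled_process}: '{ca.action}' {guideword}"

# Hypothetical fragment loosely inspired by the pharmacovigilance discussion above.
analysis = Analysis(
    losses=["Patients harmed by an unsafe drug", "Financial and reputational loss"],
    hazards=["Drug remains on the market while safety signals are suppressed"],
    control_actions=[
        ControlAction(
            controller="Regulator",
            controlled_process="Drug marketing approval",
            action="require label change or withdrawal",
            feedback="post-market trial data and adverse-event reports",
        ),
    ],
)

for uca in analysis.candidate_unsafe_control_actions():
    print(uca)
```

Each candidate unsafe control action then becomes a prompt for asking why the control structure would allow it, which is where the non-technological causes surface.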

There is still significant work to be done in developing more effective risk methodologies, and it was important to have the opportunity to make that case at the Turing Institute. My SPARKS colleagues and I are looking forward to continuing the conversation at the MIT STAMP Workshop in March 2016.