In a previous note I
showed how to use order-of-magnitude thinking to quickly narrow down a highly
uncertain number to a workable range. I
used the rather artificial example of the number of pages in the Christian
Bible (equally applicable to Gone with the Wind or Harry Potter Meets Dracula). Here I show a real-life example from business
continuity planning.
The Challenge. How
in the world can a conscientious business continuity analyst possibly come up
with the dozens of estimates needed for a competent total risk assessment (TRA),
which is just the first step in a business continuity plan? This note shows
with concrete examples how order-of-magnitude thinking and interval estimates
can make fast work of this task, and still get a result that is both sensible
and defensible.
Taking inventory of the possible threats to business
continuity is one of the first steps in making a business continuity plan
(BCP). (I use the term “threat” to align with the FAIR taxonomy on risk,
although “hazard” would suit too.) Often
this starts with somebody’s long list of threats. These lists are commonly of the
one-size-fits-all sort, without regard to any particular circumstances, and so
comprise a vast variety of threats, many of which would not apply. The analyst is then charged with assessing the
probability, or probable frequency of occurrence, of each threat and the probable loss to the
business if it were to materialize.
She may even find herself on the defensive, having to explain why the risk of a typhoon can be
ignored. Finally, the analyst must somehow
combine the probability with the magnitude of loss to come up with a loss
expectancy estimate for each of these several dozen threats. And that’s just table stakes for a BCP.
I’ll demonstrate the method with four representative threats
for a hypothetical software development business located in the San Francisco
Bay Area:
- Blizzard
- Earthquake
- Aviation accident, and
- Pandemic.
They represent the general categories of meteorological,
geological, technological, and medical threats.
I’ll give my personal (therefore subjective) estimates for probable
frequency of occurrence and also subjective estimates for the dollar impact on
this hypothetical business if each threat were to occur. In all cases I’ll give a rough range from low
to high. Finally I’ll use half-orders of
magnitude, that is, numbers like 1, 3, 10, 30, 100, etc., because I believe
this is close enough for a first cut.
The second cut comes, well, second.
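As a minimal sketch of what "half-orders of magnitude" means in practice, the snippet below snaps any positive number to the nearest value in the series 1, 3, 10, 30, 100, and so on, measured on a logarithmic scale. The function name `snap_to_half_order` is hypothetical, introduced only for illustration.

```python
import math

def snap_to_half_order(x: float) -> float:
    """Round a positive number to the nearest of 1, 3, 10, 30, 100, ... (log scale)."""
    if x <= 0:
        raise ValueError("x must be positive")
    # Each step in the series is half a decade on a log10 scale.
    half_steps = round(2 * math.log10(x))
    decade, half = divmod(half_steps, 2)
    # An odd number of half steps is written as 3 x 10^k, since 10**0.5 is about 3.16.
    return (3.0 if half else 1.0) * 10.0 ** decade

print(snap_to_half_order(40))    # 30.0
print(snap_to_half_order(700))   # 1000.0
```

Rounding on the log scale means the cut-over between 1 and 3 falls near 1.8 rather than at 2, which is why the result can look surprising for small inputs.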
Blizzard. Snow is very unlikely in the Bay Area except
at the highest elevations, but I realize that a snow event big enough to impact
the business could occur, so I’ll estimate the frequency to be between once in
30 years and once in 100 years. If such
an event were to occur, I feel it is highly likely it would not last more than
a day. Since this business is all
knowledge work, the business impact would mostly be loss of people
productivity. Suppose this business has
300 people and the average total compensation is $150K / year. I also assume that the value lost is
reasonably approximated by the replacement cost of the work. One day of lost productivity out of 250 working
days per year is roughly $200K ($150K x 300 / 250). (If your software engineers work 80 hours a
week, scale accordingly. Your mileage
may vary.) Even in this event probably most people would work at home, which
they often do anyway, so the loss may be more like half a day, or $100K. With these numbers in mind I estimate the
conditional impact between $30K and $300K.
(In fact, a short search of historical records shows that snow has
accumulated on the streets of San Francisco in historical times, so a frequency
of once a century is reasonable.)
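The productivity arithmetic above can be sketched as a small function. The headcount, compensation, and working-day figures are the ones given in the text; the function and constant names are hypothetical.

```python
HEADCOUNT = 300
AVG_TOTAL_COMP = 150_000   # dollars per person per year
WORKING_DAYS = 250         # working days per year

def productivity_loss(people: float, days: float) -> float:
    """Replacement cost of `days` lost working days for `people` staff."""
    return people * AVG_TOTAL_COMP * days / WORKING_DAYS

full_day = productivity_loss(HEADCOUNT, 1)     # 180000.0, quoted as "roughly $200K"
half_day = productivity_loss(HEADCOUNT, 0.5)   # 90000.0, quoted as "$100K"
print(full_day, half_day)
```

The unrounded figures are $180K and $90K, and both sit inside the $30K to $300K range chosen for the conditional impact.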
Earthquake. This is earthquake country, no doubt about
it. As a casual reader of the local
papers I am aware of geologists’ estimates that the next Big One will likely
occur within 30 years, so I’ll put the probable frequency in the range of once in 10 years to once in
100 years. Notice that I am giving wide
latitude – half an order of magnitude – to the consensus number, in recognition
of the uncertainty. But if the Big One
were to occur, the business would effectively be shut down for some time. The question is, how long? The Loma Prieta quake in 1989 took most
people one to a few days to get back on their feet. That’s the low end. The high end may be 10 to 20 days, so again
using half-order-of-magnitude thinking I’ll estimate an impact of 1 to 30 days,
or $200K to $4M. This may seem like a
uselessly wide range, but stay tuned.
(Notice that I am ignoring a lot of detail here at the high
end. What about loss of revenue and
penalties for missed delivery dates?
What if the firm is driven into bankruptcy? We’ll get to that later.)
Aviation Accident. There are several airports in the area, both large
commercial and small general aviation.
An accident in the flight path could plausibly affect almost any
building in the Bay Area. If this were
to happen I judge the impact to be comparable to an earthquake – damage could
range from minimal to catastrophic. However,
I can think of only a few cases in the United States in the past two or so
decades of an aviation accident impacting people on the ground, aside from
terrorism (which is a different threat).
If there have been, say, 10 such cases in 10 years, spread over what must
be more than a million buildings, the probable frequency is something like one
in one million per year. I could easily
be off by an order of magnitude either way, so I’ll put the frequency at 1 in
100,000 to 1 in 10 million.
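The frequency estimate above is a simple rate calculation, sketched here. The incident count and building count are the rough guesses from the text, not data, and the variable names are hypothetical.

```python
incidents = 10          # guessed number of ground-impact accidents
years = 10              # over this many years
buildings = 1_000_000   # "more than a million buildings" (a guess, likely low)

# Chance per year that any one particular building is hit.
per_building_per_year = incidents / years / buildings

# Allow an order of magnitude of error in either direction.
freq_low = per_building_per_year / 10
freq_high = per_building_per_year * 10

print(f"{freq_low:.0e} to {freq_high:.0e} per year")
```

This prints a range of about 1e-07 to 1e-05 per year, that is, 1 in 10 million to 1 in 100,000.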
Pandemic. Pandemics have
attracted much attention from BC planners in the last few years, so the threat is worth
a look. Given the news coverage of Ebola,
I am going to estimate the probable frequency at between once in three years and once
in 30 years. The impact on the business
would again be loss of productivity. In
the optimistic case only a few people, say 10, would be personally affected,
assuming public health resources are effectively mobilized and people cooperate
to prevent the spread. In the
pessimistic case 30% of the staff may not be able to work for several weeks,
say 30 days. I’ll assume unaffected
people can work from home if necessary with no productivity impact. Multiplying it out I get an impact range of roughly
$180K (10 people x $150K x 30/250) to $1.6M (30% x 300 x $150K x 30/250).
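The two pandemic scenarios can be checked with the same replacement-cost logic. This sketch assumes, as the formulas in the text do, that affected staff are out for 30 working days in both the optimistic and the pessimistic case; the names are hypothetical.

```python
HEADCOUNT = 300
AVG_TOTAL_COMP = 150_000   # dollars per person per year
WORKING_DAYS = 250         # working days per year
DAYS_OUT = 30              # working days lost per affected person (both cases)

def pandemic_impact(people_out: float) -> float:
    """Replacement cost of the work lost while `people_out` staff are absent."""
    return people_out * AVG_TOTAL_COMP * DAYS_OUT / WORKING_DAYS

optimistic = pandemic_impact(10)                  # about $180K
pessimistic = pandemic_impact(0.30 * HEADCOUNT)   # about $1.6M
print(f"${optimistic:,.0f} to ${pessimistic:,.0f}")
```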
We’ve done all the spadework, so now we can put the results
together.
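As a sketch of how the estimates combine, the snippet below multiplies each threat's low frequency by its low impact and its high frequency by its high impact to get a range for annual loss expectancy. The frequency and impact ranges are the ones stated above, with one assumption: the aviation accident impact is set equal to the earthquake range, since the text only calls the two "comparable". The names `THREATS` and `annual_loss_expectancy` are hypothetical.

```python
# (threat, freq_low, freq_high, impact_low, impact_high)
# Frequencies are occurrences per year; impacts are dollars per occurrence.
THREATS = [
    ("Blizzard",          1 / 100, 1 / 30, 30_000,  300_000),
    ("Earthquake",        1 / 100, 1 / 10, 200_000, 4_000_000),
    # Assumption: impact range copied from the earthquake estimate.
    ("Aviation accident", 1e-7,    1e-5,   200_000, 4_000_000),
    ("Pandemic",          1 / 30,  1 / 3,  180_000, 1_600_000),
]

def annual_loss_expectancy(threats):
    """Return {threat: (low, high)} using low x low and high x high."""
    return {
        name: (f_lo * i_lo, f_hi * i_hi)
        for name, f_lo, f_hi, i_lo, i_hi in threats
    }

ale = annual_loss_expectancy(THREATS)
for name, (lo, hi) in ale.items():
    print(f"{name:<18} ${lo:>10,.2f}  to  ${hi:>12,.2f}")
```

Under these assumptions the earthquake and pandemic ranges run from a few thousand dollars to several hundred thousand dollars per year, the blizzard tops out around $10K, and the aviation accident stays well under $100.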
To compute annual loss expectancy I’ve simplistically
multiplied the lows by the lows and the highs by the highs. This could be overly pessimistic in the case
of the highs because it assumes the highest frequency occurs together with the
highest loss, which is probably not the case.
In fact, more-frequent losses tend to be the lower-magnitude ones. We could improve on this with a Monte Carlo
simulation but for a first cut the table is good enough.
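One way such a Monte Carlo refinement might look is sketched below. It is an illustration under stated assumptions, not the method used here: frequency and impact are each drawn log-uniformly within their ranges and independently of each other, which ignores the tendency for frequent losses to be smaller. The function names are hypothetical, and the example uses the earthquake ranges.

```python
import math
import random

def log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def simulate_ale(freq_low, freq_high, impact_low, impact_high,
                 trials=100_000, seed=1):
    """Mean, 5th and 95th percentile of frequency x impact over random draws."""
    rng = random.Random(seed)
    samples = sorted(
        log_uniform(rng, freq_low, freq_high)
        * log_uniform(rng, impact_low, impact_high)
        for _ in range(trials)
    )
    mean = sum(samples) / trials
    return mean, samples[int(0.05 * trials)], samples[int(0.95 * trials)]

# Earthquake: once in 100 to once in 10 years, $200K to $4M per event.
mean, p5, p95 = simulate_ale(1 / 100, 1 / 10, 200_000, 4_000_000)
print(f"mean ${mean:,.0f}, 5th pct ${p5:,.0f}, 95th pct ${p95:,.0f}")
```

With these assumptions the mean should land near $50K per year, far below the $400K high-times-high corner, which illustrates why multiplying the highs together is pessimistic.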
Please note that the calculation of annual loss expectancy
is an honest multiplication. The
method avoids the fake math of “multiplying” a “low” frequency by a “high”
impact to get a “medium” loss expectancy, and the like.
Notice also that the annual loss expectancies fall naturally
into two categories, the ones that seem safe to ignore and the ones we need to
pay attention to. Also the threats in the two categories do seem to accord with
intuition.
Benefits. This analysis has done several
things for us:
- it focuses the BC planning where it really ought to go
- it shows where we may need to take a second cut
- it provides reasonable justification for what we decide to ignore
- it refines our intuition (and can alert us to blind spots), and
- it makes efficient use of our time.
Not a bad
deal.