A few years ago IBM asked me to help them calculate
"sigma levels" for some of their business processes. Sigma levels are
part of the "Six Sigma" approach to
monitoring and improving business quality developed by Motorola in 1986,
and since used by numerous consultants right across the world to package well-known
techniques in order to con money out of gullible businesses.
The name, of course, was an important factor in helping the
Six Sigma doctrine to catch on. It is mysterious, with a hint of Greek, both of
which suggest powerful, but incomprehensible, maths, for which the help of
expensive consultants is obviously needed.
Sigma is the Greek
letter "s" which stands for the standard deviation - a statistical
measure for the variability of a group of numerical measurements. Sigma levels
are a way of relating the number of defects produced by a business process to
the variability of the output of the process. The details are irrelevant for my
present purposes except in so far as the relationship is complicated, involves
an arbitrary input, and in my view is meaningless. (If
you know about the statistics of the normal distribution and its relation to
the standard deviation you will probably be able to reconstruct part, but only
part, of the argument. You should also remember that it is very unlikely that
the output measurements will follow the normal distribution.)
The relationship between sigma levels and defect rates can
be expressed as a mathematical formula which gives just one sigma level for
each percent defective, and vice versa. Some examples are given in the table
below which is based on the Wikipedia article on 25 April 2015 - where you will
be able to find an explanation of the rationale.
(An Excel formula for converting percent defective to sigma
levels is =NORMSINV(100%-pdef)+1.5, and for converting sigma levels to percent
defective is =1-NORMDIST(siglev-1.5,0,1,TRUE) where pdef is the percent
defective and siglev is the sigma level. The arbitrary input is the number 1.5
in these formulae. So, for example, if you want to know the sigma level
corresponding to a percent defective of 5%, simply replace pdef with 5% and put
the whole of the first formula including the = sign into a cell in Excel. Excel
will probably format the answer as a percentage, so you need to reformat it as an ordinary number. The sigma level you should get is 3.14.)
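If you don't have Excel to hand, the same two conversions can be sketched in Python using only the standard library. This is a rough equivalent, not anything official: the names sigma_level, fraction_defective and shift are my own labels, and the inputs are fractions (0.05 for 5%) rather than percentages.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def sigma_level(pdef, shift=1.5):
    """Mirror of =NORMSINV(100%-pdef)+1.5; pdef is a fraction such as 0.05."""
    return STD_NORMAL.inv_cdf(1 - pdef) + shift

def fraction_defective(siglev, shift=1.5):
    """Mirror of =1-NORMDIST(siglev-1.5,0,1,TRUE); returns a fraction."""
    return 1 - STD_NORMAL.cdf(siglev - shift)

# Sigma level for 5% defective, rounded to two places.
print(round(sigma_level(0.05), 2))
# Defectives per million opportunities at sigma level 6.
print(round(fraction_defective(6) * 1e6, 2))
```

The first line printed should agree with the 3.14 from the Excel example, and the second with the figure of roughly 3.4 defectives per million usually quoted for sigma level 6. The shift parameter is the arbitrary 1.5.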
Sigma level | Percent defective | Defectives per million opportunities
1 | 69.1462461274% | 691462.4613
2 | 30.8537538726% | 308537.5387
3 | 6.6807201269% | 66807.20127
4 | 0.6209665326% | 6209.665326
5 | 0.0232629079% | 232.629079
6 | 0.0003397673% | 3.397673134
7 | 0.0000018990% | 0.018989562
2.781552 | 10% | 100000
3.826348 | 1% | 10000
4.590232 | 0.10% | 1000
5.219016 | 0.01% | 100
5.764891 | 0.0010000000% | 10
6.253424 | 0.0001000000% | 1
6.699338 | 0.0000100000% | 0.1
But what, you may wonder, is the point in all this? In
mathematics, you normally start with something that is difficult to understand,
and then try to find something equivalent which is easier to understand. For
example, if we apply Newton's law of gravity to the problem of calculating how
far (in meters, ignoring the effect of air resistance) a stone will fall in five
seconds, we get the expression:
$$\int_0^5 9.8\,t\,dt$$
If you know the appropriate mathematics, you can easily work
out that this is equal to 122.5. The original expression is just a complicated
way of saying 122.5.
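If you don't know the appropriate mathematics, a few lines of Python will do the job numerically. This is only a sketch: it treats the expression as the integral of 9.8t from 0 to 5 (which is what gives the 122.5 figure), the function name distance_fallen is my own, and the midpoint rule is just one simple way of approximating an integral.

```python
G = 9.8  # acceleration due to gravity in m/s^2, as in the text

def distance_fallen(seconds, steps=100_000):
    """Approximate the integral of G*t from 0 to `seconds` by the midpoint rule."""
    dt = seconds / steps
    # Each strip has height G * (midpoint time) and width dt.
    return sum(G * (i + 0.5) * dt * dt for i in range(steps))

# Numerical integral and the closed form (half g t squared), both rounded.
print(round(distance_fallen(5), 1))
print(round(0.5 * G * 5 ** 2, 1))
```

Both lines should come out at 122.5, which is the point: the integral and the plain number say the same thing.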
The curious thing about sigma levels is that we are doing just the opposite: going
from something that is easy to understand (percent defective) to something that
is difficult to understand (sigma levels), and arguably makes little sense
anyway.
In defence of sigma levels you might say that defect levels are typically very small, and it is easy to get confused about very small numbers. The numbers 0.0001% and 0.001% may look similar, but one is ten times as big as the other: if the defect in question leads to the death of a patient, for example, the second figure implies ten times as many deaths as the first. Which does matter. But the obvious way round this is to use something like the defectives per million opportunities (DPMO) as in the above table - the comparison then is between 1 defective and 10 defectives. In sigma levels the comparison is between 6.25 and 5.76 - but there is no easy interpretation of this, except that the first number is larger than the second, implying that the first represents a greater unlikelihood than the second. There is no way of seeing that deaths are ten times as likely in the second scenario, which the DPMO figures make very clear.
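The comparison can be made concrete with a short Python sketch. The function name sigma_level and the variable names are my own, and the conversion is simply the Excel formula with its arbitrary 1.5.

```python
from statistics import NormalDist

def sigma_level(fraction_defective, shift=1.5):
    # Same conversion as the Excel formula, including the arbitrary 1.5 shift.
    return NormalDist().inv_cdf(1 - fraction_defective) + shift

# 0.0001% and 0.001% expressed as fractions.
rare, less_rare = 0.0001 / 100, 0.001 / 100

dpmo_ratio = (less_rare * 1e6) / (rare * 1e6)
sigma_gap = sigma_level(rare) - sigma_level(less_rare)

print(f"DPMO: {rare * 1e6:.0f} vs {less_rare * 1e6:.0f} (ratio {dpmo_ratio:.0f})")
print(f"Sigma levels: {sigma_level(rare):.2f} vs {sigma_level(less_rare):.2f} "
      f"(gap {sigma_gap:.2f})")
```

The DPMO line shows the factor of ten directly. The sigma line shows a gap of about half a sigma, which tells you nothing about the factor of ten unless you convert back.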
So why sigma levels? The charitable explanation is that it's the
legacy of many years of calculating probabilities by working with sigmas (standard
deviations) so that the two concepts become inseparable. Except of course, that
for non-statisticians they aren't connected at all: one is obviously meaningful
and the other is gibberish.
The less charitable explanation is that it's a plot to
mystify the uninitiated and keep them dependent on expensive experts.
Is it stupidity or a deliberate plot? Cock-up or conspiracy?
In general I think I favour the cock-up theory, partly because it isn't only
the peddlers of the Six Sigma doctrine who are wedded to sigma mystification. The
traditional way of expressing quality levels is the capability index Cpk
- this is another convoluted way of converting something which is obvious into
something which is far from obvious. The rot had set in long before Six Sigma.
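To see what is being wrapped up in Cpk, here is a rough Python sketch. It uses the usual textbook definition (the distance from the process mean to the nearest specification limit, divided by three standard deviations) and assumes normally distributed output, which is unlikely to hold in practice. The function names and the example process are my own, purely for illustration.

```python
from statistics import NormalDist

def cpk(mean, sd, lower_spec, upper_spec):
    """Distance from the mean to the nearest spec limit, in units of 3 sd."""
    return min(upper_spec - mean, mean - lower_spec) / (3 * sd)

def dpmo_beyond_nearest_limit(cpk_value):
    """Defectives per million past the nearest limit, assuming normality."""
    return NormalDist().cdf(-3 * cpk_value) * 1e6

# Hypothetical process: mean 10.0, sd 0.1, spec limits 9.55 and 10.45.
index = cpk(10.0, 0.1, 9.55, 10.45)
print(round(index, 2), round(dpmo_beyond_nearest_limit(index), 1))
```

For this made-up process the index comes out at about 1.5, and the defect rate past one limit at about 3.4 per million, the same figure as sigma level 6. Once again the per-million number is the one that means something without further translation.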
And it's not just quality control. When the Higgs boson
was finally detected by physics researchers at CERN, the announcement
was accompanied by a sigma level to express their degree of confidence that the
alternative hypothesis that the results were purely a matter of chance could be
ruled out:
"...with a statistical significance of five standard
deviations (5 sigma) above background expectations. The probability
of the background alone fluctuating up by this amount or more is about one in
three million" (from the CERN website in April 2015. The sigma level here does not involve the arbitrary input
of 1.5 in the Excel formulae above: this should be replaced by 0 to get the
CERN results.)
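The conversion from CERN's sigma level to a "one in N" figure can be sketched in a few lines of Python. This assumes a one-sided tail of the standard normal distribution with no 1.5 shift, and the function name one_in is my own.

```python
from statistics import NormalDist

def one_in(sigmas):
    """Return N such that a fluctuation of `sigmas` or more has probability 1 in N."""
    # Upper tail beyond `sigmas`, obtained from the lower tail by symmetry.
    tail = NormalDist().cdf(-sigmas)
    return 1 / tail

print(f"5 sigma: about one in {one_in(5):,.0f}")
```

This gives roughly one in 3.5 million, which is in line with CERN's "about one in three million".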
Why bother with the sigma level? The one in three million
figure surely expresses it far more simply and far more clearly.