Thursday, October 15, 2020

Covid is not my problem. Intuitions about small probabilities are unhelpful.

When I last thought about this, the coronavirus infection rate where I live was about 1 in 2000. This is a very low probability, and even if I did get infected, the chances of my becoming seriously ill are also fairly low. Unfortunately, our intuitions have difficulty with very small probabilities, which means we can easily jump to disastrously wrong conclusions. There are two basic problems.

The first is a very reasonable tendency to assume that very unlikely events just won't happen. There are obvious evolutionary advantages in clarifying one's view of the world by ignoring very unlikely events, so the propensity to think like this may well be hard-wired into our minds. So: I'm obviously not infected, so there's no danger of my passing it on. And if everyone thinks like this ...

The second problem is that our intuitions find it hard to acknowledge that lots of unlikely events can combine to create events that are much more likely. This is hardly surprising if we are wired to ignore events which are very unlikely.

Let's say I have daily meetings in groups of 6, so I meet 5 other people every day. In the first 8 days I'll meet 40 people. The chance that one of them has covid is roughly 40/2000, or 2% - so, from a pessimistic perspective, this is my chance of being infected. But 2% is not too bad. Not worth worrying about!

Now let's imagine everyone's doing the same thing. After the first 8 days, everyone's chance of being infected is now 2%. (To simplify things, we'll assume the original 1 in 2000 infected are either recovered, safely tucked away on a ventilator, or dead.) Over the next 8 days everyone will meet another 40 people, so the chance of one of these being infected is about 80% (40 x 2%) - which we'll take to be the chance of my getting infected. (Confession: I've simplified the probability calculations in a way which makes sense with 1/2000 but is a bit rough with 2% - the right answer is about 55%, but this is still more than half.)
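The two-step calculation can be checked with a few lines of Python. The exact version of the rough "multiply the probabilities and add" rule is 1 - (1 - p)^n: the chance that not every one of the n contacts is uninfected. This keeps the text's simplifying assumptions (independent contacts, 100% chance of catching the virus from an infected contact), which are not real epidemiology.

```python
# Rough versus exact chance of meeting at least one infected person,
# keeping the text's simplifying assumptions (independent contacts, and
# a 100% chance of catching the virus from an infected contact).

def p_infected(p_each, n_contacts):
    """Exact: 1 minus the chance that all n contacts are uninfected."""
    return 1 - (1 - p_each) ** n_contacts

# First 8 days: background rate 1 in 2000, 40 contacts.
step1 = p_infected(1 / 2000, 40)
print(f"after 8 days:  crude {40 / 2000:.1%}, exact {step1:.1%}")

# Next 8 days: everyone now carries roughly that 2% risk.
step2 = p_infected(step1, 40)
print(f"after 16 days: crude {40 * step1:.1%}, exact {step2:.1%}")
```

At 1 in 2000 the crude multiplication and the exact formula agree to the displayed precision; at 2% they part company (about 80% versus 55%), which is the roughness confessed to above.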

So in 16 days we've gone from 1 in 2000 being infected to most people being infected.

If this scenario plays out almost everyone will be infected after a few weeks. Sooner if we meet more people more often, but later if some of the people I meet are from the same household, or if my chances of picking it up from a single infected person are less than 100% - both of which are likely in practice. 

Friday, May 08, 2020

Suggestions for making research as trustworthy, intelligible and quickly available as possible

There are a lot of research papers out there - some good, some bad, some intelligible, some unintelligible, some disseminated quickly, some slowly. Now, with covid-19, it is more important than ever that we get to know whether a research paper should be trusted, that as many as possible of the relevant people can understand it as quickly as possible, and that the research methods employed are not unduly restrictive. The phrases in italics refer to key problems with the system for creating and disseminating knowledge. The purpose of this note is to explain how a few publications of mine are relevant to tackling these problems.

Just after deciding to write this note, I received a "nature briefing" email from the journal Nature summarising some of their latest highlights. One thing that caught my eye was a paper entitled "Suppression of COVID-19 outbreak in the municipality of Vo', Italy" (E. Lavezzo et al., preprint at medRxiv). According to the Nature summary: "On 21 February, the town of Vo' reported Italy's first COVID-19 death, leading authorities to ban movement in the town and end public services and commercial activities there for two weeks. Andrea Crisanti at Imperial College London and his colleagues swabbed almost every resident of Vo' for viral RNA at the beginning and end of the lockdown. ... The team found that some 43% of the people infected with SARS-CoV-2 in the town reported no fever or other symptoms. The researchers observed no statistically significant difference in potential infectiousness between those who reported symptoms and those who did not."

The paper in question was posted on medRxiv, which describes itself as a preprint server for health sciences. There is a warning at the top of each page that the paper is "not certified by peer review". This seems to imply that once it has been "certified" - by a couple of "peers" chosen by the editor of a journal - its contents must be right, trustworthy or whatever. The same assumption is implicit in the journalistic practice of giving the name of the journal a research paper appears in, with the implied "it must be right, it's published in this prestigious journal". The paper was posted on 18 April 2020. I am writing this two weeks later and there has been no update: surely, given the urgency of the problem, the paper should have been peer reviewed already, with a link to the update posted so that readers are aware of the latest version?

In practice, peer review, and the assumptions behind it, are far too crude. At the frontiers of knowledge science is an uncertain business, and the idea that the standards required of new research are sufficiently clear-cut for a certificate, issued by an editor aided by a couple of anonymous peer reviewers, to "certify" the validity of the research is at best wildly unrealistic. In the real world both trustworthy and untrustworthy research is published both in peer-reviewed journals and un-peer-reviewed on preprint servers and other websites. And the timing issue is important - peer review usually takes months and may take years (my record from submission to publication is over four years).

A better system is desperately needed: one which would produce a critique, and suggest improvements, from as many relevant perspectives as possible, as quickly and transparently as possible. The idea that peers are the only relevant judges of quality is very strange. Ideally we would want review by superiors, but as these obviously don't exist at the frontiers of knowledge, surely the views of "peers" should be supplemented by those of "non-peers" - experts in neighbouring or other relevant disciplines (like statistics) and members of the potential audience? The conventional relaxed timetable for the reviewing process is also odd and unnecessary. This paper should surely be vetted as quickly as possible so that people know how much trust to place in its conclusions. A few years ago I made some suggestions for tackling these issues - see The journal of everything (Times Higher Education, 2010; there is a similar article here) and Journals, repositories, peer review, non-peer review, and the future of scholarly communication (2013). (Confession: the latter paper was not peer-reviewed, but an earlier, less interesting, paper was; the Times Higher Education article was reviewed by the editors.)

It is also important that research papers should be as clear and easy to understand as possible, particularly when, as in the case of the covid-19 paper, they are relevant to a range of experts with different specialisms - public-health experts, immunologists, supply chain experts, ethicists, anthropologists and historians, as well as medics, according to this article. I read the abstract to see how much sense it made to me as someone relatively ignorant of all these areas. There was some biological and epidemiological jargon, which was probably inevitable given the topic. I did not know what a "serial interval" is, but a quick check on Wikipedia solved this problem. There were also some genetic terms and phrases I did not understand, but their role seemed sufficiently clear from the context. As far as I could judge - and as a complete novice this is not very far - these aspects of the abstract were as simple as is consistent with conveying the meaning.

However, I think the statistical approach could be improved from the point of view of providing an analysis which is as clear and useful as possible for as wide an audience as possible. The authors "found no statistically significant difference in the viral load ... of symptomatic versus asymptomatic infections (p-values 0.6 and 0.2 for E and RdRp genes, respectively, Exact Wilcoxon-Mann-Whitney test)." This follows a conventional statistical approach, but it is one that is deeply flawed. We aren't told a vital bit of information: how big is the difference? Is it big enough to matter? The phrase "statistically significant" seems to imply that it matters, but that is not what it actually means. And without a training in statistics the p values are meaningless.

In my view a far better way to analyse and present these results would be in terms of confidence levels. So we might conclude that for the E gene, if we extrapolate the results beyond the sample studied, we can be 80% confident that symptomatic infections have a higher viral load than asymptomatic ones. Or perhaps 60% confident that there is a big difference (more than a specified threshold). I made these figures up because I haven't got the data, but the principle of giving a straightforward confidence level should be clear. Then it should be obvious that 80% confidence means there is a 20% confidence it's not true. There are more details of this approach in Simple methods for estimating confidence levels, or tentative probabilities, for hypotheses instead of p values.
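To make the suggestion concrete, here is a minimal sketch of how such a confidence level could be computed by bootstrap resampling. Everything numerical below is invented - I don't have the Vo' data, so the two groups are hypothetical viral-load readings - and the point is only the shape of the calculation and of the answer it gives.

```python
# A "confidence level" instead of a p-value, sketched with made-up
# viral-load figures (the Vo' data are not available to me; the numbers
# below are purely illustrative).
import random

random.seed(1)
symptomatic  = [24.1, 22.8, 25.3, 23.9, 21.7, 26.0, 23.2, 24.8]
asymptomatic = [24.0, 22.4, 24.7, 22.6, 25.6, 21.3, 23.5, 23.1]

def bootstrap_confidence(a, b, reps=10_000):
    """Proportion of bootstrap resamples in which group a's mean
    exceeds group b's mean."""
    wins = 0
    for _ in range(reps):
        mean_a = sum(random.choices(a, k=len(a))) / len(a)
        mean_b = sum(random.choices(b, k=len(b))) / len(b)
        if mean_a > mean_b:
            wins += 1
    return wins / reps

conf = bootstrap_confidence(symptomatic, asymptomatic)
print(f"confidence that symptomatic viral load is higher: {conf:.0%}")
```

The single percentage this prints is directly meaningful to a non-statistician in a way that "p = 0.6, Exact Wilcoxon-Mann-Whitney test" is not.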

This illustrates another theme that I think is very important: keeping academic ideas as simple as possible. There is, I think, a tendency for academics to keep things complicated to prove how clever, and essential, they are. The danger then is that each specialism retreats into its own silo making communication with other specialisms and the wider public more and more difficult. I think we should resist this tendency and strive to make things as simple as possible. But not, of course, as I think Einstein said, simpler. I have written a few articles on this theme, for example: I'll make it simple (Times Higher Education, 2002, also here), Maths should not be hard: the case for making academic knowledge more palatable (Higher Education Review, 2002) and Simplifying academic knowledge to make cognition more efficient: opportunities, benefits and barriers.

Finally there is the problem that some scientists only accept evidence from statistical studies based on largish samples. Anything less is mere anecdote and not to be taken seriously. But even a single anecdote can prove that something is possible - perhaps that covid can be caught twice. Anecdotes are potentially important, as indeed is fiction (think of Einstein's thought experiments, or Schrodinger's cat), as I argue in Are ‘Qualitative’ and ‘Quantitative’ Useful Terms for Describing Research? and Anecdote, fiction and statistics. There is also an article in the Times Higher Education, which cites several other academic articles, and comes to similar conclusions.

Tuesday, January 15, 2019

We should make academic knowledge easier

If academic knowledge were simpler to understand and use, more people would understand more, misunderstandings would be less prevalent, the education industry would be cheaper and more efficient, and humanity would make faster and better progress. I am convinced this is an idea with enormous potential, but it does not seem to be on anyone's agenda, and there are very strong vested interests opposing it.

Human civilization depends on knowledge. Lots of it, ranging from how to use Pythagoras's theorem to produce a right angle to the science behind mobile phones and GPS, from the idea that germs cause disease to the science behind modern medicine, from stories and ideas about how to produce them to the theories behind voice-to-text software. There are lots of different types of knowledge, and the boundaries of what counts as knowledge are a bit fuzzy. I'm talking here about knowledge in people's heads, not the knowledge in databases and AI algorithms - although the implications of these for what people need to store in their heads are another vital and fascinating story.

Some knowledge is easy and we pick it up naturally as we grow up. But some of it is complicated - it's difficult to learn and use: this is the rather fuzzily defined "academic" knowledge that I'm concerned with here. Two massive, interlinked, industries have evolved to cope with these difficulties: education which disseminates the knowledge, and what, in the absence of a suitable word, I'll call the knowledge production industry or KPI. (In universities knowledge production is called research, but from my point of view this term is too restrictive because it seems to imply the search for the "truth" by modern academics, whereas I need a term which also covers the work of Pythagoras and people trying to devise ways of making driverless cars.)

The KPI - scientists, researchers, and innovators both now and throughout history - make discoveries or invent theories or better ways of dealing with the world, and the results of their labours are then passed on to a wider audience by the education industry - schools, colleges, universities, textbooks, etc.  The education system gets a lot of analysis and criticism: better ways of teaching and learning are proposed and tested, and inequalities of access and the ineffectiveness of a lot of the education system are bemoaned ad nauseam, and so on. But the knowledge itself is seen as given, fixed, handed down from the experts, and the job of education is to pass it on to students and the wider public in the most efficient possible way.

My suggestion here is that there are often good reasons to change the knowledge itself to make it simpler or more appropriate for the audience. An important KPI (key performance indicator) for the KPI (knowledge production industry) is the simplicity of the product.

This idea stems from a number of sources, some of which I'll come on to in a minute, but first a little thought experiment. Imagine that a bit of knowledge could be made simpler by a factor of 50%, so that, for example, the time needed to learn it, or to use it, was halved, or that it led to about 50% fewer errors and misconceptions in implementation. Imagine this applies to all knowledge taught in universities and similar institutions. Then students would learn about 50% more, or they would understand about 50% better, or have about 50% of their time free to do something else. Leading edge researchers would arrive at the leading edge in half the time they take at the moment, giving them more time to advance their subject. If such simplifications could be made across the whole spectrum of knowledge, this would represent an enormous step forward for humanity.

You might think that the innovators and researchers of the KPI would have honed their wares carefully to make them as simple as possible, so a 50% improvement is simply impossible. But you'd be wrong. Very wrong. Except at the leading edge there is absolutely no tradition in the academic world of trying to make things simpler. Simplicity is for simple people, not academics who are clever people. I've had a paper rejected by an academic journal because it was too simple: it needed to be more complicated to appear more profound. Teaching and learning methods are tweaked to make them easier for learners, but the knowledge itself is considered sacrosanct: the experts have decreed how it is, and that's it.

There are exceptions: areas where simplicity is a prized quality of knowledge. One interesting example is the leading edge of one of the most complicated areas of human knowledge: the physics of things like quantum mechanics and cosmology. I've just been watching an interview with the physicist Roger Penrose, who was recounting his difficulties with lectures at Cambridge University: they were too complicated to understand, so he had to invent simpler ways of looking at the issues. Einstein is supposed to have said that everything should be made as simple as possible, but not simpler. I have also come across similar sentiments from two other Nobel prize winners, Paul Dirac and Murray Gell-Mann, and yet another Nobel winner, Richard Feynman, invented a type of diagram (subsequently called the Feynman diagram) which gives "a simple visualization of what would otherwise be an arcane and abstract formula" (Wikipedia). Where things are really difficult, simplicity is essential. But behind the pioneers of the discipline, the normal practice is to accept what the gurus have produced.

The history of science and the growth of knowledge in general are punctuated by occasional revolutions that often lead to far simpler ways of looking at things. The invention of the alphabet made record keeping far easier and more flexible so that all sorts of stories could have a wider audience, and the replacement of Roman numerals by the current system (2019 instead of MMXIX) did a similar job for arithmetic. The ideas introduced by Galileo and Newton provided a way of understanding and predicting how things move which can be summarised in a few simple equations and covers everything, both on earth and in the heavens. This would probably not have been considered simple by contemporaries of Galileo and Newton, or many present day students, but the equations are staggeringly simple when you consider what they achieved. Similarly, Charles Darwin's theory of evolution by natural selection provides a ridiculously simple explanation of the evolution of life on earth.

But what about the detailed, mundane stuff that students spend their time learning? Quadratic equations and statistics, chemistry and the methodology of qualitative research, medicine and epistemology? Are there opportunities for simplification here?

My contention is that there are, and the fact that they are almost never taken is a massive lost opportunity. There are two important differences between the situations of the leaders and the followers in a discipline. The first is that the leaders will have a really good understanding of all the stuff leading up to their innovation - the mathematics, other results in the field, the meaning of the jargon, and so on. The followers are, inevitably, not going to have such a thorough understanding of the background (they've got better things to do with their time). The second is that the motivations are likely to be different. The followers will want to fit new ideas into the mosaic of other things they know, and the current concerns of their lives, with as little effort as possible; the leaders, on the other hand, are likely to have a burning desire to progress their discipline in the direction they want to take it. These two factors mean that the best perspective for the followers may not be the same as for the leaders.

But is this possible? Are there alternative, simpler, or more appropriate, perspectives in many branches of knowledge? Well, yes, there are: difficult ideas have often spawned popular versions, or, as cynics would say, they have been dumbed down for the masses. But pop science is not usually serious science: if you want to use the ideas for real, or make breakthroughs yourself, the dumbed down, popular version will not do: you need the original ideas produced by the leaders, the experts themselves.

This is not what I am talking about here. What I am suggesting is the possibility of producing a simpler, more appropriate version for the followers, but one that is as useful and powerful as the original expertise produced by the experts. Or, possibly, more useful and more powerful.

I used to be a teacher in a university, several colleges and on short courses for business. As a teacher you try to explain your material as clearly as possible. But often, perhaps usually, I found myself thinking of alternative ideas which I thought were more appropriate. And I've been doing this for 40 years, publishing the occasional article on what I came up with (the first such article was published in 1978: there is a list of a few more here).

The area I thought about in most detail was statistics. There are three key innovations I would like to see promoted here. The first is computer simulation methods: instead of working out some complicated maths for lots of specific situations, you just do some simulation experiments on a computer so that you can, literally, see the answer and where it comes from (e.g. Bootstrap resampling ...). The second is jargon, which needs changing where it is misleading. The worst offender is the word "significant". This has a statistical meaning, and a meaning in everyday language which is completely different. This leads to massive, and entirely predictable, and avoidable, problems. The third is to focus on ideas that are helpful as opposed to ideas which fit statistical orthodoxy - see for example Simple methods for estimating confidence levels ... .
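As a sketch of the first innovation (with invented numbers): the sampling variability of a median is awkward to handle with textbook formulae, but trivially easy to see by bootstrap resampling on a computer.

```python
# Simulation instead of formulae: watch where the median of a resampled
# sample lands, rather than deriving its sampling distribution by maths.
# The data are invented for illustration (say, waiting times in minutes).
import random

random.seed(0)
sample = [3, 5, 5, 6, 7, 8, 9, 12, 13, 21]

def median(xs):
    xs = sorted(xs)
    n = len(xs)
    return (xs[n // 2] + xs[(n - 1) // 2]) / 2

# Resample with replacement 10,000 times and collect the medians.
medians = sorted(
    median(random.choices(sample, k=len(sample))) for _ in range(10_000)
)
low, high = medians[500], medians[9499]  # middle 90% of simulated medians
print(f"sample median {median(sample)}; 90% interval roughly {low} to {high}")
```

You can literally see where the answer comes from: the spread of the 10,000 simulated medians is the uncertainty.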

Other areas I pondered include research methods as taught in universities (a lot of the jargon is best ignored: see Brief notes on research methods and How to make research useful and trustworthy), decision analysis (The Pros and Cons of Using Pros and Cons for Multi-Criteria Evaluation and Decision Making), statistical quality control, mathematical notation in general, Bayes' theorem in statistics (see pages 18-22 of this article), and the maths of constant rates of growth or decline (traditionally dealt with by exponential functions, calculus and logarithms but this is quite unnecessary).
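To illustrate the last point: "how long does steady 5% growth take to double your money?" is traditionally answered with logarithms, but a loop of repeated multiplications gives the answer just as well and is arguably easier to see through. A minimal sketch:

```python
# Constant-rate growth by repeated multiplication, with no exponentials,
# logarithms or calculus: how long does money at 5% a year take to double?

def years_to_double(rate):
    amount, years = 1.0, 0
    while amount < 2:
        amount *= 1 + rate  # one year's growth
        years += 1
    return years

print(years_to_double(0.05))  # → 15 (1.05**15 ≈ 2.08)
```

The same loop handles decline (a negative rate and a threshold below 1) with no new machinery.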

Did I act on these ideas and teach the simpler versions that I felt were more appropriate? Sometimes I did, but usually I didn't. I was paid to teach the standard story, and didn't feel I could go out on a limb and teach my own version - which was usually untested and might not work. That's what the students and the organisations I was working for expected. And, besides, the system has an inertia that makes it difficult to change just one bit. The term "significant", for example, might be, in my view and the view of many others, awful jargon describing an awful concept which promotes confusion and discourages useful analysis, but it is very widely used in the research literature so people do need to know what it means.

There were exceptions where I did follow my better judgment. Computer simulation methods in statistics are widely used (but usually only where other approaches fail) so, on some courses, I did use these. And sometimes, as with research methods, the problem was that a lot of the standard material was just a waste of time and was best ignored so that we could focus on things that mattered. But even here, by not explaining the t-test, or emphasising the distinction between qualitative and quantitative methods, I was failing to meet the expectations of many colleagues and students.

But surely, you're probably still thinking, if there really is such an enormous untapped potential, people would be tapping into it already? Part of the reason why they aren't, or are only to a very limited extent, is that the forces which act against change are very powerful and go very deep. I was so deeply enmeshed in the assumptions of academic statistics that an obvious alternative to the concept of significance in statistics (as explained in Simple methods for estimating confidence levels ...) did not occur to me for 30 years after publishing an article critical of the concept, and the statistics journals I submitted my idea to rejected it, often with a comment along the lines of "if this was a good idea the gurus of statistics would have thought of it".

As well as the inevitable conservatism of any cognitive framework there are three factors which are peculiar to the knowledge production industry: the peer review system, the lack of a market or responsive feedback system for evaluating ideas and theories, and the desire of the education system to preserve "standards" by keeping knowledge hard. I'll explain the problems with each of these in the next three paragraphs.

The peer review system is the way new academic knowledge is vetted and certified as credible. Articles are submitted to a journal in the appropriate field; the editor then sends it out to two or three (usually anonymous) peer reviewers - often people who have published in the same journal - who make suggestions for improving the article and advise the editor on whether it should be published. The fact that an article has been published in a peer-reviewed journal is then taken as evidence of its credibility. This system has come in for a lot of criticism recently (e.g. in Nature): mistakes and inconsistencies are common, but one key issue is that the reviewers are peers: they are in the same discipline and likely to be subject to the same biases and preconceptions. Peer reviewers would seem unlikely to be sympathetic to the idea of fundamentally simplifying a discipline. I think some non-peer review would be a good idea, as advocated in this article.

Mobile phones and word processors are relatively easy to use. You don't need a degree or a lot of training to use them. This is because if people couldn't use them, they wouldn't buy them, so manufacturers make sure their products are user-friendly. There are lots of efficient mechanisms (purchase decisions, reviews on the web, etc) for providing manufacturers with feedback to make sure their products are easy to use. The academic knowledge ecosystem lacks most of these feedback mechanisms. If some knowledge is difficult to master, you need to enrol on a course, or try harder, or give up and accept you're too lazy or not clever enough. What does not happen is the knowledge producer getting a message along the lines of: "this is too complicated, please simplify or think again."

This is reinforced by the education system which has a strong vested interest in keeping things hard. There is an argument that the purpose of the education system isn't so much to learn things that are useful (everyone knows that a lot of what is learned is immediately forgotten and never used), but to "signal" to potential employers that you are an intelligent and hard-working person (an idea popularised by the book reviewed here). From this perspective difficult knowledge is likely to be better for differentiating the worthy from the less worthy students. The fact that the difficult knowledge may not be much use is beside the point, which is that the less diligent and intelligent should fail to understand it. And of course difficult knowledge enhances the status of teachers and means that they are more obviously necessary than they would be if knowledge were easier. Universities would lose most of their business if knowledge were easy to master: teaching and assessment would be much less necessary.

Knowledge, of course, is not just there to make the economy more efficient; it is part of our cultural heritage and part of what makes life worth living. Arguably we have a duty to pass on the work of the masters to future generations. OK, some unnecessarily complicated theories may have historical or aesthetic value, but, in general, if there is a simpler, more elegant version, isn't this preferable?

So ... I would like to propose that simplifying knowledge, or making it more appropriate for its purpose, is an idea that should be taken seriously. Otherwise knowledge will evolve by narrowly focused experts adding bits on and making it more and more complex until nobody really understands what it all means, and progress will eventually grind to a halt in an endless sea of technicalities.

This requires a fundamentally new mindset. First we need some serious creative effort devising new ways of looking at things, and then empirical research on what people find useful, but also simple and appealing. Perhaps knowledge should be viewed as art with aesthetic criteria taken seriously? Whatever we are trying to do - discover a theory of everything, cure diseases, prevent suffering or make people happier - simplicity is an important criterion for evaluating the knowledge that will best assist us.

Then we should make faster progress, more people will understand more, less time will be wasted on unnecessary complexities, and we should make fewer silly mistakes.

This is just a summary. There is more on this theme in the articles linked to this page.

Wednesday, April 20, 2016

Why is the statistical package SPSS so unhelpful?

I've just run a statistical test on SPSS to see if there is a difference between articles in the Guardian and Telegraph in terms of Characteristic X (it doesn't matter what X is for my purposes here). The results are pasted below. The presence of X is coded as 1, and its absence by 0.

The first table shows that a higher proportion of Guardian articles (33.5%) than Telegraph articles (24.1%) had X. The second table addresses the issue of statistical significance: can we be sure that this is not a chance effect that would be unlikely to recur in another sample of articles?

Paper * Code Crosstabulation (% within Paper)

                      X present   X absent
  Guardian              33.5%       66.5%
  Telegraph             24.1%       75.9%

Chi-Square Tests

                            Asymp. Sig.   Exact Sig.   Exact Sig.
                            (2-sided)     (2-sided)    (1-sided)
  Pearson Chi-Square          0.128                      0.083
  Continuity Correction       0.168
  Likelihood Ratio            0.122
  Fisher's Exact Test                       0.145
  N of Valid Cases

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 24.21.
I decided I would like a two-sided significance level, and looked at the second table to find it. Unfortunately there are no fewer than four different answers (0.128, 0.168, 0.122 and 0.145)! Which to choose?

Further study of the table only deepened my confusion. The heading is Chi-Square tests but two of the columns are headed Exact Sig. My understanding is that the chi-square test uses the chi-square distribution which is a well known way of working out approximate probabilities. The exact test works out the equivalent probabilities directly without using the chi-square distribution, so the entries in the Exact test columns are not chi-square results despite the table heading. One of the rows is headed Fisher Exact Test and another Pearson Chi-Square which seems to confirm this. But what can we make of the top right figure (0.083) which is Chi-square according to the table heading, Pearson Chi-Square according to the row heading, and Exact Sig according to the column heading? Help!
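For comparison, here is what the same tests look like computed directly in Python. The counts below are hypothetical: the SPSS output gives only percentages, so this is a made-up table of plausible size with roughly the same proportions, and the p-values will be in the same ballpark as, but not identical to, those above.

```python
# The SPSS tests computed directly, with hypothetical counts standing in
# for the real data (the output above shows only percentages).
from scipy.stats import chi2_contingency, fisher_exact

#           has X  lacks X
table = [[36, 71],   # "Guardian"  (about 33.6% with X)
         [26, 82]]   # "Telegraph" (about 24.1% with X)

chi2, p_asymp, dof, expected = chi2_contingency(table, correction=False)
_, p_corrected, _, _ = chi2_contingency(table, correction=True)
_, p_exact = fisher_exact(table, alternative="two-sided")

print(f"Pearson chi-square (Asymp. Sig., 2-sided): {p_asymp:.3f}")
print(f"with continuity correction:                {p_corrected:.3f}")
print(f"Fisher's exact test (2-sided):             {p_exact:.3f}")
```

One clearly labelled p-value per test, and it is obvious which is which - roughly what the SPSS table ought to look like.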

OK, I know I should have consulted the Help (it doesn't work on my computer, so I can't), or a book on using SPSS, or gone on a course and provided employment for an expert. But I don't think this should be necessary. SPSS should produce clear tables with a little explanation of what the numbers mean. In the present case, as exact probabilities can be computed, surely these are all that's needed - with a sensible heading for the table, and a little note on what the probabilities represent.

SPSS should produce clear, consistent tables which present only the relevant information with an explanation in, as far as possible, non-technical language.

But then people might understand the output and the market for courses and experts would be much diminished.

Thursday, May 28, 2015

Six sigma and the Higgs Boson: a convoluted way of expressing unlikeliness

A few years ago IBM asked me to help them calculate "sigma levels" for some of their business processes. Sigma levels are part of the "Six Sigma" approach to  monitoring and improving business quality developed by Motorola in 1986, and since used by numerous consultants right across the world to package well known techniques in order to con money out of gullible businesses.

The name, of course, was an important factor in helping the Six Sigma doctrine to catch on. It is mysterious, with a hint of Greek, both of which suggest powerful, but incomprehensible, maths, for which the help of expensive consultants is obviously needed.

Sigma is the Greek letter "s", which stands for the standard deviation - a statistical measure of the variability of a group of numerical measurements. Sigma levels are a way of relating the number of defects produced by a business process to the variability of the output of the process. The details are irrelevant for my present purposes except in so far as the relationship is complicated, involves an arbitrary input, and in my view is meaningless. (If you know about the statistics of the normal distribution and its relation to the standard deviation you will probably be able to reconstruct part, but only part, of the argument. You should also remember that it is very unlikely that the output measurements will follow the normal distribution.)

The relationship between sigma levels and defect rates can be expressed as a mathematical formula which gives just one sigma level for each percent defective, and vice versa. Some examples are given in the table below which is based on the Wikipedia article on 25 April 2015 - where you will be able to find an explanation of the rationale.

(An Excel formula for converting percent defective to sigma levels is =NORMSINV(100%-pdef)+1.5, and for converting sigma levels to percent defective is =1-NORMDIST(siglev-1.5,0,1,TRUE) where pdef is the percent defective and siglev is the sigma level. The arbitrary input is the number 1.5 in these formulae. So, for example, if you want to know the sigma level corresponding to a percent defective of 5%, simply replace pdef with 5% and put the whole of the first formula including the = sign into a cell in Excel. Excel will probably format the answer as a percentage, so you need to reformat it as an ordinary number. The sigma level you should get is 3.14.)
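The same conversions can be sketched in Python using the standard library's NormalDist (my illustration, not part of the original post; the 1.5 added in each formula is the arbitrary input mentioned above):

```python
from statistics import NormalDist

SHIFT = 1.5  # the arbitrary 1.5-sigma shift conventional in Six Sigma

def pdef_to_sigma(pdef):
    """Convert a proportion defective (e.g. 0.05 for 5%) to a sigma level."""
    return NormalDist().inv_cdf(1 - pdef) + SHIFT

def sigma_to_pdef(siglev):
    """Convert a sigma level back to a proportion defective."""
    return 1 - NormalDist().cdf(siglev - SHIFT)

print(round(pdef_to_sigma(0.05), 2))  # 5% defective -> sigma level 3.14
print(sigma_to_pdef(6))               # six sigma -> about 3.4 defectives per million
```

This reproduces the 3.14 figure from the Excel example above, and the famous "3.4 defects per million" that Six Sigma takes its name from.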

Sigma level | Percent defective | Defectives per million opportunities
1 | 69% | 691,462
2 | 31% | 308,538
3 | 6.7% | 66,807
4 | 0.62% | 6,210
5 | 0.023% | 233
6 | 0.00034% | 3.4

But what, you may wonder, is the point in all this? In mathematics, you normally start with something that is difficult to understand, and then try to find something equivalent which is easier to understand. For example, if we apply Newton's law of gravity to the problem of calculating how far (in meters, ignoring the effect of air resistance) a stone will fall in five seconds, we get the expression:
∫₀⁵ 9.8t dt

If you know the appropriate mathematics, you can easily work out that this equals 4.9 × 5² = 122.5. The original expression is just a complicated way of saying 122.5.

The curious thing about sigma levels is that we are doing just the opposite: going from something that is easy to understand (percent defective) to something that is difficult to understand (sigma levels), and arguably makes little sense anyway.

In defence of sigma levels you might say that defect levels are typically very small, and it is easy to get confused about very small numbers. The numbers 0.0001% and 0.001% may look similar, but one is ten times as big as the other: if the defect in question leads to the death of a patient, for example, the second figure implies ten times as many deaths as the first. Which does matter. But the obvious way round this is to use something like defectives per million opportunities (DPMO), as in the above table - the comparison then is between 1 defective and 10 defectives. In sigma levels the comparison is between 6.25 and 5.76 - and there is no easy interpretation of this except that the first number is larger than the second, implying that the first represents a greater unlikelihood than the other. There is no way of seeing that deaths are ten times as likely in the second scenario - which the DPMO figures make very clear.
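To make the comparison concrete, here is a small sketch (my illustration, using the same formula as before) showing how a tenfold difference in DPMO collapses into a barely visible difference in sigma level:

```python
from statistics import NormalDist

def dpmo_to_sigma(dpmo):
    """Convert defectives per million opportunities to a sigma level,
    including the conventional 1.5 shift."""
    return NormalDist().inv_cdf(1 - dpmo / 1_000_000) + 1.5

for dpmo in (1, 10):
    print(dpmo, "DPMO ->", round(dpmo_to_sigma(dpmo), 2), "sigma")
# 1 DPMO  -> 6.25 sigma
# 10 DPMO -> 5.76 sigma
```

Ten times the deaths, but on the sigma scale the difference is just 6.25 versus 5.76.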

So why sigma levels? The charitable explanation is that it's the legacy of many years of calculating probabilities by working with sigmas (standard deviations), so that the two concepts have become inseparable. Except, of course, that for non-statisticians they aren't connected at all: one is obviously meaningful and the other is gibberish.

The less charitable explanation is that it's a plot to mystify the uninitiated and keep them dependent on expensive experts.

Is it stupidity or a deliberate plot? Cock-up or conspiracy? In general I think I favour the cock-up theory, partly because it isn't only the pedlars of the Six Sigma doctrine who are wedded to sigma mystification. The traditional way of expressing quality levels is the capability index Cpk - another convoluted way of converting something which is obvious into something which is far from obvious. The rot had set in long before Six Sigma.

And it's not just quality control. When the Higgs boson was finally detected by physics researchers at CERN, the announcement was accompanied by a sigma level to express their degree of confidence that the hypothesis that the results were purely a matter of chance could be ruled out:
"...with a statistical significance of five standard deviations (5 sigma) above background expectations. The probability of the background alone fluctuating up by this amount or more is about one in three million" (from the CERN website in April 2015. The sigma level here does not involve the arbitrary input of 1.5 in the Excel formulae above: this should be replaced by 0 to get the CERN results.)
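The translation from 5 sigma to a probability can be checked directly (a sketch, with no 1.5 shift, as the note above says):

```python
from statistics import NormalDist

# Upper-tail probability of the background fluctuating up by 5 sigma or more
p = 1 - NormalDist().cdf(5)
print(p)             # about 2.87e-07
print(round(1 / p))  # roughly 1 in 3.5 million - "about one in three million"
```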

Why bother with the sigma level? The one in three million figure surely expresses it far more simply and far more clearly.