Thursday, December 20, 2012

The turnover and performance of statisticians: a critique of some quantitative research

In an earlier post I criticized a qualitative research paper on several grounds: its unnecessarily convoluted language, the blinding obviousness of some of its conclusions, the vagueness of the sample on which the research is based, the lack of any real information about whether the conclusions apply always or only sometimes (and if so how often), and the absence of any satisfactory audit trail linking the conclusions to the data on which they are supposedly based. The conclusions are also so vague that it is difficult to see what they are. This post concerns a statistical research paper and many of my conclusions are very similar, but with a few important differences. I think both papers convey almost nothing of interest to any reasonable person.
                I wanted to choose a typical quantitative management research paper, look at it in detail, decide what was good and bad about it, and how it could be improved. The paper I chose was “Is high employee turnover really harmful? An empirical test using company records” by Glebbeek and Bax, published in the top ranking Academy of Management Journal. It seemed only moderately complicated in terms of the statistics used, it is clearly written, and the topic is easy for the uninitiated, like me, to appreciate.
                The paper is about the relationship between employee turnover and the financial performance of organizations. The authors “tested the hypothesis that employee turnover and firm performance have an inverted U-shaped relationship: overly high or low turnover is harmful” and concluded that the hypothesis seemed to be true “but the inverted U-shape was not observed with certainty”.
                The first thing to note is that they used a single case study. One hundred and ten branches of just one employment agency in the Netherlands. A sample of one! They do admit that this is a problem, but doesn’t it undermine the whole project? The pattern in web businesses in California, or coffee bars in Portsmouth, would doubtless be very different.
                The second trivial objection is that the only reason they didn’t observe the inverted U-shape with certainty is that they didn’t look. There is no graph in the article (strange given that the article is about the shape of a graph): the graph they should have shown (on page 3 of this article) does show an inverted U-shape, although not a very convincing one because of the lack of offices with low levels of staff turnover. This is the shape they observed and, unless they are lying about their data, this was observed with certainty. Their admission of uncertainty is unnecessary because it is wrong!
                What they meant to say is that extrapolating the conclusion to the rest of the population cannot be certain. But what is the rest of the population? The employment agency only has 110 branches, and using the data to extrapolate conclusions to coffee bars in Portsmouth in 2012 is obviously silly. What they must mean is similar branches in similar organizations – but this is inevitably a little hazy.
                This brings us to the language used in the article. There are no problems with the text, which is written in good, clear English (perhaps because the authors aren’t English). The difficulty is the statistical jargon: see Tables 2 and 3, which summarize the analysis of the data. For example, the lack of certainty in the extrapolation of the results is measured by two “p values”. I think it is impossible to explain these in simple terms, and they also don’t provide a clear answer, hence the rather vague “not observed with certainty”. I have reanalyzed their data with another method (Bootstrapping confidence levels ...) which does yield a clear answer, namely that there is a 65% probability that the hypothesis applies to the wider population. This makes it quite clear just how inconclusive the result is.
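For readers who want to see roughly what a bootstrap confidence level involves, here is a minimal sketch in Python. It is an illustration under stated assumptions, not necessarily the exact procedure used in the reanalysis: it assumes the inverted U-shape is represented by a quadratic curve (performance = a + b × turnover + c × turnover²) and counts the hypothesis as supported whenever the fitted c is negative. The function names are hypothetical, and the data are made up for the example; they are not the agency's records.

```python
import random


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Build the 3x3 normal equations as an augmented matrix.
    rows = [[1.0, x, x * x] for x in xs]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("degenerate sample: cannot fit a quadratic")
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def bootstrap_confidence(xs, ys, n_resamples=2000, seed=0):
    """Proportion of resamples whose fitted curve bends downwards (c < 0)."""
    rng = random.Random(seed)
    n = len(xs)
    hits = 0
    usable = 0
    for _ in range(n_resamples):
        # Resample the branches with replacement, keeping each (x, y) pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            _, _, c = fit_quadratic([xs[i] for i in idx], [ys[i] for i in idx])
        except ValueError:
            continue  # skip the rare resample with too few distinct x values
        usable += 1
        if c < 0:
            hits += 1
    if usable == 0:
        raise ValueError("no usable resamples")
    return hits / usable


# Made-up data for 110 hypothetical branches (NOT the data from the paper).
data_rng = random.Random(42)
turnover = [data_rng.uniform(0.0, 0.4) for _ in range(110)]
performance = [4.0 - 100.0 * (t - 0.2) ** 2 + data_rng.gauss(0.0, 0.5)
               for t in turnover]

confidence = bootstrap_confidence(turnover, performance)
print(f"Confidence in an inverted U-shape: {confidence:.0%}")
```

The made-up data here have a strong built-in curve, so the confidence level comes out high. With flatter or noisier data the figure drifts towards 50%, which is the sense in which a value like 65% is inconclusive. A fuller version might also require the peak of the curve to fall within the observed range of turnover; this sketch does not check that.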
                The standard approach to analysis adopted by the authors uses jargon and methods which are difficult to decipher, and fails to provide the answers you want. Why don’t statisticians use simpler concepts which provide answers to the questions we need answers to? I wish I knew!
                There are also difficulties with the idea of hypothesis testing here. Let’s take another example for illustration: academic statisticians in a university. The hypothesis says that a very high turnover of statisticians is likely to be harmful to the university’s business. Research contracts won’t be secured, and students will be unhappy if the academics leave after only a few months. Let’s now imagine the opposite scenario where statisticians never leave. The students get the same lecture that’s been served up for the last 30 years, and the lack of fresh input means that the same tired ideas get used in research projects, which again is likely to lead to poor performance. Almost inevitably, there is an optimum level of turnover which allows new ideas to filter in but gives sufficient continuity to keep the system working well. In short, if we plotted a graph of performance against statistician turnover, performance would be low for very low and very high turnover; in other words the graph would be an inverted U-shape. But this is almost inevitably true. Research is not needed any more than research is needed to demonstrate there is an optimum level of food intake: starvation and obesity are both harmful.
                The hypothesis is too obviously true to be worth researching.
                Despite this it might be worth looking at, say, the optimum level of staff turnover in different industries. This might be 50% (per year) for academic statisticians to keep new ideas and contacts flowing in, but perhaps 10% for a coffee bar to help establish rapport with the clientele. Who knows? Research is needed. Quantitative research in management and similar disciplines is often strangely non-numerical in that the focus is on hypotheses (is this true or false?) instead of potentially more interesting questions like how much difference does this make?
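To make the "how much difference" question concrete, here is a small worked formula. It assumes, purely for illustration, that performance P depends on turnover T through a quadratic curve with coefficients a, b and c; these symbols and the quadratic form are my assumptions, not something taken from the paper.

```latex
% Assumed model: an inverted U-shape requires c < 0
P(T) = a + bT + cT^2, \qquad c < 0

% The optimum turnover is where the slope is zero
\frac{dP}{dT} = b + 2cT = 0 \quad\Longrightarrow\quad T^{*} = -\frac{b}{2c}

% Performance lost by operating at turnover T instead of the optimum
P(T^{*}) - P(T) = -c\,(T - T^{*})^{2}
```

On this assumption the interesting numbers are the optimum T* and the size of c, which says how quickly performance falls away as turnover moves from the optimum. A tiny |c| would mean the inverted U-shape exists but hardly matters.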
                But … in this instance would this really be worthwhile? There are many other variables impacting on performance (as the data in the original article shows), and it is difficult to imagine a sensible manager saying “we need to get rid of so-and-so because our turnover rates are a bit down”. The detail of the individual cases would be a far more sensible guide than any slight tendency for performance and turnover to be related by an inverted U-shaped pattern, so is any version of this research really of any use? Except to statisticians at a loose end?
                I am an academic statistician in a university in the sense that a lot of the work that comes my way concerns statistics (although I cannot claim any expertise in the hard core of the discipline), and I’ve been doing it for more than 20 years. This is definitely too long. Glebbeek and Bax’s argument might be rather silly, but their implicit conclusion that every occupation has an optimum timespan, and that I am past my best-before date, is right. After Christmas, I am moving on to pastures new for everyone’s sake.
                To return to the comparison with the qualitative research paper, my conclusions are similar in many ways. The statistical jargon is unnecessarily convoluted, the hypothesis is obviously true and so not worth testing, the sample is not vague but it is very limited, and it is difficult to discern any useful conclusions. On the positive side it is very obvious how the analysis was conducted; it’s just that I don’t think it was worth conducting.
                This, of course, is a sample of just one quantitative research article. Obviously I can’t confidently generalize to all other quantitative research articles. But I am fairly sure that very similar points apply to a substantial number of them. There is a more detailed version of this argument in my article Making statistical methods in management more useful ....
