The Sceptical Academic: December 2012

Monday, December 24, 2012

Marking projects

Just marking yet another project. As every marker should be aware, it is, of course, not a criterion for a good project that it should make sense. That really would be expecting a bit much! All I check for is reasonable compliance with the conventions of projects – complete and properly formatted references, some discussion of methodology, some aims and conclusions, and something that could count as analysis, whatever that means.

What have we got today? The one I’ve just read is on leadership and the third age. Usual sort of rubbish. Skims through the platitudes from the gurus: leadership is obviously a good thing, best if it’s authentic, etc, etc. I’m really not sure what this has to do with the third age, but let’s read on and see where we get to. Next we get the usual platitudes about using a qualitative approach, before presenting a purely quantitative list of average responses to questions asking for opinions of the state of play on a 1 to 5 scale. “How important do you think leadership is?” And lots of similar questions. These averages seem to be the answer. This is obviously a good student because she’s thought of working out averages. Usually you just get bar charts with bars representing strongly agree, agree etc – completely impossible to take in.

But what’s this got to do with the third age? I’m really not sure, except that the respondents to her questionnaire were third agers. But what leaders are they talking about? Who is being led? Who cares?

This project is very typical. The approach seems to be to start with a topic – in this case leadership and the third age. Then you ask a haphazard assortment of people their opinions about the topic, usually on a 1-5 scale, after a little essay on qualitative research and not reducing people to numbers. Obviously the haphazard assortment of people will know little about the topic, and certainly won’t put any thought into answering the questions. Then you take the average of these opinions, and present this as the answer. Why does anybody think this makes any sense?

Or, at least I think that’s the way it works. I couldn’t face reading most of it. But there were a couple of aims, and a couple of conclusions which sort of linked up with the aims, a questionnaire, a few graphs, and a list of references … so it must be OK. It will get a good pass mark.

Thursday, December 20, 2012

The turnover and performance of statisticians: a critique of some quantitative research

In an earlier post I criticized a qualitative research paper for its unnecessarily convoluted language, for conclusions that were in some cases blindingly obvious, for the vagueness of the sample on which the research was based, and for the lack of any real information about whether the conclusions always apply or only sometimes apply (and if so how often), or of any satisfactory audit trail linking the conclusions to the data on which they are supposedly based. The conclusions are so vague that it’s actually difficult to see what they are. This post concerns a statistical research paper, and many of my conclusions are very similar, but with a few important differences. I think both papers convey almost nothing of interest to any reasonable person.

I wanted to choose a typical quantitative management research paper, look at it in detail, decide what was good and bad about it, and how it could be improved. The paper I chose was “Is high employee turnover really harmful? An empirical test using company records”, published in the top ranking Academy of Management Journal. It seemed only moderately complicated in terms of the statistics used, it’s clearly written, and the topic is easy for the uninitiated, like me, to appreciate.

The paper is about the relationship between employee turnover and the financial performance of organizations. The authors “tested the hypothesis that employee turnover and firm performance have an inverted U-shaped relationship: overly high or low turnover is harmful” and concluded that the hypothesis seemed to be true “but the inverted U-shape was not observed with certainty”.

The first thing to note is that they used a single case study. One hundred and ten branches of just one employment agency in the Netherlands. A sample of one! They do admit that this is a problem, but doesn’t it undermine the whole project? The pattern in web businesses in California, or coffee bars in Portsmouth, would doubtless be very different.

The second trivial objection is that the only reason they didn’t observe the inverted U-shape with certainty is that they didn’t look. There is no graph in the article (strange given that the article is about the shape of a graph): the graph they should have shown (on page 3 of this article) does show an inverted U-shape, although not a very convincing one because of the lack of offices with low levels of staff turnover. This is the shape they observed and, unless they are lying about their data, this was observed with certainty. Their admission of uncertainty is unnecessary because it is wrong!

What they meant to say is that extrapolating the conclusion to the rest of the population cannot be certain. But what is the rest of the population? The employment agency only has 110 branches, and using the data to extrapolate conclusions to coffee bars in Portsmouth in 2012 is obviously silly. What they must mean is similar branches in similar organizations – but this is inevitably a little hazy.

This brings us to the language used in the article. No problems with the text, which is written in good, clear English (perhaps because the authors aren’t English). The difficulty is the statistical jargon – see Tables 2 and 3, which summarize the analysis of the data. For example, the lack of certainty in the extrapolation of the results is measured by two “p values”. I think it is impossible to explain these in simple terms, and they also don’t provide a clear answer – hence the rather vague “not observed with certainty”. I have reanalyzed their data with another method (Bootstrapping confidence levels ...) which does yield a clear answer – namely that there is a 65% probability that the hypothesis applies to the wider population. This makes it quite clear just how inconclusive the result is.
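For readers who want to see roughly what a bootstrap confidence level involves, here is a minimal sketch in Python. It is not the analysis from my paper, and it does not use the article's data. The data are synthetic. The function names (quadratic_fit, is_inverted_u, bootstrap_confidence) and the working definition of an inverted U (a negative squared term with the peak inside the observed range of turnover) are assumptions made purely for illustration. The idea is to resample the branches with replacement many times, refit the curve each time, and report the proportion of resamples in which the inverted U appears.

```python
import random


def quadratic_fit(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c).

    Needs at least three distinct x values, otherwise the equations are singular.
    """
    # Normal equations: sums of powers of x, and of y times powers of x.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def is_inverted_u(xs, ys):
    """Assumed definition: negative squared term and a peak inside the data range."""
    a, b, c = quadratic_fit(xs, ys)
    if c >= 0:
        return False
    peak = -b / (2 * c)
    return min(xs) < peak < max(xs)


def bootstrap_confidence(xs, ys, n_resamples=1000, seed=0):
    """Proportion of bootstrap resamples in which the inverted U appears."""
    rng = random.Random(seed)
    n = len(xs)
    hits = 0
    for _ in range(n_resamples):
        # Resample whole cases (branches) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        rx = [xs[i] for i in idx]
        ry = [ys[i] for i in idx]
        # Degenerate resamples (fewer than 3 distinct x) count as misses.
        if len(set(rx)) >= 3 and is_inverted_u(rx, ry):
            hits += 1
    return hits / n_resamples


if __name__ == "__main__":
    # Synthetic stand-in for 110 branches: turnover (%) and a noisy performance score.
    rng = random.Random(1)
    turnover = [rng.uniform(5, 40) for _ in range(110)]
    performance = [50 + 2.0 * t - 0.05 * t * t + rng.gauss(0, 15) for t in turnover]
    level = bootstrap_confidence(turnover, performance)
    print(f"Confidence level for an inverted U: {level:.0%}")
```

The output is a single percentage which can be read directly as a rough level of confidence in the hypothesis, which is the point of the approach. The number it prints for the synthetic data has nothing to do with the 65% quoted above.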

The standard approach to analysis adopted by the authors uses jargon and methods which are difficult to decipher, and fails to provide the answers you want. Why don’t statisticians use simpler concepts which provide answers to the questions we need answers to? I wish I knew!

There are also difficulties with the idea of hypothesis testing here. Let’s take another example for illustration: academic statisticians in a university. The hypothesis says that a very high turnover of statisticians is likely to be harmful to the university’s business. Research contracts won’t be secured, and students will be unhappy if the academics leave after only a few months. Let’s now imagine the opposite scenario where statisticians never leave. The students get the same lecture that’s been served up for the last 30 years, and the lack of fresh input means that the same tired ideas get used in research projects – which again is likely to lead to poor performance. Almost inevitably, there is likely to be an optimum level of turnover which allows new ideas to filter in but gives sufficient continuity to keep the system working well. In short, if we plotted a graph of performance against statistician turnover it would be low for very low and very high turnover; in other words it would be an inverted U-shape. But this is almost inevitably true. Research is not needed any more than research is needed to demonstrate there is an optimum level of food intake: starvation and obesity are both harmful.

The hypothesis is too obviously true to be worth researching.

Despite this it might be worth looking at, say, the optimum level of staff turnover in different industries. This might be 50% (per year) for academic statisticians to keep new ideas and contacts flowing in, but perhaps 10% with a coffee bar to help establish rapport with the clientele. Who knows? Research is needed. Quantitative research in management and similar disciplines is often strangely non-numerical in that the focus is on hypotheses – is this true or false? – instead of potentially more interesting questions like how much difference does this make?
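To make the “how much difference” question concrete, here is a small worked sketch. It assumes, purely for illustration, that performance P depends on turnover t through a simple quadratic curve. The symbols a, b, c and t* are my own labels, not taken from the article.

```latex
% Assumed model: an inverted U requires c < 0.
P(t) = a + b\,t + c\,t^{2}, \qquad c < 0

% Setting the slope to zero gives the optimum level of turnover.
\frac{dP}{dt} = b + 2c\,t = 0
\quad\Longrightarrow\quad
t^{*} = -\frac{b}{2c}

% Performance lost by operating at turnover t instead of the optimum.
P(t^{*}) - P(t) = -c\,\bigl(t - t^{*}\bigr)^{2} \;\ge\; 0
```

On this reading the interesting numbers are t*, the optimum itself, and the size of c, which says how much performance is lost for each step away from the optimum. A true-or-false test of the hypothesis reports neither.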

But … in this instance would this really be worthwhile? There are many other variables impacting on performance (as the data in the original article shows), and it is difficult to imagine a sensible manager saying “we need to get rid of so-and-so because our turnover rates are a bit down”. The detail of the individual cases would be a far more sensible guide than any slight tendency for performance and turnover to be related by an inverted U-shaped pattern, so is any version of this research really of any use? Except to statisticians at a loose end?

I am an academic statistician in a university in the sense that a lot of the work that comes my way concerns statistics (although I cannot claim any expertise in the hard core of the discipline), and I’ve been doing it for more than 20 years. This is definitely too long. Glebbeek and Bax’s argument might be rather silly, but their implicit conclusion that every occupation has an optimum timespan, and that I am past my best before date, is right. After Christmas, I am moving on to pastures new for everyone’s sake.

To return to the comparison with the qualitative research paper, my conclusions are similar in many ways. The statistical jargon is unnecessarily convoluted, the hypothesis is obviously true and so not worth testing, the sample is not vague but it is very limited, and it is difficult to discern any useful conclusions. On the positive side it is very obvious how the analysis was conducted; it’s just that I don’t think it was worth conducting.

This, of course, is a sample of just one quantitative research article. Obviously I can’t confidently generalize to all other quantitative research articles. But I am fairly sure that very similar points apply to a substantial number of them. There is a more detailed version of this argument in my article Making statistical methods in management more useful ....

Thursday, December 13, 2012

Open access, peer review and a Journal of Everything

Open access journals are obviously a good thing for lots of very obvious reasons, and I’m sure they will take over the academic publishing scene at some stage, if only because it’s now so easy to post freely available copies of papers on the web. But there are many other things wrong with the peer reviewed journal system, and the shake up that the spread of open access is likely to prompt may provide opportunities to deal with these too. Those that occur to me at the moment include, in no particular order, the following:

The plethora of different journals makes life difficult for both readers and authors, and must discourage interdisciplinary research which may not conform to the prejudices of any particular specialist area. What is needed is a Journal of Everything. There are various public repositories of papers (e.g. SSRN, arXiv.org) which are not peer reviewed, and Sage Open is a peer reviewed journal which spans “the full spectrum of the social and behavioral sciences and the humanities”, but there is as yet no truly general journal. Journals on things like Football Studies made a lot of sense in the days when the devotees of each area received their monthly issue through the post, and could rely on the journal to keep them up to date with everything happening in their area. The web has now changed this: we don’t need research bundling into journals, but instead we simply search for the relevant papers wherever they happen to be. Specialist journals, each with its own little idiosyncrasies, are no longer necessary.

Peer review is just what it says it is: review by researchers in the same area, often readers of the same journal. What academic journals don’t get is end user or customer reviews. The predictable result is that published research gets more and more inward looking as authors are forced to conform to the, often arbitrary and unreasonable, prejudices of the particular discipline. This is how Kuhn’s normal science works and it does make sense in established disciplines where the paradigm is a productive one worth pursuing. In areas where there is no such paradigm (management, education?), the conformity enforced by peer reviewers must be bad.

Academic journals usually use two or three peer reviewers for each article, but readers never get to know what the reviewers check for. Are they checking the statistics? The grammar and style? Acknowledgement of previous results? The research design? The usefulness of the research? The originality? Whether the results can be trusted? If an article is on the safety and effectiveness of a new medical treatment, we would probably want to be reassured that the statistics are right, and the conclusions are as trustworthy as possible. If, on the other hand, the idea is to encourage new ideas, not necessarily fully tested, then much looser evaluation criteria would be relevant. But we do need to know whether something is in a journal because the editor and reviewers think it’s right and useful, or because they think it’s an interesting idea which may or may not be worth pursuing. The journal Medical Hypotheses, for example, “will publish interesting and important theoretical papers that foster the diversity and debate upon which the scientific process thrives”. Problems only arise (see Wikipedia article) when this sort of review is confused with a review which says an article should be believed.

I think the answer to these problems, and many others, is to establish a Journal of Everything, and then encourage a market in reviewing systems. The Society for Medical Trials might establish a reviewing system and give their stamp of approval to papers which meet their criteria for valid trials of medical procedures. Another body might provide a certificate that a paper is interesting and well written, and another body might check the statistics. In each case their reputation will depend on readers’ accepting their verdicts, and as new requirements arise new reviewing systems would be created to fill the need. I’ve fleshed this out in slightly more detail in an article about the Journal of everything.