The Sceptical Academic: 2012

Monday, December 24, 2012

Marking projects

Just marking yet another project. As every marker should be aware, it is, of course, not a criterion for a good project that it should make sense. That really would be expecting a bit much! All I check for is reasonable compliance with the conventions of projects – complete and properly formatted references, some discussion of methodology, some aims and conclusions, and something that could count as analysis, whatever that means.

What have we got today? The one I’ve just read is on leadership and the third age. Usual sort of rubbish. Skims through the platitudes from the gurus: leadership is obviously a good thing, best if it’s authentic, etc, etc. I’m really not sure what this has to do with the third age, but let’s read on and see where we get to. Next we get the usual platitudes about using a qualitative approach, before presenting a purely quantitative list of average responses to questions asking for opinions of the state of play on a 1 to 5 scale. “How important do you think leadership is?” And lots of similar questions. These averages seem to be the answer. This is obviously a good student because she’s thought of working out averages. Usually you just get bar charts with bars representing strongly agree, agree etc – completely impossible to take in.

But what’s this got to do with the third age? I’m really not sure, except that the respondents to her questionnaire were third agers. But what leaders are they talking about? Who is being led? Who cares?

This project is very typical. The approach seems to be to start with a topic – in this case leadership and the third age. Then you ask a haphazard assortment of people their opinions about the topic, usually on a 1-5 scale, after a little essay on qualitative research and not reducing people to numbers. Obviously the haphazard assortment of people will know little about the topic, and certainly won’t put any thought into answering the questions. Then you take the average of these opinions, and present this as the answer. Why does anybody think this makes any sense?

Or, at least, I think that’s the way it works. I couldn’t face reading most of it. But there were a couple of aims, and a couple of conclusions which sort of linked up with the aims, a questionnaire, a few graphs, and a list of references … so it must be OK. It will get a good pass mark.

Thursday, December 20, 2012

The turnover and performance of statisticians: a critique of some quantitative research

In an earlier post I criticized a qualitative research paper for its unnecessarily convoluted language, the blindingly obvious nature of some of its conclusions, the vagueness of the sample on which the research is based, the lack of any real information about whether the conclusions always apply or sometimes apply (and if so how often), and the absence of any satisfactory audit trail to link the conclusions with the data on which they are supposedly based. And the conclusions are so vague it’s actually difficult to see what they are. This post concerns a statistical research paper and many of my conclusions are very similar, but with a few important differences. I think both papers convey almost nothing of interest to any reasonable person.

I wanted to choose a typical quantitative management research paper, look at it in detail, decide what was good and bad about it, and how it could be improved. The paper I chose was “Is high employee turnover really harmful? An empirical test using company records”, published in the top-ranking Academy of Management Journal. This seemed only moderately complicated in terms of the statistics used, it’s clearly written, and the topic is easy for the uninitiated, like me, to appreciate.

The paper is about the relationship between employee turnover and the financial performance of organizations. The authors “tested the hypothesis that employee turnover and firm performance have an inverted U-shaped relationship: overly high or low turnover is harmful” and concluded that the hypothesis seemed to be true “but the inverted U-shape was not observed with certainty”.

The first thing to note is that they used a single case study. One hundred and ten branches of just one employment agency in the Netherlands. A sample of one! They do admit that this is a problem, but doesn’t it undermine the whole project? The pattern in web businesses in California, or coffee bars in Portsmouth, would doubtless be very different.

The second trivial objection is that the only reason they didn’t observe the inverted U-shape with certainty is that they didn’t look. There is no graph in the article (strange given that the article is about the shape of a graph): the graph they should have shown (on page 3 of this article) does show an inverted U-shape, although not a very convincing one because of the lack of offices with low levels of staff turnover. This is the shape they observed and, unless they are lying about their data, this was observed with certainty. Their admission of uncertainty is unnecessary because it is wrong!

What they meant to say is that extrapolating the conclusion to the rest of the population cannot be certain. But what is the rest of the population? The employment agency only has 110 branches, and using the data to extrapolate conclusions to coffee bars in Portsmouth in 2012 is obviously silly. What they must mean is similar branches in similar organizations – but this is inevitably a little hazy.

This brings us to the language used in the article. No problems with the text which is written in good, clear English (perhaps because the authors aren’t English). The difficulty is the statistical jargon – see Tables 2 and 3 which summarize the analysis of the data. For example, the lack of certainty in the extrapolation of the results is measured by two “p values”. I think it is impossible to explain these in simple terms, and they also don’t provide a clear answer – hence the rather vague “not observed with certainty”. I have reanalyzed their data with another method (Bootstrapping confidence levels ...) which does yield a clear answer – namely that there is a 65% probability that the hypothesis applies to the wider population. This makes it quite clear just how inconclusive the result is.
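For readers who want to see what a bootstrap answer of this kind involves, here is a minimal sketch in Python. It is not the reanalysis referred to above, and it makes several assumptions: the branch data are invented (the function make_synthetic_branches is purely illustrative), the curve fitted is a simple quadratic with no other variables, and “inverted U-shape” is taken to mean nothing more than a negative coefficient on the squared term. The idea is to resample the branches with replacement many times, refit the curve each time, and report the proportion of resamples in which the curve bends the right way.

```python
import random


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Sums of powers of x and of y times powers of x (the normal equations).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gaussian elimination with partial pivoting.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, 3):
            f = m[r][i] / m[i][i]
            for col in range(i, 4):
                m[r][col] -= f * m[i][col]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - rest) / m[i][i]
    return tuple(coef)


def bootstrap_inverted_u(xs, ys, n_resamples=2000, seed=0):
    """Proportion of bootstrap resamples whose fitted curve is an inverted U."""
    rng = random.Random(seed)
    n = len(xs)
    hits = 0
    valid = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 3:
            continue  # a quadratic cannot be fitted to fewer than 3 distinct x values
        valid += 1
        _, _, c = fit_quadratic(bx, by)
        if c < 0:  # assumption: negative squared term counts as an inverted U
            hits += 1
    if valid == 0:
        raise ValueError("no usable resamples")
    return hits / valid


def make_synthetic_branches(n=110, seed=1):
    """Invented data: turnover (% per year) and a noisy performance score."""
    rng = random.Random(seed)
    xs = [rng.uniform(5, 40) for _ in range(n)]
    ys = [100 + 1.2 * x - 0.02 * x * x + rng.gauss(0, 15) for x in xs]
    return xs, ys


if __name__ == "__main__":
    turnover, performance = make_synthetic_branches()
    level = bootstrap_inverted_u(turnover, performance)
    print(f"Proportion of resamples showing an inverted U: {level:.0%}")
```

The number printed is a single proportion that can be read directly as a rough confidence level for the hypothesis, which is the attraction of the approach compared with a pair of p values. Whether a negative squared term is the right test of the hypothesis is a judgment call; a stricter version might also require the peak of the curve to fall within the range of turnover actually observed.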

The standard approach to analysis adopted by the authors uses jargon and methods which are difficult to decipher, and fails to provide the answers you want. Why don’t statisticians use simpler concepts which provide answers to the questions we need answers to? I wish I knew!

There are also difficulties with the idea of hypothesis testing here. Let’s take another example for illustration: academic statisticians in a university. The hypothesis says that a very high turnover of statisticians is likely to be harmful to the university’s business. Research contracts won’t be secured, and students will be unhappy if the academics leave after only a few months. Let’s now imagine the opposite scenario where statisticians never leave. The students get the same lecture that’s been served up for the last 30 years, and the lack of fresh input means that the same tired ideas get used in research projects – which again is likely to lead to poor performance. Almost inevitably, there is likely to be an optimum level of turnover which allows new ideas to filter in but gives sufficient continuity to keep the system working well. In short, if we plotted a graph of performance against statistician turnover it would be low for very low and very high turnover; in other words it would be an inverted U-shape. But this is almost inevitably true. Research is not needed any more than research is needed to demonstrate there is an optimum level of food intake: starvation and obesity are both harmful.

The hypothesis is too obviously true to be worth researching.

Despite this it might be worth looking at, say, the optimum level of staff turnover in different industries. This might be 50% (per year) for academic statisticians to keep new ideas and contacts flowing in, but perhaps 10% with a coffee bar to help establish rapport with the clientele. Who knows? Research is needed. Quantitative research in management and similar disciplines is often strangely non-numerical in that the focus is on hypotheses – is this true or false? – instead of potentially more interesting questions like how much difference does this make?
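To make the “how much difference” question concrete, here is a small sketch in Python. It assumes performance has already been modelled as a quadratic in turnover, performance = a + b × turnover + c × turnover², and the coefficients used in the example are made up for illustration, not taken from any real study. If c is negative the curve peaks at a turnover of −b/(2c), and the shortfall at any other turnover level can be read off directly.

```python
def optimum_turnover(b, c):
    """Turnover that maximises a + b*x + c*x**2, or None if there is no peak."""
    if c >= 0:
        return None  # straight line or U-shape: no interior maximum
    return -b / (2 * c)


def predicted_performance(a, b, c, turnover):
    """Performance predicted by the quadratic model at a given turnover."""
    return a + b * turnover + c * turnover ** 2


def performance_gap(a, b, c, turnover):
    """How far predicted performance at `turnover` falls short of the peak."""
    best = optimum_turnover(b, c)
    if best is None:
        raise ValueError("no inverted U-shape, so no optimum to compare with")
    peak = predicted_performance(a, b, c, best)
    return peak - predicted_performance(a, b, c, turnover)


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to illustrate the arithmetic.
    a, b, c = 50.0, 2.5, -0.125
    print("Optimum turnover (% per year):", optimum_turnover(b, c))
    print("Shortfall at 20% turnover:", performance_gap(a, b, c, 20.0))
```

An answer in this form (an optimum of so many percent, and a loss of so many units of performance for being a given distance from it) is something a manager could at least argue with, which is more than can be said for “the hypothesis was supported”.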

But … in this instance would this really be worthwhile? There are many other variables impacting on performance (as the data in the original article shows), and it is difficult to imagine a sensible manager saying “we need to get rid of so-and-so because our turnover rates are a bit down”. The detail of the individual cases would be a far more sensible guide than any slight tendency for performance and turnover to be related by an inverted U-shaped pattern, so is any version of this research really of any use? Except to statisticians at a loose end?

I am an academic statistician in a university in the sense that a lot of the work that comes my way concerns statistics (although I cannot claim any expertise in the hard core of the discipline), and I’ve been doing it for more than 20 years. This is definitely too long. Glebbeek and Bax’s argument might be rather silly, but their implicit conclusion that every occupation has an optimum timespan, and that I am past my best before date, is right. After Christmas, I am moving on to pastures new for everyone’s sake.

To return to the comparison with the qualitative research paper, my conclusions are similar in many ways. The statistical jargon is unnecessarily convoluted, the hypothesis is obviously true and so not worth testing, the sample is not vague but it is very limited, and it is difficult to discern any useful conclusions. On the positive side it is very obvious how the analysis was conducted; it’s just that I don’t think it was worth conducting.

This, of course, is a sample of just one quantitative research article. Obviously I can’t confidently generalize to all other quantitative research articles. But I am fairly sure that very similar points apply to a substantial number of them. There is a more detailed version of this argument in my article Making statistical methods in management more useful ....

Thursday, December 13, 2012

Open access, peer review and a Journal of Everything

Open access journals are obviously a good thing for lots of very obvious reasons, and I’m sure they will take over the academic publishing scene at some stage, if only because it’s now so easy to post freely available copies of papers on the web. But there are many other things wrong with the peer reviewed journal system, and the shake up that the spread of open access is likely to prompt may provide opportunities to deal with these too. Those that occur to me at the moment include, in no particular order, the following:

The plethora of different journals makes life difficult for both readers and authors, and must discourage interdisciplinary research which may not conform to the prejudices of any particular specialist area. What is needed is a Journal of Everything. There are various public repositories of papers (e.g. SSRN, arXiv.org) which are not peer reviewed, and Sage Open is a peer reviewed journal which spans “the full spectrum of the social and behavioral sciences and the humanities”, but there is as yet no truly general journal. Journals on things like Football Studies made a lot of sense in the days when the devotees of each area received their monthly issue through the post, and could rely on the journal to keep them up to date with everything happening in their area. The web has now changed this: we don’t need research bundling into journals, but instead we simply search for the relevant papers wherever they happen to be. Specialist journals, each with its own little idiosyncrasies, are no longer necessary.

Peer review is just what it says it is: review by researchers in the same area, often readers of the same journal. What academic journals don’t get is end user or customer reviews. The predictable result is that published research gets more and more inward looking as authors are forced to conform to the, often arbitrary and unreasonable, prejudices of the particular discipline. This is how Kuhn’s normal science works and it does make sense in established disciplines where the paradigm is a productive one worth pursuing. In areas where there is no such paradigm (management, education?), the conformity enforced by peer reviewers must be bad.

Academic journals usually use two or three peer reviewers for each article, but readers never get to know what the reviewers check for. Are they checking the statistics? The grammar and style? Acknowledgement of previous results? The research design? The usefulness of the research? The originality? Whether the results can be trusted? If an article is on the safety and effectiveness of a new medical treatment, we would probably want to be reassured that the statistics are right, and the conclusions are as trustworthy as possible. If, on the other hand, the idea is to encourage new ideas, not necessarily fully tested, then much looser evaluation criteria would be relevant. But we do need to know whether something is in a journal because the editor and reviewers think it’s right and useful, or because they think it’s an interesting idea which may or may not be worth pursuing. The journal Medical Hypotheses, for example, “will publish interesting and important theoretical papers that foster the diversity and debate upon which the scientific process thrives”. Problems only arise (see Wikipedia article) when this sort of review is confused with a review which says an article should be believed.

I think the answer to these problems, and many others, is to establish a Journal of Everything, and then encourage a market in reviewing systems. The Society for Medical Trials might establish a reviewing system and give their stamp of approval to papers which meet their criteria for valid trials of medical procedures. Another body might provide a certificate that a paper is interesting and well written, and another body might check the statistics. In each case their reputation will depend on readers’ accepting their verdicts, and as new requirements arise new reviewing systems would be created to fill the need. I’ve fleshed this out in slightly more detail in an article about the Journal of everything.

Wednesday, November 28, 2012

Peer regard for pot noodles: a critique of some qualitative research

Pot Noodles, Placements and Peer Regard: Creative Career Trajectories and Communities of Practice in the British Advertising Industry was published in the prestigious British Journal of Management in 2010. I chose it for the title, which looked fun, so from the point of view of the quality of the research, which is what I am interested in here, it is a more or less random choice. I think it is typical of many “qualitative” empirical papers in the literature.

“They can’t be serious,” was the first reaction of one of the students I asked to look critically at this paper. But of course they are, unfortunately. This paper does read like a parody of itself. The main difficulties are the unnecessarily convoluted language, the fact that some of the conclusions are blindingly obvious, the vagueness of the sample on which the research is based, the lack of any real information about whether the conclusions always apply or sometimes apply (and if so how often), or indeed any satisfactory audit trail to link the conclusions with the data on which they are supposedly based. And the conclusions are so vague it’s actually difficult to see what they are.

In short, I think this paper is a complete waste of time. Which is a pity, because a study of careers in the advertising industry could be very interesting and useful from the perspective of various stakeholders – the advertising creatives themselves, those managing the agencies, and society in general. Furthermore very similar issues are likely to apply to other business areas. The topic is a good one, worth researching.

The initial problem from the students’ point of view is the language used. For example: “People learn by participating in the shared practices of a community or ‘lived in world’ (Fuller and Unwin, 1998). A community of practice is a collectively developed understanding of the nature and identity of the community to which its members are accountable. It is sustained through norms and relationships of mutuality and a shared repertoire of communal resources, language routines, artefacts, tools and stories. Those who are involved in a particular ‘community’ understand its limits or boundaries. What is learned by participants within a community is identity formation, rather than knowledge per se: according to Lave and Wenger (1991, p. 53), ‘[l]earning implies becoming a different person with respect to the possibilities enabled by these systems of relations’. It is the gradual construction of an identity and learning to talk within a practice (rather than about it) that allows the novice to become part of a community.” (p. 115)

Does this mean “a community has its own know-how, jargon, assumptions and habits which people pick up as they participate”? If so why not use simpler language like this? And isn’t it actually rather obvious? Any group has its own jargon and idiosyncrasies which beginners need to learn. We all know this. We don’t need research to tell us. Isn’t this just translating the obvious into long words so we don’t realize it’s obvious? Or have I misunderstood because the words are too long for me?

What conclusions does the article come to? The first paragraph of the concluding section is

“The focus of the paper has been on how career trajectories unfold within the advertising industry as a community of practice. It is within this context that anticipatory socialization, situated learning, and the move from periphery to centre takes place. Advertising's creative community is not simply a backdrop for learning: individuals created and developed the community through social action, and this inter-relationship was crucial in shaping career trajectories.”

And the last sentence of the article is:

“Nonetheless, this study of advertising creatives suggests that, although modern careers may be individualized undertakings, they increasingly unfold within norms and practices of multiple, inter-related occupational communities.”

These extracts give a good flavour of the style of conclusions derived from the research. But … surely all careers “unfold” within “multiple, inter-related occupational communities”, all communities are created by individuals “through social action”, and people with successful careers usually have little choice but to start on the periphery of their chosen field and then try to move to the centre. And so on. In short, how could things be otherwise? We really do not need research to tell us this!

Yes, but, you might say, perhaps advertising creatives’ careers involve more interaction with multiple communities, and “anticipatory socialization” and “situated learning” are more important than they are in other careers. Perhaps. We are not told. These things are not measured so there is no possibility of comparison. Besides their obviousness, the lack of any indication of the magnitude of any of the features described, or their importance compared with other contexts, makes the conclusions too vague to be interesting.

We also really need more detail of how the research was done. A sample of 34 creatives was interviewed, which is fair enough. But who were they? We are told the sampling was “informed” by industry knowledge, and various selection criteria to get a sample of creatives in “different situations”. But we are not told how this was done, or any of the details that are necessary to understand the composition of the sample. A different sample would lead to different conclusions: the reader needs to be convinced that the sample is in some sense typical. This reader is not remotely close to being convinced.

Similarly, we are told about open coding, being “mindful of our theory”, a small set of key categories, theoretical codes, etc, etc. All the usual grounded theory stuff, but without any more detail to tell the reader what it actually means. And then we go straight into a narrative with statements like “creatives who had come into advertising through such specialist courses talked about the value of what they had learned in terms of craft skills, immersion in the community and insights into its employment patterns and work practices.” Does this mean all of them? Were they right? And where is the evidence, the codes, etc, etc? I find it hard to believe the grounded theory stuff was actually carried out, and harder still to believe it had any value. There’s certainly no trace of it in the paper.

Peer reviewed journal papers: are there any good ones?

Had a class yesterday when I got the MBA students to do critiques of papers in academic management journals with a view to making them more critical of what they read, and appreciating some of the problems they are likely to encounter in their own research.

First problem was to choose some suitable papers: hopefully some good, some bad and some OK but with obvious possibilities for improvement. Unfortunately I really couldn’t find any good ones. It’s easy enough to find examples of bad articles (one of my favourites is Can total quality management make small firms competitive?), but I haven’t managed to find any good articles on ordinary empirical research in management. There doubtless are articles putting forward interesting models and theories, and there are good articles in other fields like medicine, but I have yet to find what I consider a good published example of the kind of empirical research we expect our MBA students to do.

Last time I ran this session one of the students complained, very reasonably, that I had not provided an example of good practice to follow, so I sent round a circular to my colleagues asking for suggestions, but none of their suggestions really fitted the bill. So I’m still trying to teach the students how to do good research, but without any models of good practice. My ideas about how research in this area should be done are so different from what is actually done that I can’t mention what I really think for fear of undermining the literature that I am supposed to be encouraging the students to study. Help!

Empirical papers in the management research literature usually come in one of two styles – often called qualitative and quantitative. I don’t think this is a sensible or useful distinction (see Are ‘Qualitative’ and ‘Quantitative’ Useful Terms for Describing Research?) but it is the way the literature divides up, so I’ll deal with the two types separately in two future posts. And in a further post I’ll try and say what I think management research should look like.