Wednesday, April 20, 2016

Why is the statistical package SPSS so unhelpful?

I've just run a statistical test in SPSS to see if there is a difference between articles in the Guardian and the Telegraph in terms of Characteristic X (it doesn't matter what X is for my purposes here). The results are pasted below. The presence of X is coded as 1, and its absence as 0.

The first table shows that a higher proportion of Guardian articles (33.5%) than Telegraph articles (24.1%) had X. The second table addresses the issue of statistical significance: can we be sure that this is not a chance effect that would be unlikely to recur in another sample of articles?

Paper * Code Crosstabulation

                                 Code
                                 .00       1.00      Total
Guardian    Count                121       61        182
            % within Paper       66.5%     33.5%     100.0%
Telegraph   Count                60        19        79
            % within Paper       75.9%     24.1%     100.0%
Total       Count                181       80        261
            % within Paper       69.3%     30.7%     100.0%


Chi-Square Tests

                             Value       df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                              (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square           2.322 (a)   1    .128          .145         .083
Continuity Correction (b)    1.898       1    .168
Likelihood Ratio             2.386       1    .122          .145         .083
Fisher's Exact Test                                         .145         .083
N of Valid Cases             261

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 24.21.


I decided I would like a two-sided significance level, and looked at the second table to find it. Unfortunately there are no fewer than four different answers (0.128, 0.168, 0.122 and 0.145)! Which to choose?
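As far as I can work out, three of the four figures come from looking up three slightly different statistics in the chi-square distribution with 1 degree of freedom. The sketch below is my own attempt to reproduce them in plain Python from the counts in the first table. It is not SPSS's code, and the function names (expected_counts, chi2_upper_tail_df1, chi_square_rows) are made up for the illustration. I have assumed the usual textbook formulas: the Pearson statistic, the same statistic with Yates's continuity correction, and the likelihood ratio statistic.

```python
import math

# Observed counts from the crosstabulation above.
# Rows: Guardian, Telegraph. Columns: Code .00 (X absent), Code 1.00 (X present).
observed = [[121, 61],
            [60, 19]]


def expected_counts(table):
    """Counts expected in each cell if Paper and Code were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_upper_tail_df1(stat):
    """P(chi-square with 1 degree of freedom > stat).

    With 1 degree of freedom this is the same as a two-sided normal tail.
    """
    return math.erfc(math.sqrt(stat / 2))


def chi_square_rows(table):
    """Return {row name: (statistic, approximate two-sided p)} for a 2x2 table."""
    expected = expected_counts(table)
    cells = [(o, e) for o_row, e_row in zip(table, expected)
             for o, e in zip(o_row, e_row)]
    stats = {
        # Sum of (observed - expected)^2 / expected
        "Pearson Chi-Square": sum((o - e) ** 2 / e for o, e in cells),
        # Same, but each |observed - expected| is shrunk by 0.5 first (Yates)
        "Continuity Correction": sum(
            max(abs(o - e) - 0.5, 0.0) ** 2 / e for o, e in cells),
        # 2 * sum of observed * ln(observed / expected)
        "Likelihood Ratio": 2 * sum(
            o * math.log(o / e) for o, e in cells if o > 0),
    }
    return {name: (stat, chi2_upper_tail_df1(stat))
            for name, stat in stats.items()}


results = chi_square_rows(observed)
for name, (stat, p) in results.items():
    print(f"{name:<22} value = {stat:.3f}   p = {p:.3f}")
```

The values and probabilities printed can be compared with the Value and Asymp. Sig. columns of the second table. All three are approximations to the same question, which is presumably why they differ only slightly.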

Further study of the table only deepened my confusion. The heading is "Chi-Square Tests", but two of the columns are headed "Exact Sig." My understanding is that the chi-square test uses the chi-square distribution, which is a well-known way of working out approximate probabilities. The exact test works out the equivalent probabilities directly, without using the chi-square distribution, so the entries in the Exact Sig. columns are not chi-square results despite the table heading. One of the rows is headed "Fisher's Exact Test" and another "Pearson Chi-Square", which seems to confirm this. But what can we make of the top right figure (0.083), which is chi-square according to the table heading, Pearson Chi-Square according to the row heading, and Exact Sig. according to the column heading? Help!
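To check my understanding of the exact test, here is a rough Python sketch of how I believe Fisher's exact probabilities are worked out for a 2x2 table. It keeps the row and column totals fixed and adds up the probabilities of the possible tables directly, with no chi-square distribution involved. The function name fisher_exact_2x2 is my own, and I am assuming the common conventions: the two-sided figure adds up every table that is no more probable than the one observed, and the one-sided figure is the smaller of the two tails.

```python
from math import comb

# Observed counts from the first table.
# Guardian: 121 without X, 61 with X. Telegraph: 60 without X, 19 with X.
a, b = 121, 61
c, d = 60, 19


def fisher_exact_2x2(a, b, c, d):
    """Return (two-sided p, one-sided p) for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b          # number of Guardian articles
    col2 = b + d          # number of articles with X
    n = a + b + c + d     # all articles
    ways = comb(n, row1)  # ways of choosing which articles are Guardian ones

    def prob(k):
        # Probability that exactly k of the row1 articles have X,
        # given the fixed totals (a hypergeometric probability).
        return comb(col2, k) * comb(n - col2, row1 - k) / ways

    lowest = max(0, row1 - (n - col2))
    highest = min(row1, col2)
    probs = {k: prob(k) for k in range(lowest, highest + 1)}

    p_observed = probs[b]
    # Two-sided: every table no more probable than the observed one.
    # The small factor guards against floating-point ties.
    two_sided = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-9))
    # One-sided: the smaller of the two tails that include the observed table.
    upper = sum(p for k, p in probs.items() if k >= b)
    lower = sum(p for k, p in probs.items() if k <= b)
    return two_sided, min(upper, lower)


two_sided, one_sided = fisher_exact_2x2(a, b, c, d)
print(f"Exact Sig. (2-sided) = {two_sided:.3f}")
print(f"Exact Sig. (1-sided) = {one_sided:.3f}")
```

The two printed figures can be compared with the Exact Sig. columns of the second table. If the sketch is right, those columns depend only on the four counts, which would explain why the same pair of numbers is repeated in several rows.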

OK, I know I should have consulted the Help (it doesn't work on my computer, so I can't), or a book on using SPSS, or gone on a course and provided employment for an expert. But I don't think this should be necessary. SPSS should produce clear tables with a little explanation of what the numbers mean. In the present case, as exact probabilities can be computed, surely these are all that's needed, with a sensible heading for the table and a little note on what the probabilities represent.

SPSS should produce clear, consistent tables which present only the relevant information with an explanation in, as far as possible, non-technical language.

But then people might understand the output and the market for courses and experts would be much diminished.