A news story currently being covered here in the UK concerns serious serial abuses by a healthcare worker. One of the questions being addressed in ongoing investigations is why managers of the healthcare facility where the tragic events took place didn’t intervene sooner to remove the now-convicted murderer. I have no authoritative insights into the case itself, but one familiar phrase stood out in testimony I read from one of the managers involved: “There was no evidence…”[1]. This phrase is routinely used in questions of justice and responsibility[2], and I think it’s quite problematic.

The issue is simply that “absence of evidence is not evidence of absence”, as the quip goes. It’s obvious enough that the mere lack of evidence about some allegation isn’t in itself conclusive, yet the trope “there’s no evidence” can still be rhetorically effective in dismissing an allegation. Where does its force come from, and why is it so widely used? I believe it owes much to the dominance of the 20th-century school of statistics known as Frequentism and its easy null-hypothesis tests. Let me explain.

Frequentism is an approach to probability that says the way to find the probability of any kind of event is to observe how many times it occurs out of a given number of similar chances (trials). It’s a useful and powerful approach, so long as we can say what counts as a ‘similar’ trial (tossing a coin is the textbook example). Now, it’s all very well to have an easy way of assessing probabilities just by counting, but a more realistic problem is typically: given some numerical measurements (data), what should we conclude about the underlying average value or rate, and how probable is our conclusion? At this point we can’t just count; we need to process the data to infer what kind of process they came from, and how confident we should be about our inference. Here, the basic frequentist approach is the null-hypothesis test (NHT). I’ve written about this before: in brief, you define a default ‘null’ hypothesis which, if true, would be of no interest; then you feed your data into a piece of software that tells you whether to reject that null hypothesis – on the grounds that it makes the probability of your data too small – or not.
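To make this concrete, here is a minimal sketch of an NHT in Python using scipy (the measurements are invented purely for illustration):

```python
# A one-sample t-test: could these measurements plausibly come from
# a process whose true mean is 0? (Data invented for illustration.)
from scipy import stats

data = [0.3, -0.1, 0.8, 0.4, 0.2, 0.6, -0.2, 0.5]

# Null hypothesis: the underlying mean is 0 (i.e. "nothing going on").
t_stat, p_value = stats.ttest_1samp(data, popmean=0.0)
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")
```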

Every invention has its uses. The beauty of the NHT is that it provides an easy, quasi-objective way to make a decision based on data. There’s even a conventional definition of “too small” for the probability: a so-called P-value below 0.05 is the standard threshold for rejecting any null hypothesis, drawing a conclusion and taking the corresponding action. If P > 0.05, you don’t have a conclusion, and you shouldn’t take any action.
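In code, the conventional rule amounts to nothing more than this (continuing the sketch above):

```python
alpha = 0.05  # the conventional significance threshold
if p_value < alpha:
    print("Reject the null hypothesis and act on the conclusion.")
else:
    print("Fail to reject: officially, no conclusion at all.")
```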

After using more NHTs than I care to count in my scientific work to date, I can personally vouch for the sense of anticlimax that accompanies a P-value above 0.05. Especially if one has put a lot of effort into collecting a set of data, it’s galling to find that one simply has no conclusion.[3] However, there are also cases where an investigator doesn’t want to reject a null hypothesis. Does your product cause health problems? Do your executives waste money? Is your employee acting dangerously? Rather than doing the proper thing and reversing the null hypothesis (e.g. to “there are health problems in at least 1% of patients”; “our executives waste 5% or more…”), users may be tempted by a lazy conclusion: “I did a test, and there was no evidence of X.”[4]
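To illustrate what reversing the null hypothesis might look like in practice, here is a sketch of the health-problems example as a one-sided binomial test (the patient numbers are invented):

```python
from scipy.stats import binomtest

# Null hypothesis: health problems occur in AT LEAST 1% of patients.
# We may only claim safety if the data let us reject THIS null.
# Suppose 2 adverse events are seen among 500 patients (invented numbers).
result = binomtest(k=2, n=500, p=0.01, alternative="less")
print(f"P = {result.pvalue:.3f}")
# With these numbers P is roughly 0.12, so even this low event count
# does not justify concluding that the true rate is below 1%.
```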

There are multiple problems with this “no evidence” claim. First, the impersonal phrasing suggests absolute non-existence (rather than “I could find no evidence”). Second, while “no evidence” suggests a big round zero, the procedure implies no such value: presumably a threshold of P = 5% was used (1% is also common; a threshold of 0% would make the test meaningless). Third, the claim gives no indication of how much effort was put into seeking evidence – and naturally, the fewer data you have, the less likely you are to reject a null hypothesis. Opportunities for gaming the NHT approach are only too clear!
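The gaming opportunity is easy to demonstrate by simulation. The sketch below (my own illustration, assuming an arbitrary true effect of half a standard deviation) shows how often a false null hypothesis ‘survives’ at different sample sizes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mean = 0.5  # the null hypothesis (mean = 0) is actually false here

for n in (5, 20, 80):
    survived = sum(
        stats.ttest_1samp(rng.normal(true_mean, 1.0, size=n), 0.0).pvalue >= 0.05
        for _ in range(10_000)
    )
    print(f"n = {n:3d}: the false null survives {survived / 10_000:.0%} of the time")
```

With only five observations the false null survives the vast majority of the time; collecting less data makes “no evidence” ever easier to report.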

The underlying problem, it seems to me, is a one-sided reduction of our belief-forming behaviour. Knowledge – like sense perception – inherently has a subject side and an object side. We come to know things as our minds (as subject) are shaped in correspondence to features of the rest of the world (as object). This means that the beliefs we bring to any learning experience inevitably shape what we learn – and how we then act. By the same token, our (assumed) knowledge is always subject to correction. Andrew Hartley has argued that taking Frequentism as a paradigm for knowledge shows the hallmarks of a secular humanism trying to reason its way to impersonal certainty.

As I’ve discussed before on this blog and elsewhere, there are better, more human ways to do statistical inference concerning controversial questions. And there are certainly more fruitful ways to engage with fraught and controversial issues than the simplistic question, “Is there any evidence – or not?”

_______________________________


[1] In this case, the phrase was only alleged to have been used, and the person seems to have denied using it.

[2] When Emmanuel Macron referred to a certain COVID-19 vaccine as “quasi-ineffective” (https://www.bbc.co.uk/news/55919245), I think a similar kind of fallacy may have been in play.

[3] There are further steps one can take, of course. One can do a power analysis, asking: “If the actual effect I’m looking for is of a certain size, what would be the probability of rejecting my null hypothesis?” If that probability is high enough (say, above 0.8), one can tentatively conclude that an effect of the postulated size probably doesn’t exist. But power analyses are rarely performed outside academic research.
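As a sketch of what such a power analysis looks like in code (my own illustration using statsmodels, with invented numbers):

```python
from statsmodels.stats.power import TTestPower

# Power of a one-sample t-test to detect an effect of 0.5 standard
# deviations with 30 observations at the 5% significance level.
power = TTestPower().power(effect_size=0.5, nobs=30, alpha=0.05)
print(f"Power = {power:.2f}")
# If the power is high (say above 0.8) and the test still failed to
# reject, an effect of that size probably isn't there.
```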

[4] If this account of the “no evidence” fallacy seems too technical to explain the widespread (mis)use of this rhetorical device, consider that around 50% of UK undergraduate degrees, by my estimate, are in subjects where NHTs are usually taught (biological and health sciences, business studies, psychology, economics) – based on data from https://www.hesa.ac.uk/data-and-analysis/students/table-46. NHTs have been used ever more widely even as their limitations are increasingly discussed.

Richard Gunton

Richard is the Director of Faith-in-Scholarship at Thinking Faith Network. He also teaches statistics at the University of Winchester. His current passions include Reformational philosophy, history of sciences, ordination (the statistical sort), and wildlife gardening. He worships, and occasionally preaches, at St Mary's Church in Portchester. [Views expressed here are his own.]