The last six months have been a crash course in statistical analysis for many of us. We’ve been bombarded daily with numbers about Covid-19 cases and deaths, graphs and charts showing trends and comparisons, and sometimes even different interpretations of the same data being used to argue for conflicting courses of action. When so much rides on governments and individuals ‘doing the right thing’ to bring us through this global catastrophe, it’s no wonder that we find it stressful to navigate our way through the sea of information.
In some cases, data has been presented misleadingly by authorities who could have done better. In others, it’s simply not possible to find out what we really need to know, so instead we rely on whatever data does exist when making decisions, with problematic consequences. Here are five of the ways in which widely cited data has failed to tell the whole truth about Coronavirus.
1. Data that didn’t care about care homes
After the first month of lockdown, the UK had recorded around 22,000 deaths from Covid-19. On 29 April this figure suddenly jumped by almost 4,000. The reason? Deaths outside of hospital settings had been included for the first time. In other words, around 15% of confirmed deaths up to that point hadn’t been counted, because the people didn’t die in intensive care after a medically assisted ‘fight’, but in their own homes or in other institutional settings.
Of course, the data on hospital deaths gave an important insight into how the disease was progressing for a large segment of the population. But it entirely omitted another segment — care homes — whose particular characteristics might have warranted a targeted set of actions. The virus was likely to act differently in settings where many people have underlying conditions, PPE is in short supply and agency staff routinely travel from one home to another. Any assumption that the general trend of infections and deaths would simply ‘mirror’ the trend seen in the hospital data was surely a dangerous one. We lived with this danger for over three months and through the peak of the first wave, only discovering the care home data after deaths had started to fall. Significant numbers of care home deaths may never make it into the official records because people died without ever being tested for the virus.
2. The Extrapolation Conundrum
We’re not in a position to test the entire population for the virus regularly, so the only way to estimate case numbers is to test representative samples of the population and extrapolate from these numbers. The ONS have been doing this for some time, giving us an idea of what percentage of people might be infected in the country as a whole. When case numbers were very high, the regular survey gave a useful indication of the prevailing trends, even if the numbers could not be verified in any absolute sense.
The problem with this method is that during the summer, with the virus circulating at lower levels, each individual positive survey test had the potential to skew the results of the extrapolation. For example, between 22 June and 5 July the ONS carried out 26,418 tests, of which just 12 came back positive. For extrapolation purposes, each individual tested ‘represents’ about 2500 people in the wider population. So if just one of those positive results is an error, this would change the estimate for total cases from ‘about 30,000’ to ‘about 27,500’. While these numbers still indicate that the rate of infections was low at the time, the confidence interval is simply too wide for the data to be used to chart small fluctuations in case numbers. Of course, the ONS survey remains one of the best data sources we have when it comes to identifying potential localised spikes, which can then be investigated further. And as we hurtle towards a second wave, this will be crucial.
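The extrapolation arithmetic described above can be sketched in a few lines. This is a simplified illustration using the survey figures quoted in the text; the population figure is an assumption for the purposes of the example, not the exact denominator the ONS uses.

```python
# Sketch of the extrapolation arithmetic from the ONS survey round of
# 22 June - 5 July. The UK population figure is a rough assumption.

uk_population = 66_000_000   # assumed, for illustration only
tests = 26_418
positives = 12

# Each person tested 'represents' roughly this many people nationally.
people_per_test = uk_population / tests          # ~2,500

# Extrapolated estimate of total infections.
estimate = positives * people_per_test           # ~30,000

# Sensitivity: if just one positive result were an error...
estimate_minus_one = (positives - 1) * people_per_test  # ~27,500

print(round(people_per_test), round(estimate), round(estimate_minus_one))
```

A single disputed test result moves the national estimate by around 2,500 cases, which is exactly why small week-to-week fluctuations in the survey figures shouldn’t be over-interpreted.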
3. More testing = more positive results
Anyone watching BBC News regularly will have seen their daily graph of new confirmed cases, which hit its lowest point of under 600 per day in early July before climbing steeply throughout September to a weekly average of almost 10,000 per day in early October. It doesn’t take advanced graph-reading skills to see that the number of confirmed cases now far exceeds the highest numbers recorded during the first peak of the pandemic in early April (around 5,500 cases per day). But — and it’s a very big but — this graph doesn’t tell us how many tests were being carried out.
While data around testing has been plagued with problems, the numbers we do have access to suggest that the UK is carrying out around 10 times as many tests now as it was in April. And we are a very long way from finding 10 times as many confirmed cases. Unfortunately the government is not regularly publishing the most meaningful piece of data here: the percentage of tests carried out which come back positive. But as we try to assess how bad the current situation really is, this information is essential. A further reason to treat the raw numbers we see in the daily news with extreme caution.
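The point about positivity rates can be made concrete with a quick calculation. The test counts below are invented placeholders chosen to match the rough ‘10 times as many tests’ claim in the text; only the case numbers come from the article.

```python
# Illustrative sketch: why raw case counts mislead when testing capacity
# changes. Case figures are from the article; test volumes are assumed
# placeholders consistent with 'roughly 10x more testing'.

def positivity_rate(positives: int, tests: int) -> float:
    """Percentage of tests that return a positive result."""
    return 100 * positives / tests

april = positivity_rate(5_500, 20_000)       # limited testing capacity
october = positivity_rate(10_000, 200_000)   # ~10x the testing capacity

print(f"April: {april:.1f}%, October: {october:.1f}%")
```

On these (hypothetical) numbers, October finds almost twice as many cases as April, yet the positivity rate is far lower — which is why the raw case graph alone cannot tell us whether the situation is worse than the first peak.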
4. What is a “Covid-19 death”?
Another figure the government provides daily on its dedicated Coronavirus web pages is the number of deaths. One might be forgiven for thinking this is the number of people who are dying each day from the virus. Think again. For the first five months of the pandemic, these figures were the deaths of people who had received a positive coronavirus test result at any time. So even if they had fully recovered and then died in a tragic accident entirely unrelated to the virus, they would still have appeared in these figures.
Why would the government choose to use a measure that might over-represent the number of virus deaths? Well, only counting deaths where Covid-19 was named as the cause of death on a certificate might also attract criticism — it is a complex disease which in some cases expedites death from other causes; the ‘after a positive test’ criterion is an attempt to capture this. It can also take a long time to establish a cause of death, making it impossible to provide a figure for daily deaths on a timely basis, so the ‘after a positive test’ criterion was used as a proxy to ensure prompt reporting. But the more people who tested positive due to time passing and increased testing, the more unrelated deaths ended up being counted as Covid deaths.
In mid August Public Health England came into line with the devolved UK health authorities and began publishing the number of people who had died within 28 days of a positive test. While this is a measure that can be produced quickly on a daily basis, it is far from perfect, and risks underestimating the true death toll by not counting those who die from Covid after the cutoff point, or without being tested. If all this leaves you feeling a bit baffled, you’re not alone.
5. International comparisons
Much talk has centred on comparing how different nations are coping with the pandemic; indeed it is important to know which policy approaches are succeeding in slowing the spread and reducing the death toll. Unfortunately such comparisons are riddled with problems. To start with, demographic differences between countries, such as the age and density of their populations, make direct comparisons a blunt tool — unless we adjust for as many of these factors as possible, the Coronavirus ‘league table’ is arguably an empty PR exercise.
Even with such adjustments, the many nuances in the way that cases and deaths are reported — such as those described above — vary from one nation to the next, potentially skewing the statistics by huge margins. In June the Spanish government was called out for claiming that there had been no reported Covid deaths in a 24 hour period, despite regional authorities reporting 17 deaths. The explanation seems to be that those deaths would be added to a weekly count, but somehow did not qualify for the daily count due to a convenient reporting cutoff point. Anyone trying to compare the situation in Spain with other countries on that day might have come to a wholly inaccurate conclusion. And anyone trying to compile data for international comparisons both now and in the future must make countless decisions about how to treat data inconsistencies.
Excess Deaths — the magic bullet?
All of the problems highlighted above serve to illustrate the pitfalls of drawing any conclusion from standalone data. A better measure of the virus’ impact is one which compares the total deaths in any given month in 2020 with the average number of deaths for that month for recent years. Such a figure will potentially capture ‘indirect’ deaths caused by the economic and mental health effects of the pandemic and resulting lockdown, as well as Covid deaths without a positive test. These figures are available for the UK and other countries but because they are not published daily they perhaps do not attract the attention they might merit.
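The excess-deaths calculation described above amounts to a simple comparison against a historical baseline. The monthly figures below are invented placeholders, not real ONS data; they just show the shape of the calculation.

```python
# Minimal sketch of an 'excess deaths' calculation. All figures are
# hypothetical placeholders, not actual ONS mortality data.

# Deaths registered in the same month across five previous years.
historical_april = [41_000, 43_000, 42_000, 44_000, 45_000]

# Deaths registered in that month in 2020.
april_2020 = 88_000

baseline = sum(historical_april) / len(historical_april)  # five-year average
excess = april_2020 - baseline

print(f"Baseline: {baseline:.0f}, Excess deaths: {excess:.0f}")
```

Because the baseline absorbs ‘normal’ seasonal mortality, the excess figure captures the pandemic’s total footprint — untested Covid deaths and indirect deaths included — without depending on how any individual death was classified.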
Where the UK’s death toll is concerned, the ONS also publishes the number of deaths for which Covid-19 was mentioned on the death certificate, and for which the virus was the confirmed or suspected cause of death. It is notable that using either of these measures, the coronavirus death toll up to 30 June easily surpassed the government’s cumulative figures for ‘deaths after a positive test’ to the same date. Again, due to the delay in compiling such figures, they do not receive daily media airtime, despite being arguably the most useful measure we have of what is actually going on.