
The Treachery of Models

Statistical models should be used only as tools in aid of policy making. Their projections about the impact of covid-19 should be complemented by an assessment of the socio-cultural and economic risks associated with complete lockdowns.
by Hammad Khan

On March 16, the Imperial College London COVID-19 Response Team published a paper forecasting an alarming mortality rate from the novel coronavirus. The study estimated that the number of deaths in the United States alone could rise to 2.2 million. The research has since influenced the policy response to the pandemic across the globe.

However, as more public health data started becoming available, different research teams came up with varying projections of the virus’ spread and deaths caused by it. Forecasting is a tricky affair, since data models can never account for all possible factors that go into the outcome they seek to predict. This is particularly important in social and economic domains where researchers are confronted with non-quantifiable aspects of the political or cultural environment. Understanding such shortcomings is vital to the effective use of projections in formulating public policy.

Garbage in, Garbage out

Let’s take the example of the Imperial College study. It did a great job of illustrating the danger posed by the exponential nature of the contagion, but it has been criticized for structural shortcomings in the team’s data model. A review published by the New England Complex Systems Institute (NECSI) highlights its lack of attention to the impact of contact tracing, which may allow the isolation of infected individuals prior to the onset of symptoms. Another limitation identified by the NECSI is the study’s failure to account for door-to-door monitoring campaigns that identify symptomatic cases.

A careful examination of the literature produced since the Imperial College study shows that four variables are key for any data model meant to project the covid-19 impact: the average number of persons likely to be infected by a contagious person (also known as the basic reproduction number, or R0); the number of infected persons; the number of persons hospitalized; and the number of fatalities. Given the sensitivity of statistical models, it is important to bear in mind that even a slight change in any of these inputs can produce a massive variance in predictions. In other words, if we put garbage into our model, it will give garbage back.
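The sensitivity described above can be made concrete with a toy calculation. The sketch below is purely illustrative; the seed size, the generation count, and the simple "every case infects R0 others" growth rule are assumptions for demonstration, not parameters taken from any cited model.

```python
def cumulative_infections(r0: float, seed: int = 100, generations: int = 10) -> int:
    """Total infections after a number of transmission generations, under the
    toy assumption that every case infects r0 others (no immunity, no
    interventions, no depletion of susceptibles)."""
    total, current = seed, seed
    for _ in range(generations):
        current = current * r0       # each generation multiplies by r0
        total += current
    return round(total)

# The low and high ends of the early R0 estimates discussed in the text.
low = cumulative_infections(2.0)     # 204,700 cumulative infections
high = cumulative_infections(2.6)

print(low, high, high / low)
```

Even in this crude setup, moving R0 from 2.0 to 2.6 — a 30 percent change in one input — multiplies the cumulative case count more than tenfold after ten generations, which is why small revisions to early estimates swing projections so violently.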

Take the number of infected persons. The Imperial College study suggested that between 40 and 50 percent of infections remained unidentified. These included asymptomatic infections as well as cases of mild disease. The number of unidentified cases has subsequently been shown to be as high as 86 percent. The implications of underreporting are profound. First, fatality rates are likely to be far lower than expected. Second, the high number of unidentified cases across the globe suggests that considerable immunity may already have been built up among populations. Finally, the impact of social distancing may be severely overestimated, since the virus has been prevalent in our communities for a while now.
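The effect of underreporting on measured fatality rates is simple arithmetic, sketched below. The death and confirmed-case counts are hypothetical numbers chosen only for illustration; the 86 percent unidentified share is the figure discussed above.

```python
def infection_fatality_rate(deaths: int, confirmed: int, pct_unidentified: float) -> float:
    """Fatality rate per true infection, assuming confirmed cases represent
    only (1 - pct_unidentified) of all infections."""
    true_infections = confirmed / (1 - pct_unidentified)
    return deaths / true_infections

# Hypothetical counts, chosen only to illustrate the arithmetic.
deaths, confirmed = 1_000, 50_000

naive_rate = deaths / confirmed                                  # 2.0% over confirmed cases only
adjusted = infection_fatality_rate(deaths, confirmed, 0.86)      # ~0.28% over all infections

print(f"naive: {naive_rate:.2%}, adjusted: {adjusted:.2%}")
```

Dividing the same death toll by a denominator seven times larger cuts the apparent fatality rate by the same factor — which is why the true extent of unidentified cases matters so much to any projection.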

Similarly, the average number of persons likely to be infected by a contagious person has been inferred in most models from early data out of the city of Wuhan, China. This number was estimated to be between 2.0 and 2.6 in the Imperial College study, but it has since been revised to around 3.0. This seemingly small revision has profound implications for our projections. It suggests that the virus has infected significantly more people than previously assumed, and that we may be further along than expected in developing herd immunity. Additionally, since a significant percentage of the population may already have been infected, social distancing measures may not be as effective as expected.

In short, statistical models are only as good as their input assumptions, and this insight gains exponential significance in the case of models making projections about events with exponential growth potential. The smallest of changes in input data can lead to massive changes in the outcomes predicted by the model. In the case of the covid-19 pandemic, this volatility is becoming apparent as more data become available. For example, by April 7, the University of Washington’s Institute for Health Metrics and Evaluation (IHME) model reported significant downward adjustments in forecasted fatalities in several U.S. states. The projected number of deaths came out 80% lower than the earlier figure for North Carolina, 75% lower for Pennsylvania, 70% for California, 65% for Texas, and 55% for Washington. The projections for Louisiana were nearly two-thirds lower than the earlier figures. Other major models, such as those from Northeastern University and Columbia University, also saw swings in their predictions as more data became available in April.

An Unfortunate Utilitarian Calculus

Another implicit trap of models discussing the impact of covid-19 is their framing. The problem is presented only in terms of the number of infections and fatalities, and the overwhelming impact of the pandemic on healthcare systems. While these are central concerns, a singular focus on healthcare systems fails to recognize global disparities among countries.

While some economies may have the ability to inject trillions in relief, those marked by high inflation, shallow and volatile financial markets, and a significant share of the population living at subsistence levels do not enjoy this privilege. Even in an advanced economy like the U.S., the economic consequences of a shutdown already threaten to rival the Great Depression. Additionally, lockdowns have adverse social effects: they are expected to escalate instances of domestic and child abuse by confining victims in close proximity to their abusers and by severing protective lifelines and community connections.

These consequences will be amplified in more fragile economies like Pakistan’s, where across-the-board lockdowns will place untold pressure on populations living at subsistence levels with limited access to food stores, clean water, and health-care facilities. The simplistic metrics of statistical models, composed only of fatalities from the virus and an overwhelmed medical system, do not capture such socio-cultural and economic risk factors. Policy recommendations must therefore be formulated after weighing a complex mix of factors, including not only public health concerns but also a country’s particular economic and cultural context. That means recognizing difficult tradeoffs.

Believe absurdities, commit atrocities

Voltaire famously said, “Those who can make you believe absurdities can make you commit atrocities.” When millions of lives and trillions of dollars hang in the balance, the simplistic predictions of models must be handled with care. Blindly implementing the policy recommendations of epidemiologists is an abdication of leadership. No set of models can capture the complexity of this crisis. Balancing its many costs and charting a path through it is a generational challenge. There are no playbooks; new responses and strategies ought to be developed in view of the evolving context. Policy makers should therefore treat the projections of statistical models as tools, not as the infallible pronouncements of a prophet.

Hammad Khan is a technology executive focused on the rapid development of analytical systems. He currently resides in Portland, USA, where he is involved with technology initiatives centered on predictive and prescriptive analytics.
