Covid Update: Herd Immunity

[<< | Prev | Index | Next | >>]
Wednesday, April 22, 2020

Covid Update: Herd Immunity

Subtitle: beware smart people with bad models

Continuing my theme of debunking bad models, I want to address this equation that people keep throwing around that herd immunity happens when the number immune reaches `1-1/R_0`.

Here `R_0` is defined as the number of people who get infected, on average, by a single infected individual in a naive population.

The basis of it is the simple intuition that if one person infects `R_0` people but `I = 1-1/R_0` of those people are immune, then only `R = R_0*(1-I) = R_0*(1-[1-1/R_0]) = 1` people are actually newly infected which means the disease is no longer growing, and any reduction beyond that sees it dissipate away.

The proper application of this is to immunization through vaccination, on the assumption that immunization happens randomly, in which case it's probably about as accurate as the estimate of `R_0` itself is. The "randomly" condition there is foreshadowing the problem:

The above equation does not work, per se, for naturally acquired immunity due to natural disease spread. Lots of people have been treating it as if it does, resulting in models like this:

This is using `R_0=2.3`. (A small caveat: For simplicity, I'm treating the average transmission delay, or generation time, as a week for all of these examples, which is wrong but only impacts the horizontal scaling, which is not important here.)

What's happening here is momentum keeps carrying the infection beyond the "herd immunity" level and for subtle mathy reasons the rate of infection ends up following a roughly symmetric shape, meaning roughly as many get infected on the way down as on the way up, so that the final total number infected is roughly twice the "herd immunity" level. (It's "roughly" because as you get into the higher percentages, you have to account for two or more newly infected people being the same person.) In other words, if `R_0 >= 2` then basically everyone gets infected before it's over (under no mitigation).

If we apply this to the flu, here using `R_0=1.4` for the 2017-2018 year (eyeballed--take everything here as qualitative, not quantitative), we get:

How does that compare to the actual 2017-2018 flu season?

Hmm. Let's look at that in log space:

So, this looks pretty good until we start getting into non-trivial fractions of the population infected, at which point the real flu starts slowing down. There are multiple possible reasons for this, such as vaccinations (CDC implies in the ballpark of 10% reduction, so not nearly enough to account for this), weather, and so on.

But there is a more fundamental issue that applies when all other things are held constant: Social mixing is not remotely uniform throughout a population. Consider introverts vs. extroverts, city vs. country peeps, public transportation users vs. drivers. When we calculate `R_0` for a naive population, it is a blend of all of these, but it is dominated by the highest mixers, such as the city-dwelling, public-transportation using extroverts. And correspondingly, the initial spread is fastest through that community of high-mixers. (Hello New York.) But this means as the disease naturally spreads, `R_0` is changing (if we stick with `R = R_0 (1-I)`).

To wit, here is the implied `R_0` (inferred from implied `R`--the actual growth rate in a given time and context) for our simulation and the actual 2017-2018 flu:

It seems more reasonable to use some `R_0'` which is a modified `R_0` based on the fraction already infected (through natural spread), `I`. As a really simple example we could use `R_0' = R_0 e^-(kI)`:

(We could instead define `R` as a different function of `I` but there is some sense in keeping a modified `R_0` instead, since the herd-immunity function via randomized vaccination still works using that modified `R_0'` at any point in time! [Update: On further consideration, it would have been cleaner and more general to have modified R directly instead of indirectly via R0, but they reduce to the same end result, and this way is somewhat more intuitive visually.])

If we now re-run our flu simulation with this adaptive `R_0'` we get:

Or in linear space:

If we assume that mixing gradient is a property of the country, not the disease (in truth both would affect it), we can try re-simulating the COVID-19 spread accordingly:

Hmm, quite a difference!

Now, the take-away here isn't that this is a good model but rather to show how bad a model the original was, and how very sensitive these kinds of models are to the assumptions we put into them. Next time someone confidently declares that X number of people will be infected by the N'th generation, because `R_0` is Y and it's just simple math, duh! Well, send them here.

Which now raises the question everybody ought to want to know: What happens next with Sweden?

[Update, May 2: see Individual variation in susceptibility or exposure to SARS-CoV-2 lowers the herd immunity threshold posted today.]

[Update, July 24: see Herd immunity thresholds for SARS-CoV-2 estimated from unfolding epidemics posted today.]

[<< | Prev | Index | Next | >>]

Simon Funk / simonfunk@gmail.com