At the time, critics of the model were claiming it was buggy because multiple runs would produce different results. My comment above explains why that is not evidence of the model being buggy.
Report 9 talks about parameters being modeled as probability distributions, i.e. it's a stochastic model. I doubt they would draw conclusions from a single run, since each run only draws a single sample from those distributions. And if you look at the paper describing the original model (cited in Report 9), they do test the model with multiple runs. On top of that, they perform sensitivity analyses to check that erroneous assumptions aren't driving the model.
I have spent time in academia, but I'm not an academic, and don't feel any obligation to fly the flag for academia.
Regarding the politics, contrast this with how readily the people who forensically examined Ferguson's papers accepted the competing (and clearly incorrect https://www.youtube.com/watch?v=DKh6kJ-RSMI) results from Sunetra Gupta's group.
Fair point about academic code being messy. It's a big issue, but the incentives to write quality code aren't there at the moment. I assume you're a programmer - if you wanted to be the change you want to see, you could join an academic group, cut your salary to a third or a quarter, and work somewhere where what you do is not a priority.
Your comment above is wrong. Sorry, let me try to explain again. Let's set aside the fact that random bugs != stochastic modelling - I don't quite understand why that point is so hard to grasp, but let's shelve it for a moment.
ICL likes to claim their model is stochastic. Unfortunately that's just one of many things they said that turned out to be untrue.
The Ferguson model isn't stochastic. They claim it is because they don't understand modelling or programming. It's actually an ordinary agent-like simulation of the type you'd find in any city builder video game, and thus each time you run it you get exactly one set of outputs, not a probability distribution. They think it's "stochastic" because you can specify different PRNG seeds on the command line.
If they ran it many times with different PRNG seeds, then this would at least quantify the effect of randomness on their simulation. But, they never did. How do we know this? Several pieces of evidence:
2. The program is so slow that it takes a day to do even a single run of the scenarios in Report 9. To determine CIs for something like this you'd want hundreds of runs at least. You could try to do them all in parallel on a large compute cluster, but ICL never did that. As far as I understand, their original program only ran on a single Windows box in their lab - it wasn't really portable, and its results change between machines even in single-threaded mode, because compiler optimizations alter the output depending on whether AVX is available.
3. The "code check" document, which falsely claims the model is replicable, states explicitly that "These results are the average of NR=10 runs, rather than just one simulation as used in Report 9."
So, their own collaborators confirmed that they never ran it more than once, and each run produces exactly one line on a graph. Therefore even if you accept the entirely ridiculous argument that it's OK to produce corrupted output if you take the average of multiple runs (it isn't!), they didn't do it anyway.
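To make the multi-seed point concrete, here is a minimal sketch of what quantifying run-to-run randomness would look like. This is a toy, not the ICL code: `simulate_deaths` is a hypothetical placeholder that just produces a noisy outcome per seed, standing in for one full simulation run.

```python
import random
import statistics

def simulate_deaths(seed: int) -> float:
    """Stand-in for one run of a stochastic simulator.

    Hypothetical toy: draws a noisy outcome around an assumed
    'true' value to mimic seed-to-seed variation. The real model
    would be one full (day-long) simulation per seed.
    """
    rng = random.Random(seed)
    return 40_000 + rng.gauss(0, 2_000)

# One run per seed -> an empirical distribution of outcomes,
# rather than the single line on a graph that one run gives you.
runs = sorted(simulate_deaths(seed) for seed in range(200))

mean = statistics.fmean(runs)
lo = runs[int(0.025 * len(runs))]   # empirical 2.5th percentile
hi = runs[int(0.975 * len(runs))]   # empirical 97.5th percentile
print(f"mean ~ {mean:.0f}, empirical 95% interval ~ [{lo:.0f}, {hi:.0f}]")
```

The point being: with hundreds of seeded runs you get an interval, not a point, and that is what you'd need before calling any single run representative.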
Finally, as one of the people who forensically examined Ferguson's work, I never accepted Gupta's results either (not that this is in any way relevant). She did at least present CIs, but they were so wide they boiled down to "we don't know", which seems to be a common failure mode in epidemiology - CIs are presented without being interpreted, so you can get values like "42% (95% CI 6%-87%)" appearing in papers.
I took a look at point 3, and that extract from the code check is correct. Assuming they did one realisation, I was curious why - it would be unlikely to be an oversight. The justification given is:
"Numbers of realisations & computational resources: It is essential to undertake sufficient realisation[s] to ensure ensemble behaviour of a stochastic [model] is well characterised for any one set of parameter values. For our past work which examined extinction probabilities, this necessitates very large numbers of model realizations being generated. In the current work, only the timing of the initial introduction of virus into a country is potentially highly variable – once case incidence reaches a few hundred cases per day, dynamics are much closer to deterministic."
So it looks like they did consider the issue, and the number of realisations needed depends on the variable of interest in the model. The code check appears to back their justification up:
"Small variations (mostly under 5%) in the numbers were observed between Report 9 and our runs."
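There's also a simple statistical intuition behind the "closer to deterministic" claim, assuming daily case counts behave roughly like a Poisson process (my assumption for illustration; the quoted text doesn't say this explicitly). A Poisson count with mean n has standard deviation sqrt(n), so the relative noise falls off as 1/sqrt(n):

```python
import math

# Relative day-to-day fluctuation of a roughly-Poisson daily count:
# std dev is sqrt(n), so the coefficient of variation is 1/sqrt(n).
# The incidence levels below are illustrative, not from the model.
for daily_cases in (5, 50, 500, 5000):
    cv = 1 / math.sqrt(daily_cases)
    print(f"{daily_cases:>5} cases/day -> relative noise ~ {cv:.1%}")
```

Under that assumption, by a few hundred cases per day the stochastic wobble is down to a few percent, which is the sense in which the dynamics become "much closer to deterministic".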
The code check's own data tables show that some variations were 10% or even 25% from the values in Report 9. These are not "small variations", nor would it matter even if they were, because it is not OK to present bugs as unimportant measurement noise.
The team's claim that you only need to run it once because the variability was well characterized in the past is also nonsense. They were constantly changing the model. Even if they thought they understood the variance in the output in the past (which they didn't), that understanding was invalidated the moment they changed the model to reflect new data and ideas.
Look, you're trying to justify this without seeming to realize that this is Hacker News. It's a site read mostly by programmers. This team demanded and got incredibly destructive policies on the back of this model, which is garbage. It's the sort of code quality that got Toyota found liable in court for severe negligence. The fact that academics apparently struggle to understand how serious this is creates anti-science narratives far faster than anything any blogger could ever write.
I looked at the code check. The one 25% difference is in an intermediate variable (peak beds). The two differences of 10% are 39k deaths vs 43k deaths, and 100k deaths vs 110k deaths. The other differences are less than 5%. I can see why the author of the code check would reach the conclusion he did.
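For what it's worth, the relative differences can be checked directly from the figures quoted in this thread (the pairs below are as cited above, not re-derived from the original documents, and the labels are placeholders):

```python
# Report 9 figure vs code-check rerun, as quoted in this thread.
# "scenario A"/"scenario B" are placeholder labels, not the real names.
pairs = {
    "deaths, scenario A": (39_000, 43_000),
    "deaths, scenario B": (100_000, 110_000),
}
for name, (report9, rerun) in pairs.items():
    rel = abs(rerun - report9) / report9
    print(f"{name}: {rel:.1%} difference")
```

So whether ~10% counts as a "small variation" is exactly what the two sides here disagree about.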
I have given a possible explanation for the variation, that doesn't require buggy code, in my previous comments.
An alternative hypothesis is that it's bug driven, but very competent people (including eminent programmers like John Carmack) seem to have vouched for it on that front. I'd say this puts a high burden of proof on detractors.