At the time, critics of the model were claiming it was buggy because multiple runs would produce different results. My comment above explains why that is not evidence of the model being buggy.
Report 9 talks about parameters being modeled as probability distributions, i.e. it's a stochastic model. I doubt they would draw conclusions from a single run, since each run only draws a single sample from those distributions. And if you look at the paper describing the original model (cited in Report 9), they do test the model with multiple runs. On top of that, they perform sensitivity analyses to check that erroneous assumptions aren't driving the model.
I have spent time in academia, but I'm not an academic, and don't feel any obligation to fly the flag for academia.
Regarding the politics, contrast this with how readily the people who forensically examined Ferguson's papers accepted the competing (and clearly incorrect https://www.youtube.com/watch?v=DKh6kJ-RSMI) results from Sunetra Gupta's group.
Fair point about academic code being messy. It's a big issue, but the incentives to write quality code aren't there at the moment. I assume you're a programmer - if you wanted to be the change you want to see, you could join an academic group, cut your salary to a third or a quarter, and work somewhere where what you do is not a priority.
Your comment above is wrong. Sorry, let me try to explain again. Let's set aside the fact that random bugs != stochastic modelling - I don't quite understand why that point is so hard to grasp, but let's shelve it for a moment.
ICL likes to claim their model is stochastic. Unfortunately that's just one of many things they said that turned out to be untrue.
The Ferguson model isn't stochastic. They claim it is because they don't understand modelling or programming. It's actually an ordinary agent-like simulation of the type you'd find in any city builder video game, and thus each time you run it you get exactly one set of outputs, not a probability distribution. They think it's "stochastic" because you can specify different PRNG seeds on the command line.
If they ran it many times with different PRNG seeds, then this would at least quantify the effect of randomness on their simulation. But, they never did. How do we know this? Several pieces of evidence:
2. The program is so slow that it takes a day to do even a single run of the scenarios in Report 9. To determine CIs for something like this you'd want hundreds of runs at least. You could try to do them all in parallel on a large compute cluster, but ICL never did that. As far as I understand, their original program only ran on a single Windows box in their lab - it wasn't really portable, and its results change between machines even in single-threaded mode, because compiler optimizations alter the output depending on whether AVX is available.
3. The "code check" document, which falsely claims the model is replicable, states explicitly that "These results are the average of NR=10 runs, rather than just one simulation as used in Report 9."
So, their own collaborators confirmed that they never ran it more than once, and each run produces exactly one line on a graph. Therefore even if you accept the entirely ridiculous argument that it's OK to produce corrupted output if you take the average of multiple runs (it isn't!), they didn't do it anyway.
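To make the multi-seed point concrete, here is a minimal sketch of what quantifying run-to-run randomness would look like. This is a toy, not the ICL code: `simulate_deaths` is a hypothetical placeholder that just produces a noisy outcome per seed, standing in for one full simulation run.

```python
import random
import statistics

def simulate_deaths(seed: int) -> float:
    """Stand-in for one run of a stochastic simulator.

    Hypothetical toy: draws a noisy outcome around an assumed
    'true' value to mimic seed-to-seed variation. The real model
    would be one full (day-long) simulation per seed.
    """
    rng = random.Random(seed)
    return 40_000 + rng.gauss(0, 2_000)

# One run per seed -> an empirical distribution of outcomes,
# rather than the single line on a graph that one run gives you.
runs = sorted(simulate_deaths(seed) for seed in range(200))

mean = statistics.fmean(runs)
lo = runs[int(0.025 * len(runs))]   # empirical 2.5th percentile
hi = runs[int(0.975 * len(runs))]   # empirical 97.5th percentile
print(f"mean ~ {mean:.0f}, empirical 95% interval ~ [{lo:.0f}, {hi:.0f}]")
```

The point being: with hundreds of seeded runs you get an interval, not a point, and that is what you'd need before calling any single run representative.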
Finally, as one of the people who forensically examined Ferguson's work, I never accepted Gupta's results either (not that this is in any way relevant). She did at least present CIs, but they were so wide they boiled down to "we don't know", which seems to be a common failure mode in epidemiology - CIs are presented without being interpreted, so you can get values like "42% (95% CI 6%-87%)" appearing in papers.
I took a look at point 3, and that extract from the code check is correct. Assuming they did one realisation, I was curious why - it would be unlikely to be an oversight. The justification given is:
"Numbers of realisations & computational resources: It is essential to undertake sufficient realisation[s] to ensure ensemble behaviour of a stochastic [model] is well characterised for any one set of parameter values. For our past work which examined extinction probabilities, this necessitates very large numbers of model realizations being generated. In the current work, only the timing of the initial introduction of virus into a country is potentially highly variable – once case incidence reaches a few hundred cases per day, dynamics are much closer to deterministic."
So it looks like they did consider the issue, and the number of realisations needed depends on the variable of interest in the model. The code check appears to back their justification up:
"Small variations (mostly under 5%) in the numbers were observed between Report 9 and our runs."
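There's also a simple statistical intuition behind the "closer to deterministic" claim, assuming daily case counts behave roughly like a Poisson process (my assumption for illustration; the quoted text doesn't say this explicitly). A Poisson count with mean n has standard deviation sqrt(n), so the relative noise falls off as 1/sqrt(n):

```python
import math

# Relative day-to-day fluctuation of a roughly-Poisson daily count:
# std dev is sqrt(n), so the coefficient of variation is 1/sqrt(n).
# The incidence levels below are illustrative, not from the model.
for daily_cases in (5, 50, 500, 5000):
    cv = 1 / math.sqrt(daily_cases)
    print(f"{daily_cases:>5} cases/day -> relative noise ~ {cv:.1%}")
```

Under that assumption, by a few hundred cases per day the stochastic wobble is down to a few percent, which is the sense in which the dynamics become "much closer to deterministic".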
The code check's own data tables show that some variations were 10% or even 25% from the values in Report 9. These are not "small variations", nor would it matter even if they were, because it is not OK to present bugs as unimportant measurement noise.
The team's claim that you only need to run it once because the variability was well characterized in the past is also nonsense. They were constantly changing the model. Even if they thought they understood the variance in the output in the past (which they didn't), that understanding was invalidated the moment they changed the model to reflect new data and ideas.
Look, you're trying to justify this without seeming to realize that this is Hacker News. It's a site read mostly by programmers. This team demanded and got incredibly destructive policies on the back of this model, which is garbage. It's the sort of code quality that got Toyota found liable in court for severe negligence. The fact that academics apparently struggle to understand how serious this is creates anti-science narratives far faster than anything any blogger could ever write.
I looked at the code check. The one 25% difference is in an intermediate variable (peak beds). The two differences of 10% are 39k deaths vs 43k deaths, and 100k deaths vs 110k deaths. The other differences are less than 5%. I can see why the author of the code check would reach the conclusion he did.
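For what it's worth, the relative differences can be checked directly from the figures quoted in this thread (the pairs below are as cited above, not re-derived from the original documents, and the labels are placeholders):

```python
# Report 9 figure vs code-check rerun, as quoted in this thread.
# "scenario A"/"scenario B" are placeholder labels, not the real names.
pairs = {
    "deaths, scenario A": (39_000, 43_000),
    "deaths, scenario B": (100_000, 110_000),
}
for name, (report9, rerun) in pairs.items():
    rel = abs(rerun - report9) / report9
    print(f"{name}: {rel:.1%} difference")
```

So whether ~10% counts as a "small variation" is exactly what the two sides here disagree about.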
I have given a possible explanation for the variation, that doesn't require buggy code, in my previous comments.
An alternative hypothesis is that it's bug driven, but very competent people (including eminent programmers like John Carmack) seem to have vouched for it on that front. I'd say this puts a high burden of proof on detractors.