I would also expect the estimated magnitude of the effect to go down over time, but that's just my general attitude to these kinds of things. The fact is that the discontinuity design they use already accounts for variation between classes, teachers, schools, and years. The way it works is that some unexpected event that applies to some people but not others is taken as a natural experiment, and the variation between groups before the event is compared to the variation between groups after the event. The comparison is never against no variation.
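To make that before/after, between-group comparison concrete, here's a minimal sketch (invented numbers, not from the paper) of the difference-in-differences logic behind this kind of design:

```python
# Invented mean outcomes for each (group, period) cell --
# these numbers are purely illustrative.
treated_before, treated_after = 10.0, 14.0
control_before, control_after = 9.0, 11.5

# Each group's own before-to-after change.
treated_change = treated_after - treated_before   # 4.0
control_change = control_after - control_before   # 2.5

# The estimated effect is the *difference* of those changes.
# Note the comparison is against the control group's variation,
# never against "no variation".
effect = treated_change - control_change
print(effect)  # 1.5
```

The point of the sketch: the control group's own before/after change (2.5 here) is what absorbs the class-, teacher-, school-, and year-level variation, so the design never assumes the untreated world was flat.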
The smoking gun is really in Table 3 and Table 4, where you can see that the observed effects are compatible with a population effect of 0; alternatively, look at Figure 2 and note that you could draw a straight line (no effect) entirely within the confidence bands. That doesn't mean the effect isn't there, only that there's insufficient evidence that it is, and that we should indeed be very careful about taking the estimates at face value.
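The "straight line within the bands" check is the same thing as asking whether 0 sits inside the effect's confidence interval. A tiny sketch with an invented estimate and standard error (not the paper's actual numbers):

```python
# Invented point estimate and standard error for an effect.
estimate, se = 0.8, 0.5

# 95% confidence interval under a normal approximation.
lo, hi = estimate - 1.96 * se, estimate + 1.96 * se
print(round(lo, 2), round(hi, 2))  # -0.18 1.78

# If zero lies inside the interval, the data are *compatible*
# with no effect -- which is not the same as evidence of no effect.
print(lo <= 0 <= hi)  # True
```

Compatibility with zero is a statement about what the data fail to rule out, not a claim that the true effect is zero; that's exactly the distinction being made above.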