Really interesting! I wanted to implement this kind of system at Wikimedia, but I quit my release engineering job at the beginning of 2022. I still think about this specific problem pretty often, though. I never thought to use the score to determine how much testing needs to be done. That's actually genius! If I had thought of that, I probably could have pitched it and gotten more people behind the whole risk-scoring idea. Overall testing times were getting really long on Wikimedia's codebase, and targeted testing could have real benefits for the velocity of changes through the pipeline (with knock-on effects on developer productivity and job satisfaction).