This is actually really similar to something I've been wanting to build for a long time. In my case I've thought it would be useful to have a way to estimate the likelihood that a given change breaks things, based on the history of breaking changes in the same file or area of the file. Basically, a riskiness score for each change. The risk score could be attached to each PR and would give reviewers a signal about which code deserves a bit of extra attention, as well as highlighting the risky changes when they are being deployed.
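To make the idea concrete, here's a minimal sketch of one way to score a file (the names, data shape, and exponential-decay scheme are all my own invention, not anyone's actual system): weight past breakages by recency, so a file that broke a deployment last week scores higher than one with the same number of breakages years ago.

```python
import math
from dataclasses import dataclass

@dataclass
class Change:
    file: str
    age_days: float   # how long ago the change landed
    broke: bool       # did it cause a breakage/rollback?

def file_risk(history: list[Change], half_life_days: float = 90.0) -> float:
    """Exponentially decayed breakage rate for one file's change history.

    Recent breakages dominate; old ones fade with the given half-life.
    Returns a score in [0, 1].
    """
    decay = math.log(2) / half_life_days
    weighted_breaks = 0.0
    weighted_total = 0.0
    for c in history:
        w = math.exp(-decay * c.age_days)
        weighted_total += w
        if c.broke:
            weighted_breaks += w
    return weighted_breaks / weighted_total if weighted_total else 0.0

history = [
    Change("app/deploy.py", age_days=3, broke=True),
    Change("app/deploy.py", age_days=40, broke=False),
    Change("app/deploy.py", age_days=200, broke=True),
]
```

A PR's score could then be, say, the max of `file_risk` over the files it touches. The half-life is a knob you'd have to tune against your own history of incidents.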
The tricky part would be tracking the same part of the code as it moves up and down because of insertions/deletions above it, which would cause problems for a naive algorithm based on line numbers.
Just doing it at the file level, like this does, might be good enough to be useful though.
We do it on a symbol level after statically analyzing each change, and everything in the monorepo daily. Our remedy for high-risk changes is to run more tests: client tests, not unit tests. Sometimes there are 100k client tests to pick from, so we rank them and run a small subset.
It is a hard problem. One interesting observation is that there is usually a culprit symbol or two in the culprit change, but their connectivity is very similar to that of the non-culprits in the same change.
Another observation is that the transitively modified call graph after a change is pretty big; a depth of 50 is not unusual. It is hard to get many useful signals out of it beyond the amount of overlap in transitively affected symbols between change and test.
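The overlap signal could look something like this (a toy sketch of the general idea, not their actual system, and all names here are made up): walk the reverse call graph to find every symbol that transitively calls a changed one, then rank tests by Jaccard overlap between that affected set and the symbols each test exercises.

```python
from collections import deque

def transitively_affected(changed: set[str], callers: dict[str, set[str]]) -> set[str]:
    """BFS up the reverse call graph: everything that transitively
    calls a changed symbol is considered affected."""
    affected = set(changed)
    queue = deque(changed)
    while queue:
        sym = queue.popleft()
        for caller in callers.get(sym, ()):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected

def rank_tests(affected: set[str], test_deps: dict[str, set[str]]) -> list[str]:
    """Order tests by Jaccard overlap between the affected symbols
    and the symbols each test depends on."""
    def score(deps: set[str]) -> float:
        union = affected | deps
        return len(affected & deps) / len(union) if union else 0.0
    return sorted(test_deps, key=lambda t: score(test_deps[t]), reverse=True)

# Toy reverse call graph: callers["f"] = symbols that call f.
callers = {"parse": {"load"}, "load": {"main"}}
affected = transitively_affected({"parse"}, callers)  # {"parse", "load", "main"}

tests = {"test_parse": {"parse", "load"}, "test_ui": {"render"}}
```

At depth 50 the affected set gets huge, which is presumably why raw overlap is one of the few signals that survives; a real system would need per-test dependency sets from build metadata or coverage data.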
We found file level and build target level to be too coarse, but AST symbols are working.
Really interesting! I wanted to implement this kind of system at Wikimedia, but I quit my release engineering job at the beginning of 2022. I still think about this specific problem pretty often, though. I never thought to use the score to determine how much testing needs to be done; that's actually genius! If I had thought of that, I probably could have pitched it and gotten more people behind the whole risk-scoring idea, since overall testing times were getting really long on Wikimedia's codebase, and targeted testing could have had real benefits for the velocity of changes through the pipeline (with associated knock-on effects on developer productivity and job satisfaction).
We add support by project, and the prototypical project we started with had 1M reverse test dependencies; a quarter of those were eligible test targets that we could recommend (based on the language they were written in). This is probably the biggest project we would ever find to support in the monorepo.
Some are UI tests, but we don't recommend those: we found they don't catch breakages as often, so we don't support the language they're written in. The tests we do recommend are often integration-style tests, in that they call very high-level functions, and often many of them.
A friend working in the office of a big-tech company in Denmark said, "one bad engineer like me working in Copenhagen can put food on the table for 20 Bulgarians working in customer support."

Since that day I've always wanted to get into a FAANG-type company; writing buggy code is basically philanthropy.
Facebook release engineering famously kept a riskiness score for each developer, based on their history of broken deployments, and used it as a signal for whether a developer's changes got deployed directly or received extra scrutiny.