Hacker News | new | past | comments | ask | show | jobs | submit

This is actually really similar to something I've been wanting to build for a long time. In my case, I've thought it would be useful to have a way to calculate the likelihood that a given change will break things, based on the history of breaking changes in the same file or area of the file: basically a riskiness score for each change. The risk score could be attached to each PR and would give reviewers a signal about which code should get a bit of extra attention, as well as highlighting the risky changes when they are being deployed.

The tricky part would be tracking the same part of the code as it moves up and down because of insertions/deletions above it, which would trip up a naive algorithm based on line numbers.

Just doing it at the file level, like this does, might be good enough to be useful, though.
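A minimal sketch of that file-level version, assuming the breakage history is available as (files touched, did it break) pairs; the input shape and file names here are hypothetical, not from any real tool:

```python
from collections import Counter

def file_risk_scores(history):
    """Score each file by the fraction of past changes touching it
    that later turned out to be breaking.

    `history` is a list of (files_touched, caused_breakage) pairs, a
    stand-in for whatever the VCS and incident tooling actually record.
    """
    touches, breaks = Counter(), Counter()
    for files, broke in history:
        for f in files:
            touches[f] += 1
            if broke:
                breaks[f] += 1
    return {f: breaks[f] / touches[f] for f in touches}

# Toy history of changes and whether each one broke something.
history = [
    (["auth.py", "db.py"], True),
    (["auth.py"], False),
    (["ui.py"], False),
    (["auth.py"], True),
]
scores = file_risk_scores(history)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

A real version would also want recency weighting and some smoothing for files with very few changes, but the shape of the computation is the same.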



This has been my job for over 2 years now!

We do it at the symbol level after statically analyzing each change, and we analyze everything in the monorepo daily. Our remedy for high-risk changes is to run more tests: client tests, not unit tests. Sometimes there are 100k client tests to pick from, so we rank them and run a small subset.

It is a hard problem. One interesting observation is that there is usually a culprit symbol or two in the culprit change, but its connectivity is very similar to that of non-culprits in the same change.

Another observation is that the transitively modified callgraph after a change is pretty big; a depth of 50 is not unusual. It is hard to get many useful signals out of it beyond the amount of overlap in transitively affected symbols between the change and a test.

We found file level and build target level to be too coarse, but AST symbols are working.
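For the ranking step, a toy version of "overlap in transitively affected symbols between change and test" might look like this; the inputs are hypothetical, since computing them is the hard static-analysis part:

```python
def rank_tests(changed_symbols, test_coverage, k=3):
    """Rank candidate tests by Jaccard overlap between the symbols a
    change transitively affects and the symbols each test exercises.

    `changed_symbols` is a set of symbol names; `test_coverage` maps a
    test name to the set of symbols it (transitively) reaches.
    """
    scored = [
        (len(changed_symbols & syms) / len(changed_symbols | syms), name)
        for name, syms in test_coverage.items()
        if syms
    ]
    scored.sort(reverse=True)  # highest overlap first
    return [name for _, name in scored[:k]]

# Made-up symbols and coverage sets for illustration.
changed = {"auth.login", "auth.token", "db.commit"}
coverage = {
    "test_login_flow": {"auth.login", "auth.token"},
    "test_checkout":   {"db.commit", "cart.add", "cart.total"},
    "test_rendering":  {"ui.draw"},
}
print(rank_tests(changed, coverage, k=2))
# -> ['test_login_flow', 'test_checkout']
```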


Really interesting!! I wanted to implement this kind of system at Wikimedia, but I quit my release engineering job at the beginning of 2022. I still think about this specific problem pretty often, though. I never thought to use the score to determine how much testing needs to be done. That's actually really genius! If I had thought of that, I probably could have pitched it and gotten more people behind the whole risk-scoring idea, since overall testing times were getting really long on Wikimedia's codebase, and targeted testing could have had some real benefits for the velocity of changes through the pipeline (with associated knock-on effects on developer productivity and job satisfaction).


100k client tests sounds like a lot. Are those integration tests or UI tests? How many tests do you have overall? I'm just curious.


We add support project by project. The prototypical project we started with had 1M test reverse dependencies, a quarter of which were eligible test targets that we could recommend (based on the language they're written in). This is probably the biggest project we would ever find to support in the monorepo.

Some are UI tests, but we don't recommend those, because we found they don't catch breakages as often, so we don't support the language they're written in. The tests we do recommend are often integration-type tests, in that they call very high-level functions, and often many of them.


Not just the code itself, but the author. I worked with a guy who wrote at least one bug every time he created a PR.


"If debugging is the process of removing software bugs, then programming must be the process of putting them in."


> wrote at least one bug every time he created a PR.

An economic hero. A man of the people. Creating job security for QA departments!


A friend working in the office of a big-tech company in Denmark said "one bad engineer like me working in Copenhagen can put food on the table for 20 Bulgarians working in customer support".

Since that day I've always wanted to get into FAANG-type companies; writing buggy code is basically philanthropy.


This thread made some people question their no-bugs-allowed ideal, which is apparently misanthropy…


Facebook release engineering famously kept riskiness scores for each developer and used that as a signal for whether that developer's changes got deployed or instead received extra scrutiny based on a history of broken deployments.


I'm currently reading a book about that topic! https://pragprog.com/titles/atcrime/your-code-as-a-crime-sce...


Code location + provenance/authorship + data-flow analysis for adjacent sensitive code.

I would add it to my review tools fwiw.
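Combining those signals could be as simple as a weighted sum per change. The signal names and weights below are invented for illustration; a real tool would calibrate them against historical breakages:

```python
def change_risk(signals, weights=None):
    """Fold per-change risk signals (each normalized to 0..1) into a
    single score via a weighted sum.

    Signal names and default weights are hypothetical placeholders.
    """
    weights = weights or {"location": 0.5, "authorship": 0.3, "dataflow": 0.2}
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())

# A change in a historically fragile spot, by a low-risk author,
# touching data flow near sensitive code:
score = change_risk({"location": 0.8, "authorship": 0.2, "dataflow": 1.0})
print(round(score, 2))
```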




