Github used to publish some pretty interesting postmortems. Maybe they still do. IIRC that they were struggling with scaling their SQL db and were starting to hit the limits. It's a tough position to be in because you have to either to a massive migration to a data layer with much different semantics, or you have to keep desperately squeezing performance and skirting on the edge of outages with a DB that wasn't really meant to handle what you're doing with it now.
The OpenAI blog post on "scaling" Postgres to their current scale has much the same flavor, although I think they're doing it better than Github appears to be doing.
I’d be surprised by this: GitHub pretty famously used Vitess, and I’d be surprised if each shard were too big for modern hardware. Based on previous reporting [0], they’re running out of space in the main data center and new management is determined to move to Azure in a hurry. I’d bet that these outages are a combination of a worsening capacity crunch in the old data center and…well, Azure.