Hacker News

Stack Overflow runs on 9 web servers with (IIRC) 48 logical cores (2 x 12-core Xeons) and 64GB RAM. Those servers are shared by a few apps (Talent/Job, Ads, Chat, Stack Exchange/Overflow itself), but the main app uses, on average, ~5% CPU. Those machines handle roughly 5,000 requests/sec and were running .NET 5 as of September 2021 (when I moved on).

That's backed by two very large SQL clusters, each consisting of a primary read/write node, a secondary read-only node in the primary DC, and a secondary in the failover DC. Most traffic to a question page hits SQL directly - the cache hit ratio tends to be low, so caching those pages in Redis tends not to be useful.

As somebody mentioned below, being just a single network hop away yields really low latency (~0.016ms in this case) - that certainly helps with scaling on little hardware. Typically only 10 - 20 concurrent requests would be running on any instance at any one time, because a request end-to-end would take < 10ms to run.
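That concurrency figure follows from Little's law (average in-flight requests = arrival rate x average latency). A rough sketch with the fleet-wide numbers quoted above; the even per-server split and the exact latency values are illustrative assumptions, not Stack Overflow's measured figures:

```python
# Little's law: L = lambda * W
# (avg. requests in flight = arrival rate * avg. time each request spends in the system).
# The 5000 req/s and 9 servers are from the comment above; the even split
# across servers and the latency values below are assumptions for illustration.

total_rps = 5000
servers = 9
rps_per_server = total_rps / servers  # ~556 req/s per box, assuming even load

for latency_ms in (10, 20, 30):
    in_flight = rps_per_server * (latency_ms / 1000)
    print(f"at {latency_ms} ms avg latency: ~{in_flight:.0f} requests in flight per server")
```

At ~10ms a server only has a handful of requests in flight at once; the 10 - 20 figure quoted above is consistent with a mix of fast and slower requests.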

Back in full framework days we had to do a fair bit of optimisation to get great performance out of .NET, but as of .NET Core 3.1 the framework _just gets out of the way_ - most memory dumps and profiling since then clearly pinpoint problem areas in your own app rather than being muddied by framework shenanigans.

Source: I used to work on the Platform Engineering team at Stack Overflow :)



That's some great info, thank you!


> cache hit ratio tends to be low

That's surprising to read. Is that because of the sheer volume of question pages? I don't think I've ever been on an SO page that couldn't have been served straight from cache.


Is it? Most people come to SO from Googling their random tech problems/questions. Not sure how much value there is in caching my random Rails questions, etc


I would expect SO usage to follow a distribution like Zipf's - most visits hit a small subset of common Q&As, and there's a ridiculously long tail of random questions getting a few visits each, where caching would do next to nothing. I'm fairly positive I've seen a post showing this was true for at least answer-score distributions.

Though I guess it's possible for a power-law distribution of page popularity to still be useless for caching: you could get that distribution even if the vast majority of hits land on nearly-unique pages. With a long enough tail, only relatively few pages would be worth caching, but most visits would still fall in the tail.
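A toy model makes the point: with a Zipf-like popularity distribution and a heavy enough tail, even a cache holding the most popular pages captures only a modest share of traffic. The page count and exponent below are made-up illustrative values, not Stack Overflow's real numbers:

```python
# Toy model of cache effectiveness under a Zipf-like popularity law.
# N pages, page at rank r gets weight 1/r^s. An exponent s < 1 gives a
# very heavy tail. All numbers here are illustrative assumptions.

N = 1_000_000   # distinct question pages (assumed)
s = 0.8         # Zipf exponent (assumed; < 1 => heavy tail)

weights = [1 / (rank ** s) for rank in range(1, N + 1)]
total = sum(weights)

def hit_ratio(k):
    """Fraction of visits served by a cache holding the top-k pages."""
    return sum(weights[:k]) / total

for k in (1_000, 10_000, 100_000):
    print(f"cache top {k:>7} pages: hit ratio {hit_ratio(k):.1%}")
```

Under these assumptions, caching even 10% of all pages still misses a large fraction of visits, which is consistent with the low cache hit ratio described above.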



