It's common knowledge that 'postgres is all you need' - but if you somehow reach the stage of 'postgres isn't all I need and I have hard proof' this should be the next tech you look at.
Also, clickhouse-local is rather amazing at csv processing using sql. Highly recommended for when you are fed up with google sheets or even excel.
This is my take too. At one of my old jobs, we were early (very early) to the Hadoop and then Spark games. Maybe too early, because by the time Spark 2 made it all easy, we had already written a lot of mapreduce-streaming and then some RDD-based code. Towards the end of my tenure there, I was experimenting with alternate datastores, and clickhouse was one I evaluated. It worked really, really well in my demos. But I couldn't get buy-in because management was a little wary of the russian side of it (which they have now distanced/divorced from, I think?) and also they didn't really have the appetite for such a large undertaking anymore. (The org was going through some things.) (So instead a different team blessed by the company owner basically DIYd a system to store .feather files on NVME SSDs... anyway).
If I were still there, I'd be pushing a lot harder to finally throw away the legacy system (which has lost so many people it's basically ossified, anyway) and just "rebase" it all onto clickhouse and pyspark sparksql. We would throw away so much shitty cruft, and a lot of the newer mapreduce and RDD code is pretty portable to the point that it could be plugged into RDD's pipe() method.
Anyway. My current job, we just stood up a new product that, from day 1, was ingesting billions of rows (event data) (~nothing for clickhouse, to be clear. but obviously way too much for pg). And it's just chugging along. Clickhouse is definitely in my toolbox right after postgres, as you state.
Agree. CH is a great technology to have some awareness of. I use it for "real things" (100B+ data points) but honestly it can really simplify little things as well.
I'd throw in one more to round it out however. The three rings of power are Postgres, ClickHouse and NATS. Postgres is the most powerful ring however and lots of times all you need.
IMO the only reason to not use ClickHouse is when you either have "small" amount of data or "small" servers (<100 Gb of data, servers with <64 Gb of RAM). Otherwise ClickHouse is a better solution since it's a standalone DB that supports replication and in general has very very robust cluster support, easily scaling to hundreds of nodes.
Typically when you discover the need for OLAP DB is when you reach that scale, so I'm personally not sure what the real use case for DuckDB is to be completely honest.
There is another place where you should not use CH, and it's in a system with shared resources. CH loves, and earned the right, to have spikes of hogging resources. They even allude to this on the Keeper setup - if you put the nodes for the two systems in the same machine, CH will inevitably push Keeper off the bed and the two will come to a disagreement. You should not have it on a k8s Pod for that reason, for example. But then again, you shouldn't have ANY storage of that capacity in a k8s pod anyways.
DuckDB probably performs better per core than clickhouse does for most queries. So as long as your workload fits on a single machine (it's likely that it does) it's often the most performant option.
Besides, it's so simple, just a single executable.
Of course if you're at a scale where you need a cluster it's not an option anymore.
It's common knowledge that 'postgres is all you need' - but if you somehow reach the stage of 'postgres isn't all I need and I have hard proof' this should be the next tech you look at.
Also, clickhouse-local is rather amazing at csv processing using sql. Highly recommended for when you are fed up with google sheets or even excel.