Hacker News

ClickHouse is criminally underused.

It's common knowledge that 'Postgres is all you need' - but if you somehow reach the stage of 'Postgres isn't all I need, and I have hard proof', this should be the next tech you look at.

Also, clickhouse-local is rather amazing at CSV processing using SQL. Highly recommended for when you're fed up with Google Sheets or even Excel.
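For flavor, a minimal sketch of that workflow (the file name and columns here are hypothetical; CSVWithNames is ClickHouse's input format for CSVs with a header row):

```shell
# Aggregate a CSV straight from the command line, no server needed.
# sales.csv and its columns (customer, amount) are made up for this sketch;
# clickhouse-local infers the column types from the data.
clickhouse-local --query "
    SELECT customer, sum(amount) AS total
    FROM file('sales.csv', CSVWithNames)
    GROUP BY customer
    ORDER BY total DESC
    LIMIT 10
"
```

The same file() call reads Parquet or JSON too - you just change the format name.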



This is my take too. At one of my old jobs, we were early (very early) to the Hadoop and then Spark games. Maybe too early, because by the time Spark 2 made it all easy, we had already written a lot of mapreduce-streaming and then some RDD-based code. Towards the end of my tenure there, I was experimenting with alternate datastores, and ClickHouse was one I evaluated. It worked really, really well in my demos. But I couldn't get buy-in: management was a little wary of the Russian side of it (which they have now distanced/divorced from, I think?), and they no longer had the appetite for such a large undertaking (the org was going through some things). So instead a different team, blessed by the company owner, basically DIY'd a system to store .feather files on NVMe SSDs... anyway.

If I were still there, I'd be pushing a lot harder to finally throw away the legacy system (which has lost so many people it's basically ossified anyway) and just "rebase" it all onto ClickHouse and PySpark's Spark SQL. We would throw away so much shitty cruft, and a lot of the newer mapreduce and RDD code is portable enough that it could be plugged into RDD's pipe() method.

Anyway. At my current job, we just stood up a new product that, from day 1, was ingesting billions of rows of event data (~nothing for ClickHouse, to be clear, but obviously way too much for pg). And it's just chugging along. ClickHouse is definitely in my toolbox right after Postgres, as you state.


Agree. CH is a great technology to have some awareness of. I use it for "real things" (100B+ data points), but honestly it can really simplify little things as well.

I'd throw in one more to round it out, however. The three rings of power are Postgres, ClickHouse, and NATS. Postgres is the most powerful ring, though, and a lot of the time it's all you need.


Would you recommend ClickHouse over DuckDB? And why?


IMO the only reason not to use ClickHouse is when you have either a "small" amount of data or "small" servers (<100 GB of data, servers with <64 GB of RAM). Otherwise ClickHouse is the better choice, since it's a standalone DB that supports replication and in general has very robust cluster support, easily scaling to hundreds of nodes.

Typically, you discover the need for an OLAP DB when you reach that scale, so I'm personally not sure what the real use case for DuckDB is, to be completely honest.


There is another place where you should not use CH: a system with shared resources. CH loves to hog resources in spikes, and it has earned the right to. They even allude to this in the Keeper setup docs - if you put the nodes for the two systems on the same machine, CH will inevitably push Keeper off the bed and the two will come to a disagreement. You should not run it in a k8s pod for that reason, for example. But then again, you shouldn't have ANY storage of that capacity in a k8s pod anyway.


DuckDB probably performs better per core than ClickHouse does for most queries. So as long as your workload fits on a single machine (it likely does), it's often the most performant option.

Besides, it's so simple: just a single executable.
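To illustrate the single-executable point, a sketch with a hypothetical events.csv: the duckdb CLI can query a CSV by path directly, with no import or schema step.

```shell
# One binary, one command: DuckDB queries the file in place.
# events.csv is hypothetical; DuckDB infers the schema from the file itself.
duckdb -c "SELECT count(*) FROM 'events.csv'"
```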

Of course, if you're at a scale where you need a cluster, it's not an option anymore.


The good parts of DuckDB that you've mentioned, including the fact that it's a single executable, are modeled after ClickHouse.


Can you provide a reference for that belief? To me that's not true - they started out solving very different problems.


I didn't express myself well. What I meant to say was that DuckDB runs as a single process, which simplifies things.

ClickHouse typically runs several interacting processes (server, clients), and that already makes things more complicated (and more powerful!).

That's not to say one is good and the other bad, they're just quite different tools.


Note that every use case is different and YMMV.

https://www.vantage.sh/blog/clickhouse-local-vs-duckdb


Great link. Curious how it compares now that DuckDB is 1.0+.


Not to mention Polars, DataFusion, etc. The single-node OLAP space is really heating up.


ClickHouse scales from a local tool like DuckDB up to a database cluster that can back your reporting and other OLAP applications.


ClickHouse and Postgres are just different tools, though - OLTP vs. OLAP.


It’s fairly common in my experience for reports to initially be driven by a Postgres database until you hit data volumes Postgres cannot handle.



