I'm also very curious to hear real world feedback on this, I've been aware of the option for a while, but never heard much about actual use.
Automatic sharding and memcached integration are pretty awesome features, and could definitely ease code at the application level. Sharding code is a particularly special pain in the ass: not so much getting it working, but allowing for re-sharding migrations if you decide you need more shards, especially trying to do so without downtime, which involves all sorts of nasty tradeoffs.
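To make the re-sharding pain concrete, here's a toy sketch (hypothetical helper names, nothing NDB-specific): with naive modulo sharding, growing the shard count remaps almost every key, so a live migration has to move most of your data; a consistent-hash ring only remaps roughly 1/N of the keys.

```python
# Toy comparison: modulo sharding vs. a consistent-hash ring when
# growing from 4 to 5 shards. All names here are illustrative.
import hashlib

def h(key: str) -> int:
    """Stable 64-bit hash of a key."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def modulo_shard(key: str, n_shards: int) -> int:
    return h(key) % n_shards

def ring(shards):
    """Consistent-hash ring: each shard owns many points on the circle."""
    points = []
    for s in shards:
        for v in range(100):  # virtual nodes smooth the distribution
            points.append((h(f"{s}#{v}"), s))
    return sorted(points)

def ring_shard(key: str, points):
    k = h(key)
    for p, s in points:       # first point at or past the key's hash
        if k <= p:
            return s
    return points[0][1]       # wrap around the circle

keys = [f"user:{i}" for i in range(10_000)]

# Modulo sharding, 4 -> 5 shards: the vast majority of keys move.
moved_mod = sum(modulo_shard(k, 4) != modulo_shard(k, 5) for k in keys)

# Consistent hashing, same growth: only about a fifth of keys move.
old = ring(["s0", "s1", "s2", "s3"])
new = ring(["s0", "s1", "s2", "s3", "s4"])
moved_ring = sum(ring_shard(k, old) != ring_shard(k, new) for k in keys)

print(f"modulo: {moved_mod / len(keys):.0%} of keys moved")   # roughly 80%
print(f"ring:   {moved_ring / len(keys):.0%} of keys moved")  # roughly 20%
```

Either way you still have to copy the moved keys without downtime (dual writes, backfill, cutover), which is exactly the part a product that shards for you lets you skip.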
But in terms of bang for the buck and reliability, I'm still unsure; I've heard very little about this in the wild. Is it more suited to high write-to-read ratio situations, is it aiming at Vertica/Greenplum-style big-data uses, or something else?
It is more suited for realtime transaction processing type activity, not for big data type analytics.
NDB was originally created by and for telecoms. So very high write/read rates with very fast response times and very high availability required. Generally not extremely large datasets.
If you know what you are doing, NDB can work extremely well. It supports all of the high-end cluster goodies, including online software upgrade, online node addition, automatic handling of node failures, geographic async replication, etc.
However, it certainly has a pretty steep learning curve from the admin point of view, and it is easy to mess things up. It is a bit brittle as a result, but once it is set up properly and running, it can deliver on its promises.
We've been happily using it for the better part of a year now. Our data set fits easily in memory, and the limitations of the product aren't a problem at all. We started with Riak, but the cognitive overhead on the developer side was a bit much for the operations-side win; MySQL Cluster has been trouble-free so far.
MySQL is an expensive toy if you spring for Enterprise. Crashes all over the place, bugs stay open forever, no one really knows how to fix it, people on IRC are mean and think they know more than they do, etc.