Since our complexity requirements were low, we settled for a home-grown solution (basically memcached with write-through), and so far we haven't regretted it.
I, too, remain curious whether anyone is running this at scale and has bumped into the various corner cases (exceeding capacity, hardware dying, etc.).
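
For anyone unsure what "memcached with write-through" means here, a rough sketch of the idea follows. It is only an illustration, not the actual home-grown code: the class name `WriteThroughCache` and the `backing_store` argument are made up, and plain dicts stand in for both memcached and the persistent store.

```python
class WriteThroughCache:
    """Toy write-through cache: each write hits the backing store and the cache."""

    def __init__(self, backing_store):
        # Hypothetical persistent store; a plain dict in this sketch.
        self.backing_store = backing_store
        # Stands in for memcached.
        self.cache = {}

    def set(self, key, value):
        # Write to the durable store first, so the cache never holds
        # data that the store is missing.
        self.backing_store[key] = value
        self.cache[key] = value

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss (e.g. after eviction or a node restart):
        # fall back to the store and repopulate the cache.
        value = self.backing_store.get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

Because every write lands in the store before the cache, losing a cache node (the "hardware dying" case) costs only a warm-up period rather than data, at least in this simplified model.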