Prior Knowledge (http://www.priorknowledge.com) has a very similar API but a more interesting underlying model. They model the full joint distribution of the data, so any variable can be missing, not just the outcome. They can also return the joint probability distribution over the unknowns, which is extremely useful for quantifying uncertainty.
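To make the "any variable can be missing" point concrete, here is a toy sketch in Python. It is not Prior Knowledge's model or API (I don't know what they use). It just stores a made-up joint table over three binary variables and conditions on whichever ones happen to be observed. The variable names, the numbers, and the condition() helper are all hypothetical.

```python
import numpy as np

# Made-up joint distribution over three binary variables.
# joint[r, s, w] = P(rain=r, sprinkler=s, wet=w); the entries sum to 1.
joint = np.array([
    [[0.40, 0.00], [0.02, 0.18]],
    [[0.02, 0.18], [0.00, 0.20]],
])

def condition(joint, observed):
    """Return the joint distribution over the unobserved variables.

    observed maps an axis number to the value seen on that axis.
    Any subset of axes may be observed; the rest are treated as unknown.
    """
    # Fix the observed axes to their values and keep the others whole.
    index = tuple(observed.get(axis, slice(None)) for axis in range(joint.ndim))
    sliced = joint[index]
    # Renormalize so the remaining entries form a proper distribution.
    return sliced / sliced.sum()

# Observe the "outcome" (wet=1) and ask about both causes at once.
causes_given_wet = condition(joint, {2: 1})

# Observe an "input" instead (rain=1) and ask about the other two variables.
rest_given_rain = condition(joint, {0: 1})
```

Both calls return a full table over the unknowns rather than a single point prediction, which is where the uncertainty estimate comes from. A real system obviously can't store an explicit table once there are more than a handful of variables, hence my curiosity below about what they actually do.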
I'm very curious to see (a) what sort of generative model they're using under the hood, and (b) how they do inference efficiently enough to not dedicate a cluster to each customer.
The model itself appears very flexible:
http://blog.priorknowledge.com/blog/beyond-correlation/
I'm not affiliated with these guys, but they are clearly doing the most interesting work in this area.