
Check this out: http://blinkdb.org/

BlinkDB is a research collaboration between UC Berkeley and MIT on approximate (probabilistic) query answering. It lets users specify a trade-off between error/confidence bounds and response time.
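
Roughly, the idea is sampling: answer an aggregate query from a random sample of the data and report an error bound alongside it. A minimal sketch (not BlinkDB's actual API; the 95% normal-approximation bound and the data are made up for illustration):

    # Estimate AVG(x) from a random sample and report a ~95% confidence
    # bound. A smaller sample fraction means less I/O (faster) but a
    # wider error bound -- that's the error-vs-time trade-off.
    import math
    import random

    def approx_avg(data, fraction):
        sample = random.sample(data, max(2, int(len(data) * fraction)))
        n = len(sample)
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        ci = 1.96 * math.sqrt(var / n)  # ~95% normal-approximation bound
        return mean, ci

    data = [random.gauss(100, 15) for _ in range(1_000_000)]
    for f in (0.0001, 0.01):
        mean, ci = approx_avg(data, f)
        print(f"fraction={f}: {mean:.2f} +/- {ci:.2f}")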



"BlinkDB can execute a range of queries over a real-world query trace on up to 17 TB of data and 100 nodes in 2 seconds, with an error of 2–10%"

17 TB on 100 nodes? That's a lot of nodes to hold 17 TB; on average only ~170 GB of data each.

The speed is super impressive, but using 100 nodes makes this look more like a parallel processing achievement than "big data".


Even with parallel processing, assuming you can scan 1 GB of data per node per second (a fairly optimistic estimate for on-disk data):

1 GB/node/sec * 100 nodes * 2 sec = 200 GB.

17 TB is 85 times that number, so the system can't be scanning everything; it can only be touching roughly 1/85th of the data, which is exactly what sampling buys you.
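
The same back-of-the-envelope in Python (the per-node throughput is the assumption above):

    # How much data can 100 nodes scan in 2 s at ~1 GB/s per node,
    # and what fraction of 17 TB is that?
    per_node_gb_per_sec = 1  # assumed on-disk scan rate per node
    nodes = 100
    seconds = 2

    scannable_gb = per_node_gb_per_sec * nodes * seconds  # 200 GB
    total_gb = 17 * 1000                                  # 17 TB

    print(scannable_gb)                 # 200
    print(total_gb / scannable_gb)      # 85.0 -> ~1.2% sample fraction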



