
So the file format is a lot better than CSV files, but in principle it's basically just a bunch of files. Maybe a better analogy would have been a big folder of feather/hdf5/etc files.

(Incidentally, I'm a big fan of the folder/s3 bucket/etc full of CSV/binary files and use it whenever possible.)
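The "folder full of CSV files" pattern can be sketched with nothing but the standard library; a minimal version (the function name and the assumption that all files share one header are mine, not from the thread) might look like:

```python
import csv
import glob
import os

def read_csv_folder(path):
    """Yield one dict per row across every CSV file in a directory.

    Assumes all files share the same header row. This is the whole
    "database": the filesystem is the catalog, the files are the tables.
    """
    for name in sorted(glob.glob(os.path.join(path, "*.csv"))):
        with open(name, newline="") as f:
            for row in csv.DictReader(f):
                yield row
```

Swapping the directory for an S3 prefix (via boto3 or s3fs) keeps the same shape.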

I agree - it's absolutely better than that, but it's a lot closer to that model than to the Riak model of querying a distributed system that sends you the data.

I stand corrected on single box and pricing - it's been a while since I've used it.



kdb+/q/k are used for IoT applications [1], not just fintech. After all, it is all time series data.

The benchmarks given in a response above by srpeck [2] show spark/shark to be 230 times slower than a k4 query, and using 50GB of RAM vs. 0.2GB for k4. If RiakTS is relying on spark/shark as its in-memory database engine, it is already at a big disadvantage compared to k in terms of speed, and in all the RAM that is going to be required on those distributed servers.

I will have to look at the DDL/math functions available in RiakTS too, since that is how you get your work done regardless of speed of access.

[1] http://www.kdnuggets.com/2016/04/kxcon2016-kdb-conference-ma...

[2] http://kparc.com/q4/readme.txt


Very cool, I stand corrected. I hope one day I have another opportunity to play with KDB.

As for the speed advantage, you'll get a similar speed advantage with python/pandas and a big folder of CSV files. For all of Spark's claims about "speed", it's really just reducing the speed penalty of Hadoop from 500x to 50x. (Here 500x and 50x are relative to the performance of loading flat files from a local disk.)


Do you really mean flat CSV text files? I get the simplicity of that, but it seems really expensive (in speed and size). I'm used to tables with more than a dozen columns, and with kdb+ you can pull in only the columns of interest, and only the rows of interest (thanks to on-disk sorting and grouping), which is a smaller subset, often much smaller.
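The cost difference is easy to see in code: with a row-oriented text CSV you still parse every field of every row even if you only want two columns, whereas a columnar store like kdb+ reads just those columns' bytes off disk. A toy illustration (function and column names are hypothetical):

```python
import csv
import io

def select_columns(csv_text, wanted):
    """Keep only the named columns from a CSV.

    Note that DictReader still splits every field of every row before
    we throw the rest away; that parse cost is what a columnar,
    on-disk-sorted layout avoids entirely.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k: row[k] for k in wanted} for row in reader]
```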


By number, my data sets are usually in CSV. I could probably get some additional advantage via HDF5, but a gzipped CSV is usually good enough and simpler. By volume (i.e. my 2 or 3 biggest data sets), it will probably be mostly HDF5. I haven't tried feather yet, but it looks pretty nice.
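The gzipped-CSV workflow is a two-function affair in the standard library; a sketch (helper names are mine) of the round trip:

```python
import csv
import gzip

def write_csv_gz(path, header, rows):
    """Write rows to a gzip-compressed CSV; repetitive text compresses well."""
    with gzip.open(path, "wt", newline="") as f:
        w = csv.writer(f)
        w.writerow(header)
        w.writerows(rows)

def read_csv_gz(path):
    """Read a gzip-compressed CSV back as a list of dicts."""
    with gzip.open(path, "rt", newline="") as f:
        return list(csv.DictReader(f))
```

pandas can also read `.csv.gz` directly (`pd.read_csv(path, compression="gzip")`), so the files stay usable without any custom tooling.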

KDB would probably be better, but don't underestimate what you can do with just a bunch of files.


RiakTS does not rely on any external data storage (other than our fork of Google's leveldb) or processing tool, so Spark's performance is irrelevant.



