Hacker News

SQLite is great for datasets that fit comfortably into memory, but otherwise it starts to struggle.


Come again? It is a database. Its entire raison d'être is managing bigger-than-memory data. On the SQLite forum[0], Dr. Hipp mentioned that a customer had approached them about a ~40 TB SQLite database. Now, you could argue it is less well suited at that scale, particularly for analytical workloads, but SQLite has an enormous amount of headroom that you are unlikely to exhaust.

[0]: https://sqlite.org/forum/info/6e13e1911f1fe39c


That's not accurate: SQLite works great with data that's too big to fit in memory, provided you create indexes and write queries that can take advantage of them - just like any other relational database (MySQL, PostgreSQL, etc.).
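To illustrate the point, here's a minimal sketch using Python's stdlib sqlite3 (table and column names are made up for the example). Because the database is file-backed, pages load on demand, and EXPLAIN QUERY PLAN shows the index being used instead of a full scan:

```python
# Minimal sketch: with an index, SQLite answers this query without
# scanning the whole table. Table/column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for real data
conn.execute("CREATE TABLE IF NOT EXISTS logs (ts TEXT, status INTEGER, path TEXT)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_logs_status ON logs (status)")

# EXPLAIN QUERY PLAN reports how SQLite will execute the query
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT count(*) FROM logs WHERE status = 500"
).fetchall()
for row in plan:
    print(row[-1])  # e.g. SEARCH logs USING COVERING INDEX idx_logs_status (status=?)
```

The key is that the query only touches the B-tree pages it needs, regardless of total database size.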


I guess it depends on what you’re trying to do. Creating those indices is going to require a scan through the whole data set, and you’re going to need to run a fair number of queries to amortize that. And if I’m going to be querying this data over and over again, that raises the question of why it wasn’t in a relational database to begin with.

In my personal experience I’m usually digging through structured logs to answer one or two questions, after which point I won’t need the exact same data set to be indexed the exact same way again. That’s often more easily done by converting the data to TSV and using awk and other command line tools, which is typically quicker and more parallelizable than loading the whole works into SQLite and doing the work there.
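The same one-pass, no-index style of scan can be sketched in Python (standing in for the awk pipeline; the three-column log layout here is a hypothetical example):

```python
# One-pass streaming aggregation in the spirit of an awk pipeline:
# no index, no loading everything into memory, constant memory use.
# Sample data stands in for a hypothetical structured-log TSV;
# on a real file you'd iterate over the open file object instead.
import csv
import io
from collections import Counter

sample = io.StringIO(
    "2024-01-01T00:00:00\t500\t/api/a\n"
    "2024-01-01T00:00:01\t200\t/api/b\n"
    "2024-01-01T00:00:02\t503\t/api/a\n"
)

counts = Counter()
for ts, status, path in csv.reader(sample, delimiter="\t"):
    if int(status) >= 500:      # keep only server errors
        counts[path] += 1       # tally per request path

print(counts.most_common())  # → [('/api/a', 2)]
```

For a one-shot question this kind of scan is paid once, whereas an index costs a full scan up front and only pays off over repeated queries.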


happily, i have multiple gigabytes of memory …


Sure, but a 40 GB SQLite database on a machine with 16 GB of RAM is not gonna be happy


You're not going to do better with pandas or similar tools. If it can't fit in memory, it's going to be painful. SQLite is the least painful in my experience, and it sets you up for working with the data in a proper DB like postgres or similar for when you get fed up with the memory constraints.
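One way to make that handoff painless is to stream the file into SQLite in batches, so neither the input nor the table ever has to fit in memory. A rough stdlib-only sketch (file, table, and batch size are all hypothetical):

```python
# Sketch: stream rows into SQLite in batches so neither the source
# file nor the table needs to fit in memory. Names are hypothetical.
import csv
import io
import sqlite3

def load_rows(conn, rows, batch=10_000):
    conn.execute("CREATE TABLE IF NOT EXISTS logs (ts TEXT, status INTEGER, path TEXT)")
    buf = []
    for row in rows:
        buf.append(row)
        if len(buf) >= batch:           # flush a full batch
            conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", buf)
            buf.clear()
    if buf:                             # flush the remainder
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", buf)
    conn.commit()

# Stand-in for open("big.csv"); a real run would stream from disk.
data = io.StringIO("2024-01-01,500,/a\n2024-01-01,200,/b\n")
conn = sqlite3.connect(":memory:")      # use a file path for real data
load_rows(conn, csv.reader(data))
print(conn.execute("SELECT count(*) FROM logs").fetchone()[0])  # → 2
```

Batched executemany keeps transaction overhead low, and the resulting file can later be imported into Postgres or similar when you outgrow it.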


I wouldn’t use pandas in that situation either.


In the original article, they were trying to read a 10 GB file into 30 GB of RAM.



