ZFS isn’t viable for SQLite unless you turn off fsync’s in ZFS, because otherwise you will have the same experience I had for years; SQLite may randomly hang for up to a few minutes with no visible cause, if there isn’t sufficient write txg’s to fill up in the background. If your app depends on SQLite, it’ll randomly die.
Btrfs is a better choice for sqlite, haven’t seen that issue there.
The latest comment seems to be a nice summary of the root cause, with earlier in the thread pointing to ftruncate instead of fsync being a trigger:
>amotin
>I see. So ZFS tries to drop some data from pagecache, but there seems to be some dirty pages, which are held by ZFS till them either written into ZIL, or to disk at the end of TXG. And if those dirty page writes were asynchronous, it seems there is nothing that would nudge ZFS to actually do something about it earlier than zfs_txg_timeout. Somewhat similar problem was recently spotted on FreeBSD after #17445, which is why newer version of the code in #17533 does not keep references on asynchronously written pages.
You know what's even easier than doing that? Neglecting to do it or meaning to do it then getting pulled in to some meeting (or other important distraction) and then imagining you did it.
> Neglecting to do it or meaning to do it then getting pulled in to some meeting (or other important distraction) and then imagining you did it.
If your job is to make sure your file system and your database—SQLite, Pg, My/MariDB, etc—are tuned together, and you don't tune it, then you should be called into a meeting. Or at least the no-fault RCA should bring up remediation methods to make sure it's part of the SOP so that it won't happen again.
The alternative the GP suggests is using Btrfs, which I find even more irresponsible than your non-tuning situation. (Heck, if someone on my sysadmin team suggested we start using Btrfs for anything I would think they were going senile.)
In any interesting tech position, "your job" is a myriad of things. And if everything is a priority, nothing is a priority. They ask us to automate our work so that "someone else on the team could do it during an outage at 3 am" for a reason. So my point is that moving away from the exotic and towards the commodity is an IT imperative.
Facebook is apparently using it at scale, which surprised me. Though that’s not necessarily an endorsement, and who knows what their kernel patcheset looks like.
This isn't an inherent property of ZFS at all. I have made heavy use of SQLite for years (on illumos systems) without ever hitting this, and I would never counsel anybody to disable sync writes: it absolutely can lead to data loss under some conditions and is not safe to do unless you understand what it means.
What you're describing sounds like a bug specific to whichever OS you're using that has a port of ZFS.
Btrfs is a better choice for sqlite, haven’t seen that issue there.