ZFS isn’t viable for SQLite unless you turn off fsync’s in ZFS, because otherwis...

Modified3019 · 2025-07-24T22:11:26 1753395086

Interesting. Found a GitHub issue that covers this bug: https://github.com/openzfs/zfs/issues/14290

The latest comment seems to be a nice summary of the root cause. Earlier comments in the thread point to ftruncate, rather than fsync, as the trigger:

> amotin

> I see. So ZFS tries to drop some data from pagecache, but there seems to be some dirty pages, which are held by ZFS till them either written into ZIL, or to disk at the end of TXG. And if those dirty page writes were asynchronous, it seems there is nothing that would nudge ZFS to actually do something about it earlier than zfs_txg_timeout. Somewhat similar problem was recently spotted on FreeBSD after #17445, which is why newer version of the code in #17533 does not keep references on asynchronously written pages.

Might be worth testing zfs_txg_timeout=1 or 0
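
For anyone who wants to try that: on Linux, OpenZFS exposes this tunable as a kernel module parameter. The following is a minimal sketch, assuming Linux, a loaded zfs module, and root privileges. Whether 0 is accepted may depend on the OpenZFS version, so the sketch only tries 1.

```shell
# Assumes Linux with the OpenZFS kernel module loaded; writing requires root.
param=/sys/module/zfs/parameters/zfs_txg_timeout

if [ -w "$param" ]; then
    echo "before: $(cat "$param")"
    echo 1 > "$param"    # seconds before a transaction group is forced to sync
    echo "after:  $(cat "$param")"
else
    echo "cannot write $param (zfs module not loaded, or not running as root)"
fi
```

A change made this way does not survive a reboot or a module reload. The usual way to persist a module parameter is a line such as `options zfs zfs_txg_timeout=1` in a file under /etc/modprobe.d/.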

throw0101b · 2025-07-24T19:24:38 1753385078

> ZFS isn’t viable for SQLite unless you turn off fsync’s in ZFS

Which you can do on a per dataset ('directory') basis very easily:

    zfs set sync=disabled mydata/mydb001

* https://openzfs.github.io/openzfs-docs/man/master/7/zfsprops...

Meanwhile all the rest of your pools / datasets can keep the default POSIX behaviour.
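
To confirm the property really is set on that dataset (and is not just inherited or forgotten), something like the following should work. It is only a sketch: it reuses the example dataset name from above and assumes the `zfs` CLI is on the PATH.

```shell
dataset=mydata/mydb001    # example dataset name from the command above

if command -v zfs >/dev/null 2>&1; then
    # -H: no header line; -o value,source: print the setting and where it comes from
    zfs get -H -o value,source sync "$dataset"
else
    echo "zfs command not found"
fi
```

A source of `local` means the value was set on this dataset directly; `default` or `inherited from ...` means it was not.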

ezekiel68 · 2025-07-24T19:40:49 1753386049

You know what's even easier than doing that? Neglecting to do it or meaning to do it then getting pulled in to some meeting (or other important distraction) and then imagining you did it.

throw0101b · 2025-07-24T19:52:19 1753386739

> Neglecting to do it or meaning to do it then getting pulled in to some meeting (or other important distraction) and then imagining you did it.

If your job is to make sure your file system and your database (SQLite, Pg, My/MariaDB, etc.) are tuned together, and you don't tune them, then you should be called into a meeting. Or at least the no-fault RCA should bring up remediation methods to make sure it's part of the SOP so that it won't happen again.

The alternative the GP suggests is using Btrfs, which I find even more irresponsible than your non-tuning situation. (Heck, if someone on my sysadmin team suggested we start using Btrfs for anything I would think they were going senile.)

ezekiel68 · 2025-07-26T17:54:30 1753552470

In any interesting tech position, "your job" is a myriad of things. And if everything is a priority, nothing is a priority. They ask us to automate our work so that "someone else on the team could do it during an outage at 3 am" for a reason. So my point is that moving away from the exotic and towards the commodity is an IT imperative.

johncolanduoni · 2025-07-24T21:09:33 1753391373

Facebook is apparently using it at scale, which surprised me. Though that’s not necessarily an endorsement, and who knows what their kernel patchset looks like.

zaarn · 2025-07-25T06:29:42 1753424982

Disabling sync corrupts SQLite databases on power loss. I've personally experienced this after disabling sync, which I did because leaving it enabled causes SQLite to hang.

You cannot have SQLite keep your data and run well on ZFS unless you make a zvol and format it as btrfs or ext4 so they solve the problem for you.
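
A rough sketch of that zvol workaround, assuming Linux, root privileges, and hypothetical names throughout (pool `mydata`, zvol `sqlitevol`, a 10G size, mount point `/mnt/sqlite`). The device path differs on other platforms, and the device node may take a moment to appear after creation.

```shell
# Hypothetical names for illustration only; adjust pool, size, and mount point.
zvol=mydata/sqlitevol
mnt=/mnt/sqlite

if command -v zfs >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    # Create a 10G block device backed by the pool, then put ext4 on it.
    zfs create -V 10G "$zvol" &&
        mkfs.ext4 "/dev/zvol/$zvol" &&
        mkdir -p "$mnt" &&
        mount "/dev/zvol/$zvol" "$mnt"
else
    echo "skipping: needs the zfs CLI and root"
fi
```

The SQLite files then live under the mount point on ext4, while ZFS only sees block writes to the zvol.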

kentonv · 2025-07-24T20:43:26 1753389806

Doesn't turning off sync mean you can lose confirmed writes in a power failure?

jclulow · 2025-07-25T07:43:21 1753429401

This isn't an inherent property of ZFS at all. I have made heavy use of SQLite for years (on illumos systems) without ever hitting this, and I would never counsel anybody to disable sync writes: it absolutely can lead to data loss under some conditions and is not safe to do unless you understand what it means.

What you're describing sounds like a bug specific to whichever OS you're using that has a port of ZFS.

zaarn · 2025-07-25T09:27:35 1753435655

I wouldn't recommend SQLite on ZFS (or in general for other reasons), for the precise reason that it either lags or is unsafe.

I've encountered this bug both on illumos, specifically OpenIndiana, and Linux (Arch Linux).