Hacker News

How often do you perform scrubs? The best practice for consumer drives is weekly, but at 200+ hours scrub time I can't see weekly being practical :p

Also, why haven't you upgraded your ZFS file system?



As you can see, I do not perform scrubs as often as I should. Until I switched to SAS a year ago, I wasn't able to complete a scrub at all. The scrub you see is one of the few I've been able to complete. I need a week or two where I'm not using the filesystem that much, because the scrub really kills performance of the filesystem with the version of ZFS on Linux I'm running. I'm intending to do an upgrade to the latest version of ZoL, and then run a scrub, sometime in the next 3 months.
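For anyone following along, the basic scrub workflow is just a few commands (using the pool name that appears later in the thread; substitute your own):

```shell
# Start a scrub; it runs in the background
zpool scrub primus

# Check progress and the estimated time remaining
zpool status primus

# Stop a running scrub if it's hurting performance too much
zpool scrub -s primus
```

A stopped scrub doesn't resume where it left off on older ZFS versions, which is part of why a 200+ hour scrub window is so painful.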

I haven't upgraded because, well, I haven't really seen a need to? The original reason is that this pool began under zfs-fuse, and when I switched to ZoL I kept the version at the last version supported by zfs-fuse so that I could switch back if needed. I doubt I'll ever switch back, but I do like the idea of maintaining compatibility with other ZFS implementations in case of any problems. I suppose when the OpenZFS unification stuff actually finishes, I'll be happy to upgrade to the latest version?


I think this is where the benefit of using raw disks comes into play; if you develop a problem with ZoL then you can always switch to OpenIndiana or FreeBSD (I run my ZFS array from FreeBSD).

Another question, have you checked your block size is configured correctly? I hadn't even realised that mine were wrong until I'd upgraded to the newer versions of ZFS, which throw the following helpful message:

      pool: primus
     state: ONLINE
    status: One or more devices are configured to use a non-native block size.
            Expect reduced performance.
    action: Replace affected devices with devices that support the
            configured block size, or migrate data to a properly configured
            pool.
      scan: scrub repaired 0 in 16h44m with 0 errors on Tue Aug 26 17:54:48 2014
    config:
    
        NAME        STATE     READ WRITE CKSUM
        primus      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0  block size: 512B configured, 4096B native
            ada2    ONLINE       0     0     0  block size: 512B configured, 4096B native
            ada3    ONLINE       0     0     0  block size: 512B configured, 4096B native
          raidz1-1  ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
        cache
          ada1      ONLINE       0     0     0
    
    errors: No known data errors
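For what it's worth, you can check what the drives themselves report before ZFS warns you. A rough sketch (device names are just examples):

```shell
# FreeBSD: sectorsize is the logical size, stripesize the physical size
diskinfo -v ada0 | egrep 'sectorsize|stripesize'

# Linux: compare the logical vs physical block size
cat /sys/block/sda/queue/logical_block_size
cat /sys/block/sda/queue/physical_block_size
```

The catch is that many 4K ("Advanced Format") drives report a 512B logical sector size for compatibility, which is exactly how pools end up created with the wrong ashift in the first place.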
Just in case you were curious, this is what my pool looks like:

    NAME                      USED  AVAIL  REFER  MOUNTPOINT
    primus                   7.00T   115G   287G  /primus
    primus/audio              774G   115G   774G  /primus/audio
    primus/devel             17.2M   115G  10.9M  /primus/devel
    primus/documents         6.13G   115G  6.08G  /primus/documents
    primus/downloads          275G   115G   275G  /primus/downloads
    primus/git                325M   115G   316M  /git
    primus/jails             10.7G   115G  49.3K  /jails
    primus/jails/alphatrion   212M   115G   752M  /jails/alphatrion
    primus/jails/cybertron   91.6M   115G   729M  /jails/cybertron
    primus/jails/elitaone     370M   115G   940M  /jails/elitaone
    primus/jails/galvatron    750M   115G  1.25G  /jails/galvatron
    primus/jails/megatron     937M   115G  1.31G  /jails/megatron
    primus/jails/template    2.60G   115G   691M  /jails/template
    primus/jails/unicron     5.81G   115G  5.33G  /jails/unicron
    primus/pictures          11.3G   115G  11.3G  /primus/pictures
    primus/videos            5.66T   115G  5.66T  /primus/videos
    zroot                    12.7G  94.6G  6.56G  /
(In case you weren't aware: jails are FreeBSD containers, the FreeBSD equivalent of LXC / OpenVZ.)


Yeah, my oldest vdevs were configured with 512B block size, because that was the default and the ZFS community wasn't being particularly loud about ashift=12 being a good idea until later. As far as I know, there is no easy way to solve that problem? Is it possible to replace one vdev with another? Off the top of my head, my understanding is that you can replace individual disks, but replacing entire vdevs isn't possible?

If it is possible, yeah, I'll definitely replace the older vdevs entirely with new ones that have better ashift. And, while I'm at it, I'll probably switch to all 6-disk raidz2s, since that's another thing that I only learned too late, that raidz2 works best with an even number of disks in the vdev...
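If you do add new vdevs, newer ZFS releases let you force the sector size rather than trusting what the drives report. A sketch, assuming a recent FreeBSD or ZoL version (the disk names are hypothetical):

```shell
# FreeBSD 10.1+: make ZFS assume at least 4K sectors for newly created vdevs
sysctl vfs.zfs.min_auto_ashift=12

# ZoL / newer OpenZFS: set ashift explicitly when adding a vdev
zpool add -o ashift=12 primus raidz2 da10 da11 da12 da13 da14 da15
```

On older FreeBSD releases without that sysctl, the usual workaround was the gnop(8) trick: create a 4K-sector nop provider on top of one disk so the pool is created with ashift=12.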


AFAIK the block size is a pool-wide setting, so you can't even mix and match block sizes, let alone incrementally upgrade. What's more, you can't change the pool-wide setting after creation - so the only solution is to create a new pool and rsync your data over (which is just horrible!)

However, I've only done minimal investigation into this issue, so if you do find a safe way to upgrade the block size I would love to know (I'm in a similar situation to yourself in that regard).
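If you do go the new-pool route, zfs send/receive is usually nicer than rsync, since it preserves the dataset hierarchy, snapshots, and properties in one go. Roughly (the name "newpool" is hypothetical):

```shell
# Snapshot every dataset in the pool recursively
zfs snapshot -r primus@migrate

# Replicate the whole hierarchy to the correctly-configured pool
# -R sends descendants, snapshots and properties; -u avoids mounting on receive
zfs send -R primus@migrate | zfs recv -Fu newpool
```

You can repeat with an incremental send (zfs send -R -i) to catch changes made during the initial copy, which keeps the downtime window small.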


It's not pool-wide, it's vdev-specific. You can add new vdevs with more optimal ashift values. With my pool, half of the vdevs have sub-optimal ashift, the other half are good.
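For anyone wanting to check their own pool: each vdev records its own ashift, which you can inspect with zdb (assuming the pool is present in the cachefile):

```shell
# Dump the cached pool configuration; each top-level vdev
# reports its own ashift (9 = 512B sectors, 12 = 4K sectors)
zdb -C primus | grep ashift
```
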


That's handy to know. Thanks



