Description
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Rocky Linux |
| Distribution Version | 9.6 |
| Kernel Version | 5.14.0-570.42.2.el9_6.x86_64 |
| Architecture | x86_64 |
| OpenZFS Version | 2.2.8 |
Describe the problem you're observing
Yesterday we noticed that the monthly scrub was running, and zpool status briefly showed:
errors: Permanent errors have been detected in the following files:
But no files listed. Shortly afterward the message disappeared from the status output.
This system is a Pacemaker-managed HA system. This morning I needed to fail over to the other node while I installed some OS updates, forgetting that the scrub was still running. Because the scrub was running, the failover timed out, which resulted in a multipath SCSI fence event. Ordinarily this doesn't cause a permanent problem, but this time the failover host (now running RL 9.7 and ZFS 2.2.9) failed to mount the file systems with the error:
cannot mount '': Invalid exchange
zpool status showed the scrub continuing with the same 'empty' permanent errors message, and four drives in each of two RAID-Z3 vdevs showing checksum errors. Each time the mount command is run, the checksum error count on each of these drives increments by two.
# zpool status -v
pool: <poolname>
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub in progress since Wed Dec 3 09:41:59 2025
40.9T / 61.4T scanned at 8.17G/s, 32.3T / 61.4T issued at 6.45G/s
0B repaired, 52.59% done, 01:16:56 to go
config:
NAME STATE READ WRITE CKSUM
xnat ONLINE 0 0 0
raidz3-0 ONLINE 0 0 0
35000cca40f3bf248 ONLINE 0 0 34
35000cca40f345f40 ONLINE 0 0 34
35000cca40f355ef8 ONLINE 0 0 34
35000cca40f2ff80c ONLINE 0 0 34
35000cca40f346aa8 ONLINE 0 0 0
35000cca40f3c4c88 ONLINE 0 0 0
35000cca40f3a9a28 ONLINE 0 0 0
35000cca40f380e00 ONLINE 0 0 0
35000cca40f3c5730 ONLINE 0 0 0
35000cca40f3ac508 ONLINE 0 0 0
35000cca40f3a4750 ONLINE 0 0 0
raidz3-1 ONLINE 0 0 0
35000cca40f3c5a88 ONLINE 0 0 0
35000cca40f3a55c0 ONLINE 0 0 0
35000cca40f3964b4 ONLINE 0 0 0
35000cca40f3a547c ONLINE 0 0 0
35000cca40f3a9744 ONLINE 0 0 0
35000cca40f3c5558 ONLINE 0 0 0
35000cca40f3a52e4 ONLINE 0 0 0
35000cca40f37fc78 ONLINE 0 0 0
35000cca40f3c4e98 ONLINE 0 0 0
35000cca40f3c4cbc ONLINE 0 0 0
35000cca40f3a4f40 ONLINE 0 0 0
raidz3-2 ONLINE 0 0 0
35000cca40f3c5bf8 ONLINE 0 0 34
35000cca40f377e04 ONLINE 0 0 0
35000cca40f3aa0d0 ONLINE 0 0 0
35000cca40f3c57b8 ONLINE 0 0 0
35000cca40f2e323c ONLINE 0 0 0
35000cca40f3936e0 ONLINE 0 0 0
35000cca40f3a44f0 ONLINE 0 0 0
35000cca40f3a465c ONLINE 0 0 0
35000cca40f3b2c98 ONLINE 0 0 34
35000cca40f38ccc4 ONLINE 0 0 34
35000cca40f3b6954 ONLINE 0 0 34
raidz3-3 ONLINE 0 0 0
35000cca40f392c24 ONLINE 0 0 0
35000cca40f3a4574 ONLINE 0 0 0
35000cca40f377b3c ONLINE 0 0 0
35000cca40f377b08 ONLINE 0 0 0
35000cca40f3a979c ONLINE 0 0 0
35000cca40f2ff97c ONLINE 0 0 0
35000cca40f38cd4c ONLINE 0 0 0
35000cca40f3b2d94 ONLINE 0 0 0
35000cca40f38cdfc ONLINE 0 0 0
35000cca40f3c4df8 ONLINE 0 0 0
35000cca40f3c5aec ONLINE 0 0 0
raidz3-4 ONLINE 0 0 0
35000cca40f3a3de4 ONLINE 0 0 0
35000cca40f3a4340 ONLINE 0 0 0
35000cca40f3c55f4 ONLINE 0 0 0
35000cca40f3086ac ONLINE 0 0 0
35000cca40f39d64c ONLINE 0 0 0
35000cca40f377b68 ONLINE 0 0 0
35000cca40f3c5624 ONLINE 0 0 0
35000cca40f38cfd8 ONLINE 0 0 0
35000cca40f3a9cc8 ONLINE 0 0 0
35000cca40f38de68 ONLINE 0 0 0
35000cca40f3adcb0 ONLINE 0 0 0
logs
mirror-5 ONLINE 0 0 0
358ce38ee22ebfe59 ONLINE 0 0 0
358ce38ee22ec0131 ONLINE 0 0 0
spares
35000cca40f3aa560 AVAIL
35000cca40f3b777c AVAIL
35000cca40f3477c0 AVAIL
errors: Permanent errors have been detected in the following files:
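To make the pattern in the output above easier to see, here is a minimal Python sketch that pulls out only the devices with a nonzero CKSUM count, grouped by their parent vdev. The function name `cksum_errors_by_vdev` and the shortened `SAMPLE` text are illustrative only, and the parser assumes plain integer counters (zpool status can abbreviate large counts, which this sketch does not handle).

```python
import re
from collections import defaultdict

# A config row looks like: "<name> <STATE> <READ> <WRITE> <CKSUM>".
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")
# Group vdevs as named in the output above, e.g. raidz3-0 or mirror-5.
GROUP = re.compile(r"^(raidz\d*|mirror)-\d+$")


def cksum_errors_by_vdev(status_text):
    """Return {group_vdev: [(device, cksum), ...]} for devices with CKSUM > 0."""
    result = defaultdict(list)
    current = None  # most recently seen group vdev
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        name, cksum = match.group(1), int(match.group(5))
        if GROUP.match(name):
            current = name
            continue
        # Rows seen before any group vdev (the pool row itself) are skipped.
        if current is not None and cksum > 0:
            result[current].append((name, cksum))
    return dict(result)


# Shortened excerpt of the status output above, for illustration.
SAMPLE = """\
xnat ONLINE 0 0 0
raidz3-0 ONLINE 0 0 0
35000cca40f3bf248 ONLINE 0 0 34
35000cca40f346aa8 ONLINE 0 0 0
raidz3-2 ONLINE 0 0 0
35000cca40f3c5bf8 ONLINE 0 0 34
mirror-5 ONLINE 0 0 0
358ce38ee22ebfe59 ONLINE 0 0 0
"""

print(cksum_errors_by_vdev(SAMPLE))
```

Run against the full output, this would list four devices under raidz3-0 and four under raidz3-2, each with a count of 34.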
On further investigation, the checksum errors were first reported by zed on the 1st of December when the monthly scrub started. This predates any failover event, so the failover isn't significant here.
After the scrub completed, zfs mount -a still reported the same error, and each attempt added 2 to the checksum error count of each affected drive. A reboot did not help.
We have discovered that it is the root dataset of the pool (which itself isn't encrypted) that won't mount. Child datasets do mount, e.g.
pool/shares/folder1
where "shares" is the encryption root and "folder1" is a child dataset of it. Both "shares" and "shares/folder1" mount fine; only "pool" does not.
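One way to confirm which datasets are affected is to look at the output of `zfs list -H -o name,mounted,encryptionroot`, which is tab-separated with no header. The sketch below parses such output and reports the filesystems that are not mounted. The helper name `unmounted_datasets` and the `SAMPLE` rows are hypothetical, constructed from the description above rather than copied from the real system.

```python
def unmounted_datasets(zfs_list_output):
    """Return [(dataset, encryption_root_or_None)] for rows where mounted == "no".

    Expects the tab-separated output of:
        zfs list -H -o name,mounted,encryptionroot
    """
    rows = []
    for line in zfs_list_output.splitlines():
        if not line.strip():
            continue
        name, mounted, encroot = line.split("\t")
        # Volumes report "-" for mounted, so only an explicit "no" counts.
        if mounted == "no":
            rows.append((name, None if encroot == "-" else encroot))
    return rows


# Hypothetical output matching the situation described above.
SAMPLE = (
    "pool\tno\t-\n"
    "pool/shares\tyes\tpool/shares\n"
    "pool/shares/folder1\tyes\tpool/shares\n"
)

print(unmounted_datasets(SAMPLE))
```

An encryption root of `None` for the unmounted dataset would match the observation that the failing dataset is the unencrypted pool root.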
Describe how to reproduce the problem
Unknown. Possibly: create natively encrypted datasets under a pool, then wait?
Include any warning/errors/backtraces from the system logs
dmesg doesn't show any SCSI timeouts or errors
After failed mount /var/log/messages shows:
Dec 3 11:31:32 nfs1 zed[245378]: eid=79 class=data pool='mypool' priority=0 err=52 flags=0x808081 bookmark=54:0:0:0
Dec 3 11:31:32 nfs1 zed[245383]: eid=80 class=checksum pool='mypool' vdev=35000cca40f3c5bf8 size=4096 offset=229585539072 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec 3 11:31:32 nfs1 zed[245385]: eid=81 class=checksum pool='mypool' vdev=35000cca40f3b6954 size=4096 offset=229585534976 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec 3 11:31:32 nfs1 zed[245393]: eid=83 class=checksum pool='mypool' vdev=35000cca40f3b2c98 size=4096 offset=229585534976 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec 3 11:31:32 nfs1 zed[245392]: eid=82 class=checksum pool='mypool' vdev=35000cca40f38ccc4 size=4096 offset=229585534976 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec 3 11:31:32 nfs1 zed[245395]: eid=84 class=checksum pool='mypool' vdev=35000cca40f2ff80c size=4096 offset=174922317824 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec 3 11:31:32 nfs1 zed[245398]: eid=85 class=checksum pool='mypool' vdev=35000cca40f355ef8 size=4096 offset=174922317824 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec 3 11:31:32 nfs1 zed[245399]: eid=86 class=checksum pool='mypool' vdev=35000cca40f345f40 size=4096 offset=174922317824 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec 3 11:31:32 nfs1 zed[245405]: eid=87 class=checksum pool='mypool' vdev=35000cca40f3bf248 size=4096 offset=174922317824 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
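The zed events above can be summarized with a short Python sketch that parses the `key=value` fields and groups the checksum events by bookmark and offset. All helper names here are illustrative. Two points are worth noting as background: on Linux, errno 52 is EBADE ("Invalid exchange"), which OpenZFS on Linux uses as its checksum error code, so `err=52` lines up with the mount error; and the bookmark is assumed here to follow the objset:object:level:blkid layout, which the sketch leaves as an opaque string.

```python
import re
from collections import defaultdict

# key=value pairs; values may be single-quoted (pool='mypool').
FIELD = re.compile(r"(\w+)=('[^']*'|\S+)")

# Linux errno numbering; OpenZFS on Linux maps its ECKSUM to EBADE.
LINUX_ERRNO_NAMES = {52: "EBADE"}


def parse_zed_line(line):
    """Return the key=value fields of one zed syslog line as a dict of strings."""
    return {key: value.strip("'") for key, value in FIELD.findall(line)}


def summarize_checksum_events(lines):
    """Return {bookmark: {offset: [vdev, ...]}} for class=checksum events."""
    summary = defaultdict(lambda: defaultdict(set))
    for line in lines:
        event = parse_zed_line(line)
        if event.get("class") != "checksum":
            continue
        summary[event["bookmark"]][int(event["offset"])].add(event["vdev"])
    return {
        bookmark: {offset: sorted(vdevs) for offset, vdevs in offsets.items()}
        for bookmark, offsets in summary.items()
    }


# Four of the log lines above, copied verbatim.
SAMPLE = [
    "Dec 3 11:31:32 nfs1 zed[245378]: eid=79 class=data pool='mypool' priority=0 err=52 flags=0x808081 bookmark=54:0:0:0",
    "Dec 3 11:31:32 nfs1 zed[245383]: eid=80 class=checksum pool='mypool' vdev=35000cca40f3c5bf8 size=4096 offset=229585539072 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0",
    "Dec 3 11:31:32 nfs1 zed[245385]: eid=81 class=checksum pool='mypool' vdev=35000cca40f3b6954 size=4096 offset=229585534976 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0",
    "Dec 3 11:31:32 nfs1 zed[245395]: eid=84 class=checksum pool='mypool' vdev=35000cca40f2ff80c size=4096 offset=174922317824 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0",
]

print(summarize_checksum_events(SAMPLE))
```

Applied to the full log, every checksum event falls under the single bookmark 54:0:0:0, which suggests that each mount attempt re-reads the same block rather than hitting new damage.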