
Native encryption pool - mount fails with cannot mount '<poolname>': Invalid exchange #18011

@scratchings

Description

System information

Type                  Version/Name
Distribution Name     Rocky Linux
Distribution Version  9.6
Kernel Version        5.14.0-570.42.2.el9_6.x86_64
Architecture          x86_64
OpenZFS Version       2.2.8

Describe the problem you're observing

Yesterday we noticed that the monthly scrub was running and briefly saw:

errors: Permanent errors have been detected in the following files:

But no files were listed, and shortly afterwards the message disappeared from the status output.

This system is a Pacemaker-managed HA system, and this morning I needed to fail over to the other node while I installed some OS updates (forgetting that the scrub was running). Because the scrub was running, the failover timed out, resulting in a multi-path SCSI fence event. Ordinarily this doesn't cause a permanent problem, but this time the failover host (now running RL 9.7 and ZFS 2.2.9) failed to mount the file systems with the error:

cannot mount '': Invalid exchange

zpool status showed the scrub continuing, with the same 'empty' permanent-errors message and checksum errors on four drives in each of two RAIDZ3 vdevs. Each time the mount command is run, the checksum count on those drives increments by two.

# zpool status -v
  pool: <poolname>
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Wed Dec  3 09:41:59 2025
	40.9T / 61.4T scanned at 8.17G/s, 32.3T / 61.4T issued at 6.45G/s
	0B repaired, 52.59% done, 01:16:56 to go
config:

	NAME                   STATE     READ WRITE CKSUM
	xnat                   ONLINE       0     0     0
	  raidz3-0             ONLINE       0     0     0
	    35000cca40f3bf248  ONLINE       0     0    34
	    35000cca40f345f40  ONLINE       0     0    34
	    35000cca40f355ef8  ONLINE       0     0    34
	    35000cca40f2ff80c  ONLINE       0     0    34
	    35000cca40f346aa8  ONLINE       0     0     0
	    35000cca40f3c4c88  ONLINE       0     0     0
	    35000cca40f3a9a28  ONLINE       0     0     0
	    35000cca40f380e00  ONLINE       0     0     0
	    35000cca40f3c5730  ONLINE       0     0     0
	    35000cca40f3ac508  ONLINE       0     0     0
	    35000cca40f3a4750  ONLINE       0     0     0
	  raidz3-1             ONLINE       0     0     0
	    35000cca40f3c5a88  ONLINE       0     0     0
	    35000cca40f3a55c0  ONLINE       0     0     0
	    35000cca40f3964b4  ONLINE       0     0     0
	    35000cca40f3a547c  ONLINE       0     0     0
	    35000cca40f3a9744  ONLINE       0     0     0
	    35000cca40f3c5558  ONLINE       0     0     0
	    35000cca40f3a52e4  ONLINE       0     0     0
	    35000cca40f37fc78  ONLINE       0     0     0
	    35000cca40f3c4e98  ONLINE       0     0     0
	    35000cca40f3c4cbc  ONLINE       0     0     0
	    35000cca40f3a4f40  ONLINE       0     0     0
	  raidz3-2             ONLINE       0     0     0
	    35000cca40f3c5bf8  ONLINE       0     0    34
	    35000cca40f377e04  ONLINE       0     0     0
	    35000cca40f3aa0d0  ONLINE       0     0     0
	    35000cca40f3c57b8  ONLINE       0     0     0
	    35000cca40f2e323c  ONLINE       0     0     0
	    35000cca40f3936e0  ONLINE       0     0     0
	    35000cca40f3a44f0  ONLINE       0     0     0
	    35000cca40f3a465c  ONLINE       0     0     0
	    35000cca40f3b2c98  ONLINE       0     0    34
	    35000cca40f38ccc4  ONLINE       0     0    34
	    35000cca40f3b6954  ONLINE       0     0    34
	  raidz3-3             ONLINE       0     0     0
	    35000cca40f392c24  ONLINE       0     0     0
	    35000cca40f3a4574  ONLINE       0     0     0
	    35000cca40f377b3c  ONLINE       0     0     0
	    35000cca40f377b08  ONLINE       0     0     0
	    35000cca40f3a979c  ONLINE       0     0     0
	    35000cca40f2ff97c  ONLINE       0     0     0
	    35000cca40f38cd4c  ONLINE       0     0     0
	    35000cca40f3b2d94  ONLINE       0     0     0
	    35000cca40f38cdfc  ONLINE       0     0     0
	    35000cca40f3c4df8  ONLINE       0     0     0
	    35000cca40f3c5aec  ONLINE       0     0     0
	  raidz3-4             ONLINE       0     0     0
	    35000cca40f3a3de4  ONLINE       0     0     0
	    35000cca40f3a4340  ONLINE       0     0     0
	    35000cca40f3c55f4  ONLINE       0     0     0
	    35000cca40f3086ac  ONLINE       0     0     0
	    35000cca40f39d64c  ONLINE       0     0     0
	    35000cca40f377b68  ONLINE       0     0     0
	    35000cca40f3c5624  ONLINE       0     0     0
	    35000cca40f38cfd8  ONLINE       0     0     0
	    35000cca40f3a9cc8  ONLINE       0     0     0
	    35000cca40f38de68  ONLINE       0     0     0
	    35000cca40f3adcb0  ONLINE       0     0     0
	logs	
	  mirror-5             ONLINE       0     0     0
	    358ce38ee22ebfe59  ONLINE       0     0     0
	    358ce38ee22ec0131  ONLINE       0     0     0
	spares
	  35000cca40f3aa560    AVAIL   
	  35000cca40f3b777c    AVAIL   
	  35000cca40f3477c0    AVAIL   

errors: Permanent errors have been detected in the following files:

On further investigation, the checksum errors were first reported by zed on the 1st of December when the monthly scrub started. This predates any failover event, so the failover isn't significant here.
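
For reference, a search along these lines (zfs-zed being the usual service name on EL; the date is just chosen to cover the start of the scrub, and class=checksum is the event class zed logs, as in the excerpt further below) turns up those first events:

# grep 'class=checksum' /var/log/messages* | sort | head
# journalctl -u zfs-zed --since '2025-11-30' | grep 'class=checksum' | head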

After the scrub completed, zfs mount -a still reported the same error, again incrementing each checksum counter by two. A reboot did not help.
We have since discovered that it is the root dataset of the pool that won't mount (and it isn't itself encrypted); child datasets do mount, e.g.

pool/shares/folder1

where "shares" is the encryption root and "folder1" is a child dataset of that, both "shares" and "shares/folder1" mount fine, just not "pool".

Describe how to reproduce the problem

Create natively encrypted datasets under a pool, wait?
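
For context, the layout being described is roughly the following (a sketch with placeholder disk and dataset names; the real pool is the five 11-wide raidz3 vdevs shown above, and I haven't confirmed this as a reliable reproducer):

# zpool create pool raidz3 disk1 disk2 ... disk11
# zfs create -o encryption=on -o keyformat=passphrase pool/shares    # encryption root; pool root stays unencrypted
# zfs create pool/shares/folder1                                     # inherits encryption from pool/shares
# zpool scrub pool                                                   # monthly scrub
# zfs mount -a                                                       # eventually: cannot mount 'pool': Invalid exchange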

Include any warning/errors/backtraces from the system logs

dmesg doesn't show any SCSI timeouts or errors

After a failed mount, /var/log/messages shows:

Dec  3 11:31:32 nfs1 zed[245378]: eid=79 class=data pool='mypool' priority=0 err=52 flags=0x808081 bookmark=54:0:0:0
Dec  3 11:31:32 nfs1 zed[245383]: eid=80 class=checksum pool='mypool' vdev=35000cca40f3c5bf8 size=4096 offset=229585539072 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec  3 11:31:32 nfs1 zed[245385]: eid=81 class=checksum pool='mypool' vdev=35000cca40f3b6954 size=4096 offset=229585534976 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec  3 11:31:32 nfs1 zed[245393]: eid=83 class=checksum pool='mypool' vdev=35000cca40f3b2c98 size=4096 offset=229585534976 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec  3 11:31:32 nfs1 zed[245392]: eid=82 class=checksum pool='mypool' vdev=35000cca40f38ccc4 size=4096 offset=229585534976 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec  3 11:31:32 nfs1 zed[245395]: eid=84 class=checksum pool='mypool' vdev=35000cca40f2ff80c size=4096 offset=174922317824 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec  3 11:31:32 nfs1 zed[245398]: eid=85 class=checksum pool='mypool' vdev=35000cca40f355ef8 size=4096 offset=174922317824 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec  3 11:31:32 nfs1 zed[245399]: eid=86 class=checksum pool='mypool' vdev=35000cca40f345f40 size=4096 offset=174922317824 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
Dec  3 11:31:32 nfs1 zed[245405]: eid=87 class=checksum pool='mypool' vdev=35000cca40f3bf248 size=4096 offset=174922317824 priority=0 err=52 flags=0x100080 bookmark=54:0:0:0
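
For what it's worth, my reading of these events (interpretation on my part, not something the logs state): err=52 is EBADE, the same "Invalid exchange" errno the mount returns, and the zed bookmark fields are objset:object:level:blkid, so 54:0:0:0 refers to object 0 of objset 54, i.e. dataset metadata rather than a regular file, which may be why no path ever appears under the permanent errors. Something along these lines (pool name is a placeholder) should show which dataset objset 54 belongs to:

# zpool events -v               # full payloads of the data/checksum events above, including the bookmark
# zdb -d pool | grep 'ID 54'    # list datasets and match the one with objset ID 54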

Labels: Type: Defect (Incorrect behavior, e.g. crash, hang)