Skip to content

Grow block size check in zfs_rangelock_cb() needs improvement #18046

@mmaybee

Description

@mmaybee

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 24.04
Kernel Version 16.14.0
Architecture x86_64
OpenZFS Version zfs-2.4.99

Describe the problem you're observing

In zfs_rangelock_cb() there is a test to determine if the file block size could increase. This check is necessary because in order to safely increase the file block size, the entire file range [0..UINT64_MAX] needs to be locked for writing. No other readers or writers may be active in the file. The current code checks to see if the write is beyond the current block size, and if the block size is less than the current max block size for the file system. If these conditions are true, the range lock is adjusted to cover the total range.

The current check:

        /*
         * If we need to grow the block size then lock the whole file range.
         */
        uint64_t end_size = MAX(zp->z_size, new->lr_offset + new->lr_length);
        if (end_size > zp->z_blksz && (!ISP2(zp->z_blksz) ||
            zp->z_blksz < ZTOZSB(zp)->z_max_blksz)) {
                new->lr_offset = 0;
                new->lr_length = UINT64_MAX;
        }

This check is not strong enough. If the file already has more than one block, then the current block size cannot change. So if the file block size is less than the maximum block size supported by the file system, and there are multiple blocks in the file, the current code will almost always extend the rangelock to its maximum size. This means that all writes become serialized and even reads are slowed as they will more often contend with writes.

Suggested fix:

        /*
         * If we might grow the block size then lock the whole file range.
         * NB: this test should match the check in zfs_grow_blocksize
         */
        uint64_t end_size = MAX(zp->z_size, new->lr_offset + new->lr_length);
        if (zp->z_size <= zp->z_blksz && end_size > zp->z_blksz &&
            (!ISP2(zp->z_blksz) || zp->z_blksz < ZTOZSB(zp)->z_max_blksz)) {
                new->lr_offset = 0;
                new->lr_length = UINT64_MAX;
        }

Describe how to reproduce the problem

While it is probably rare for this situation to occur, it is easy to reproduce:

  1. Set the file system block size to something small (e.g. 32k): zfs set recordsize=32k test
  2. Create a file larger than the block size: dd if=/dev/urandom of=/test/testfile bs=32k count=1000
  3. Set the file system block size to something larger: zfs set recordsize=128k test

At this point almost all writes to testfile will trigger the over-locking and IO performance will suffer. Unfortunately, there is no easy way to observe what is happening (beyond the performance degradation) as there is no observability in the code for range locking. This issue was discovered via kernel debugging.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type: DefectIncorrect behavior (e.g. crash, hang)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions