RST delete #1499

morbidrsa · 2024-12-04T06:36:57Z

No description provided.

When CAP_PERFMON and CAP_SYS_ADMIN (allow_ptr_leaks) are disabled, the verifier aims to reject partial overwrite on an 8-byte stack slot that contains a spilled pointer. However, in such a scenario, it rejects all partial stack overwrites as long as the targeted stack slot is a spilled register, because it does not check if the stack slot is a spilled pointer. Incomplete checks will result in the rejection of valid programs, which spill narrower scalar values onto scalar slots, as shown below.
    0: R1=ctx() R10=fp0
    ; asm volatile ( @ repro.bpf.c:679
    0: (7a) *(u64 *)(r10 -8) = 1 ; R10=fp0 fp-8_w=1
    1: (62) *(u32 *)(r10 -8) = 1
    attempt to corrupt spilled pointer on stack
    processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0.
Fix this by expanding the check to not consider spilled scalar registers when rejecting the write into the stack. Previous discussion on this patch is at link [0].
[0]: https://lore.kernel.org/bpf/[email protected]
Fixes: ab125ed ("bpf: fix check for attempt to corrupt spilled pointer") Acked-by: Eduard Zingerman <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Tao Lyu <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
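The narrowed check described above can be sketched in plain C. This is a simplified model under stated assumptions, not the verifier's code: the `slot_kind` enum and the `reject_partial_overwrite()` helper are hypothetical names introduced for illustration only.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the state of one 8-byte stack slot. */
enum slot_kind {
    SLOT_MISC,            /* not a spilled register */
    SLOT_SPILLED_SCALAR,  /* spilled register holding a scalar */
    SLOT_SPILLED_POINTER  /* spilled register holding a pointer */
};

/*
 * Return true if a store of `size` bytes into the slot must be rejected.
 * Only a partial (size != 8) write over a spilled pointer is refused, and
 * only when pointer leaks are not allowed. Spilled scalars are exempt.
 */
bool reject_partial_overwrite(enum slot_kind kind, int size, bool allow_ptr_leaks)
{
    if (allow_ptr_leaks)
        return false;
    if (size == 8)
        return false; /* a full-slot write replaces the pointer entirely */
    return kind == SLOT_SPILLED_POINTER;
}
```

With this model, the 4-byte store from the log above lands on a spilled scalar and is accepted, while the same store over a spilled pointer is still refused.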

Add a __caps_unpriv annotation so that tests requiring specific capabilities while dropping the rest can conveniently specify them during selftest declaration instead of manipulating capabilities at runtime from the testing binary. While at it, let us convert test_verifier_mtu to use this new support instead. Since we do not want to include linux/capability.h, we only define the four main capabilities the BPF subsystem deals with in bpf_misc.h for use in tests. If the user passes CAP_SYS_NICE or anything else that's not defined in the header, the capability parsing code will emit a warning. Also reject strtol returning 0. CAP_CHOWN = 0 but we'll never need to use it, and strtol doesn't set errno on a failed conversion. Fail the test in such a case. The original diff for this idea is available at link [0]. [0]: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Eduard Zingerman <[email protected]> [ Kartikeya: rebase on bpf-next, add warn to parse_caps, convert test_verifier_mtu ] Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
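The strtol pitfall mentioned above can be illustrated with a small userspace sketch. It is not the selftest's parse_caps code: `parse_one_cap()` is a hypothetical helper, and the 64-bit mask layout is an assumption made for the example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse one decimal capability number and set the
 * matching bit in *mask. Returns 0 on success, -1 on failure.
 *
 * strtol() returns 0 both for the input "0" and when no digits could be
 * converted (for example an unexpanded "CAP_SYS_NICE" string), and it is
 * not required to set errno in the latter case. Treating 0 as a failure
 * avoids confusing the two, at the cost of never accepting capability 0.
 */
int parse_one_cap(const char *str, uint64_t *mask)
{
    char *end;
    long cap;

    errno = 0;
    cap = strtol(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0')
        return -1;
    if (cap <= 0 || cap >= 64)
        return -1;

    *mask |= 1ULL << cap;
    return 0;
}
```

A caller would typically report the offending string and fail the test when this returns -1.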

Ensure that when CAP_PERFMON is dropped, and the verifier sees allow_ptr_leaks as false, we are not permitted to read from a STACK_INVALID slot. Without the fix, the test will report unexpected success in loading. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>

Add a test case to verify that without CAP_PERFMON, the test now succeeds instead of failing due to a verification error. Acked-by: Eduard Zingerman <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>

Kumar Kartikeya Dwivedi says:
====================
Fixes for stack with allow_ptr_leaks
Two fixes for usability/correctness gaps when interacting with the stack without CAP_PERFMON (i.e. with allow_ptr_leaks = false). See the commits for details. I've verified that the tests fail when run without the fixes.
Changelog:
----------
v3 -> v4
v3: https://lore.kernel.org/bpf/[email protected]
 * Address Andrii's comments
 * Fix bug papered over by missing CAP_NET_ADMIN in verifier_mtu test
 * Add warning when undefined CAP_ constant is specified, and fail test
 * Reorder annotations to be more clear
 * Verify that fixes fail without patches again
 * Add Acked-by from Andrii
v2 -> v3
v2: https://lore.kernel.org/bpf/[email protected]
 * Address comments from Eduard
 * Fix comment for mark_stack_slot_misc
 * We can simply always return early when stype == STACK_INVALID
 * Drop allow_ptr_leaks conditionals
 * Add Eduard's __caps_unpriv patch into the series
 * Convert test_verifier_mtu to use it
 * Move existing tests to __caps_unpriv annotation and verifier_spill_fill.c
 * Add Acked-by from Eduard
v1 -> v2
v1: https://lore.kernel.org/bpf/[email protected]
 * Fix CI errors in selftest by removing dependence on BPF_ST
====================
Link: https://patch.msgid.link/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>

The NVMe specification states that MAXCMD is mandatory for NVMe-over-Fabrics implementations. However, some NVMe/TCP and NVMe/FC arrays from major vendors have buggy firmware that reports MAXCMD as zero in the Identify Controller data structure. Currently, the implementation closes the connection in such cases, completely preventing the host from connecting to the target. Fix the issue by printing a clear error message about the firmware bug and allowing the connection to proceed. It assumes that the target supports a MAXCMD value of SQSIZE + 1. If any issues arise, the user can manually adjust SQSIZE to mitigate them. Fixes: 4999568 ("nvme-fabrics: check max outstanding commands") Signed-off-by: Maurizio Lombardi <[email protected]> Reviewed-by: Laurence Oberman <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
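The fallback described above amounts to a small decision, sketched here in userspace C. The helper name `effective_maxcmd()` and the message text are hypothetical; only the rule (treat a reported MAXCMD of zero as SQSIZE + 1 and keep the connection) comes from the commit message.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: pick the MAXCMD value to use for a fabrics
 * controller. A reported value of zero is treated as a firmware bug;
 * instead of failing the connection, assume SQSIZE + 1.
 * The result is widened to 32 bits so that sqsize + 1 cannot wrap.
 */
uint32_t effective_maxcmd(uint16_t reported_maxcmd, uint16_t sqsize)
{
    if (reported_maxcmd == 0) {
        fprintf(stderr,
                "firmware bug: controller reports MAXCMD = 0, assuming %u\n",
                (unsigned int)sqsize + 1);
        return (uint32_t)sqsize + 1;
    }
    return reported_maxcmd;
}
```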

cocci warnings: (new ones prefixed by >>) >> drivers/nvme/target/pr.c:831:8-15: WARNING: kzalloc should be used for data, instead of kmalloc/memset The pattern of using 'kmalloc' followed by 'memset' is replaced with 'kzalloc', which is functionally equivalent to 'kmalloc' + 'memset', but more efficient. 'kzalloc' automatically zeroes the allocated memory, making it a faster and more streamlined solution. Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Reviewed-by: Kuan-Wei Chiu <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Yu-Chun Lin <[email protected]> Signed-off-by: Keith Busch <[email protected]>
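As a userspace analogy only (kmalloc and kzalloc are kernel APIs and are not used here), the same pattern looks like this with the C standard library, where calloc plays the role of the zeroing allocator. Both helper names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Two-step form flagged by the warning: allocate, then clear. */
void *alloc_then_clear(size_t size)
{
    void *p = malloc(size);

    if (p)
        memset(p, 0, size);
    return p;
}

/* One-step form: the allocator hands back zeroed memory. */
void *alloc_zeroed(size_t size)
{
    return calloc(1, size);
}
```

Both functions return a zero-filled buffer or NULL; the second simply leaves no window in which the buffer exists uninitialized.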

The driver serializes ioctls through a mutex lock but access to the ioctl data buffer is not guarded by the mutex. This results in multiple user threads being able to write to the driver's ioctl buffer simultaneously. Protect the ioctl buffer with the ioctl mutex. Signed-off-by: Sumit Saxena <[email protected]> Signed-off-by: Ranjan Kumar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>

The driver, through the SAS transport, exposes a sysfs interface to enable/disable PHYs in a controller/expander setup. When multiple PHYs are disabled and enabled in rapid succession, the persistent and current config pages related to SAS IO unit/SAS Expander pages could get corrupted. Use separate memory for each config request. Signed-off-by: Prayas Patel <[email protected]> Signed-off-by: Ranjan Kumar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>

Instead of displaying the controller index starting from '1' make the driver display the controller index starting from '0'. Signed-off-by: Sumit Saxena <[email protected]> Signed-off-by: Ranjan Kumar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>

Before retrying initialization, check and abort if the fault code indicates insufficient power. Also mark the controller as unrecoverable instead of issuing reset in the watch dog timer if the fault code indicates insufficient power. Signed-off-by: Prayas Patel <[email protected]> Signed-off-by: Ranjan Kumar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>

Update driver version to 8.12.0.3.50. Signed-off-by: Ranjan Kumar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>

Only call into nvme_alloc_host_mem_single which uses dma_alloc_noncontiguous when there is non-null dma merge boundary. Without this we'll call into dma_alloc_noncontiguous for device using dma-direct, which can work fine as long as the preferred size is below the MAX_ORDER of the page allocator, but blows up with a warning if it is too large. Fixes: 63a5c7a ("nvme-pci: use dma_alloc_noncontigous if possible") Reported-by: Leon Romanovsky <[email protected]> Reported-by: Chaitanya Kumar Borah <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Tested-by: Chaitanya Kumar Borah <[email protected]> Signed-off-by: Keith Busch <[email protected]>

Current abort of bsg on timeout prematurely clears the outstanding_cmds[]. Abort does not allow FW to return the IOCB/SRB. In addition, bsg_job_done() is not called to return the BSG (i.e. leak). Abort the outstanding bsg/SRB and wait for the completion. The completion IOCB will wake up the bsg_timeout thread. If abort is not successful, then driver will forcibly call bsg_job_done() and free the srb.
Err Inject:
- qaucli -z
- assign CT Passthru IOCB's NportHandle with another initiator nport handle to trigger timeout. Remote port will drop CT request.
- bsg_job_done is properly called as part of cleanup
    kernel: qla2xxx [0000:21:00.1]-7012:7: qla2x00_process_ct : 286 : Error Inject.
    kernel: qla2xxx [0000:21:00.1]-7016:7: bsg rqst type: FC_BSG_HST_CT else type: 101 - loop-id=1 portid=fffffa.
    kernel: qla2xxx [0000:21:00.1]-70bb:7: qla24xx_bsg_timeout CMD timeout. bsg ptr ffff9971a42f0838 msgcode 80000004 vendor cmd fa010000
    kernel: qla2xxx [0000:21:00.1]-507c:7: Abort command issued - hdl=4b, type=5
    kernel: qla2xxx [0000:21:00.1]-5040:7: ELS-CT pass-through-ct pass-through error hdl=4b comp_status-status=0x5 error subcode 1=0x0 error subcode 2=0xaf882e80.
    kernel: qla2xxx [0000:21:00.1]-7009:7: qla2x00_bsg_job_done: sp hdl 4b, result=70000 bsg ptr ffff9971a42f0838
    kernel: qla2xxx [0000:21:00.1]-802c:7: Aborting bsg ffff9971a42f0838 sp=ffff99760b87ba80 handle=4b rval=0
    kernel: qla2xxx [0000:21:00.1]-708a:7: bsg abort success. bsg ffff9971a42f0838 sp=ffff99760b87ba80 handle=0x4b
    kernel: qla2xxx [0000:21:00.1]-7012:7: qla2x00_process_ct : 286 : Error Inject.
    kernel: qla2xxx [0000:21:00.1]-7016:7: bsg rqst type: FC_BSG_HST_CT else type: 101 - loop-id=1 portid=fffffa.
    kernel: qla2xxx [0000:21:00.1]-70bb:7: qla24xx_bsg_timeout CMD timeout. bsg ptr ffff9971a42f43b8 msgcode 80000004 vendor cmd fa010000
    kernel: qla2xxx [0000:21:00.1]-7012:7: qla_bsg_found : 2206 : Error Inject 2.
    kernel: qla2xxx [0000:21:00.1]-802c:7: Aborting bsg ffff9971a42f43b8 sp=ffff99762c304440 handle=5e rval=5
    kernel: qla2xxx [0000:21:00.1]-704f:7: bsg abort fail. bsg=ffff9971a42f43b8 sp=ffff99762c304440 rval=5.
    kernel: qla2xxx [0000:21:00.1]-7051:7: qla_bsg_found bsg_job_done : bsg ffff9971a42f43b8 result 0xfffffffa sp ffff99762c304440.
Cc: [email protected] Fixes: c449b41 ("scsi: qla2xxx: Use QP lock to search for bsg") Signed-off-by: Quinn Tran <[email protected]> Signed-off-by: Nilesh Javali <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Himanshu Madhani <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

A system crash is observed with a stack trace warning of use after free. There are 2 signals that tell dpc_thread to terminate (the UNLOADING flag and kthread_stop). If dpc_thread happens to be running when the UNLOADING flag is set and sees the flag, it exits and cleans itself up. When kthread_stop is then called for final cleanup, this causes a use after free. Remove the UNLOADING signal to terminate dpc_thread. Use kthread_stop as the main signal to exit dpc_thread.
    [596663.812935] kernel BUG at mm/slub.c:294!
    [596663.812950] invalid opcode: 0000 [#1] SMP PTI
    [596663.812957] CPU: 13 PID: 1475935 Comm: rmmod Kdump: loaded Tainted: G IOE --------- - - 4.18.0-240.el8.x86_64 #1
    [596663.812960] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/20/2012
    [596663.812974] RIP: 0010:__slab_free+0x17d/0x360
    ...
    [596663.813008] Call Trace:
    [596663.813022] ? __dentry_kill+0x121/0x170
    [596663.813030] ? _cond_resched+0x15/0x30
    [596663.813034] ? _cond_resched+0x15/0x30
    [596663.813039] ? wait_for_completion+0x35/0x190
    [596663.813048] ? try_to_wake_up+0x63/0x540
    [596663.813055] free_task+0x5a/0x60
    [596663.813061] kthread_stop+0xf3/0x100
    [596663.813103] qla2x00_remove_one+0x284/0x440 [qla2xxx]
Cc: [email protected] Signed-off-by: Quinn Tran <[email protected]> Signed-off-by: Nilesh Javali <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Himanshu Madhani <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

Currently, when creating a new ctrl fails, we do not free the tagset occupied by admin_q. Fix that here. Fixes: fd1418d ("nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue()") Signed-off-by: Chunguang.xu <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Keith Busch <[email protected]>

The kernel will hang when destroying admin_q after creating a ctrl has failed, as in the following call trace:
    PID: 23644 TASK: ff2d52b40f439fc0 CPU: 2 COMMAND: "nvme"
    #0 [ff61d23de260fb78] __schedule at ffffffff8323bc15
    #1 [ff61d23de260fc08] schedule at ffffffff8323c014
    #2 [ff61d23de260fc28] blk_mq_freeze_queue_wait at ffffffff82a3dba1
    #3 [ff61d23de260fc78] blk_freeze_queue at ffffffff82a4113a
    #4 [ff61d23de260fc90] blk_cleanup_queue at ffffffff82a33006
    #5 [ff61d23de260fcb0] nvme_rdma_destroy_admin_queue at ffffffffc12686ce
    #6 [ff61d23de260fcc8] nvme_rdma_setup_ctrl at ffffffffc1268ced
    #7 [ff61d23de260fd28] nvme_rdma_create_ctrl at ffffffffc126919b
    #8 [ff61d23de260fd68] nvmf_dev_write at ffffffffc024f362
    #9 [ff61d23de260fe38] vfs_write at ffffffff827d5f25
    RIP: 00007fda7891d574 RSP: 00007ffe2ef06958 RFLAGS: 00000202
    RAX: ffffffffffffffda RBX: 000055e8122a4d90 RCX: 00007fda7891d574
    RDX: 000000000000012b RSI: 000055e8122a4d90 RDI: 0000000000000004
    RBP: 00007ffe2ef079c0 R8: 000000000000012b R9: 000055e8122a4d90
    R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000004
    R13: 000055e8122923c0 R14: 000000000000012b R15: 00007fda78a54500
    ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
This is because we quiesced admin_q before cancelling requests but forgot to unquiesce it before destroying it. As a result we fail to drain the pending requests and hang in blk_mq_freeze_queue_wait() forever. Reuse nvme_rdma_teardown_admin_queue() to fix this issue and simplify the code. Fixes: 958dc1d ("nvme-rdma: add clean action for failed reconnection") Reported-by: Yingfu.zhou <[email protected]> Signed-off-by: Chunguang.xu <[email protected]> Signed-off-by: Yue.zhao <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Keith Busch <[email protected]>

As we already quiesce admin_q in nvme_tcp_teardown_admin_queue(), there is no need to quiesce it in nvme_tcp_teardown_io_queues(). Drop it to keep things simple. Signed-off-by: Chunguang.xu <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Keith Busch <[email protected]>

As nvme_tcp_teardown_io_queues() is the only caller of nvme_tcp_destroy_admin_queue(), we can merge it into nvme_tcp_teardown_io_queues() to simplify the code. Signed-off-by: Chunguang.xu <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Keith Busch <[email protected]>

Firmware supports multiple sg_cnt for request and response for CT commands, so remove the redundant check. A check is there where sg_cnt for request and response should be same. This is not required as driver and FW have code to handle multiple and different sg_cnt on request and response. Cc: [email protected] Signed-off-by: Saurav Kashyap <[email protected]> Signed-off-by: Nilesh Javali <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Himanshu Madhani <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

NVMe controller fails to send connect command due to failure to locate hw context buffer for NVMe queue 0 (blk_mq_hw_ctx, hctx_idx=0). The cause of the issue is NPIV host did not initialize the vha->irq_offset field. This field is given to blk-mq (blk_mq_pci_map_queues) to help locate the beginning of IO Queues which in turn help locate NVMe queue 0. Initialize this field to allow NVMe to work properly with NPIV host. kernel: nvme nvme5: Connect command failed, errno: -18 kernel: nvme nvme5: qid 0: secure concatenation is not supported kernel: nvme nvme5: NVME-FC{5}: create_assoc failed, assoc_id 2e9100 ret 401 kernel: nvme nvme5: NVME-FC{5}: reset: Reconnect attempt failed (401) kernel: nvme nvme5: NVME-FC{5}: Reconnect attempt in 2 seconds Cc: [email protected] Fixes: f0783d4 ("scsi: qla2xxx: Use correct number of vectors for online CPUs") Signed-off-by: Quinn Tran <[email protected]> Signed-off-by: Nilesh Javali <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Himanshu Madhani <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

The fc_function_template for vports was missing the .show_host_supported_speeds. The base port had the same. Add .show_host_supported_speeds to the vport template as well. Cc: [email protected] Fixes: 2c3dfe3 ("[SCSI] qla2xxx: add support for NPIV") Signed-off-by: Anil Gurumurthy <[email protected]> Signed-off-by: Nilesh Javali <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Himanshu Madhani <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

Signed-off-by: Nilesh Javali <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Himanshu Madhani <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

Prevent a division by 0 when monitoring is not enabled. Fixes: 1d8613a ("scsi: ufs: core: Introduce HBA performance monitor sysfs nodes") Cc: [email protected] Signed-off-by: Gwendal Grignou <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Can Guo <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
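A guard of this kind can be sketched as follows. The helper name `average_latency()` and the choice to report 0 when nothing has been counted are assumptions for illustration; the commit message only states that the division by zero is prevented.

```c
#include <stdint.h>

/*
 * Hypothetical helper: average latency over the requests seen so far.
 * When monitoring is disabled no requests are counted, so the divisor
 * is 0; report 0 instead of dividing.
 */
uint64_t average_latency(uint64_t latency_sum, uint64_t nr_requests)
{
    if (nr_requests == 0)
        return 0;
    return latency_sum / nr_requests;
}
```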

Fix a use-after-free bug in sg_release(), detected by syzbot with KASAN: BUG: KASAN: slab-use-after-free in lock_release+0x151/0xa30 kernel/locking/lockdep.c:5838 __mutex_unlock_slowpath+0xe2/0x750 kernel/locking/mutex.c:912 sg_release+0x1f4/0x2e0 drivers/scsi/sg.c:407 In sg_release(), the function kref_put(&sfp->f_ref, sg_remove_sfp) is called before releasing the open_rel_lock mutex. The kref_put() call may decrement the reference count of sfp to zero, triggering its cleanup through sg_remove_sfp(). This cleanup includes scheduling deferred work via sg_remove_sfp_usercontext(), which ultimately frees sfp. After kref_put(), sg_release() continues to unlock open_rel_lock and may reference sfp or sdp. If sfp has already been freed, this results in a slab-use-after-free error. Move the kref_put(&sfp->f_ref, sg_remove_sfp) call after unlocking the open_rel_lock mutex. This ensures: - No references to sfp or sdp occur after the reference count is decremented. - Cleanup functions such as sg_remove_sfp() and sg_remove_sfp_usercontext() can safely execute without impacting the mutex handling in sg_release(). The fix has been tested and validated by syzbot. This patch closes the bug reported at the following syzkaller link and ensures proper sequencing of resource cleanup and mutex operations, eliminating the risk of use-after-free errors in sg_release(). Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=7efb5850a17ba6ce098b Tested-by: [email protected] Fixes: cc833ac ("sg: O_EXCL and other lock handling") Signed-off-by: Suraj Sonawane <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

When the power mode change is successful but the power mode hasn't actually changed, the post notification was missed. Similar to the approach with hibernate/clock scale/hce enable, having pre/post notifications in the same function will make it easier to maintain. Additionally, supplement the description of power parameters for the pwr_change_notify callback. Fixes: 7eb584d ("ufs: refactor configuring power mode") Cc: [email protected] #6.11.x Signed-off-by: Peter Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

…VERRUN as an error This partially reverts commit 812fe64 ("scsi: storvsc: Handle additional SRB status values"). HyperV does not support MAINTENANCE_IN resulting in FC passthrough returning the SRB_STATUS_DATA_OVERRUN value. Now that SRB_STATUS_DATA_OVERRUN is treated as an error, multipath ALUA paths go into a faulty state as multipath ALUA submits RTPG commands via MAINTENANCE_IN. [ 3.215560] hv_storvsc 1d69d403-9692-4460-89f9-a8cbcc0f94f3: tag#230 cmd 0xa3 status: scsi 0x0 srb 0x12 hv 0xc0000001 [ 3.215572] scsi 1:0:0:32: alua: rtpg failed, result 458752 Make MAINTENANCE_IN return success to avoid the error path as is currently done with INQUIRY and MODE_SENSE. Suggested-by: Michael Kelley <[email protected]> Signed-off-by: Cathy Avery <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Michael Kelley <[email protected]> Reviewed-by: Ewan D. Milne <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

Since commit 771f712 ("scsi: scsi_debug: Fix cmd duration calculation"), ns_from_boot value is only evaluated in schedule_resp() for polled requests. However, ns_from_boot is also required for hrtimer support for when ndelay is less than INCLUSIVE_TIMING_MAX_NS, so fix up the logic to decide when to evaluate ns_from_boot. Fixes: 771f712 ("scsi: scsi_debug: Fix cmd duration calculation") Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>

…scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - asus-nb-wmi: Silence unknown event warning when charger is plugged in - asus-wmi: Handle return code variations during thermal policy writing graciously - samsung-laptop: Correct module description * tag 'platform-drivers-x86-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: asus-nb-wmi: Ignore unknown event 0xCF platform/x86: asus-wmi: Ignore return value when writing thermal policy platform/x86: samsung-laptop: Match MODULE_DESCRIPTION() to functionality

Cache the decision if a particular I/O needs to update RAID stripe tree entries in struct btrfs_io_context. Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

Now that we have the stripe tree decision saved in struct btrfs_io_geometry we can pass it into is_single_device_io() and get rid of another call to btrfs_need_raid_stripe_tree_update(). Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

The function abort_should_print_stack() is declared in transaction.h but its definition is in ctree.c, which doesn't make sense since ctree.c is the btree implementation and the function is related to the transaction code. Move its definition into transaction.h as an inline function since it's a very short and trivial function, and also add the 'btrfs_' prefix into its name. This change also reduces the module size.
Before this change:
    $ size fs/btrfs/btrfs.ko
       text    data     bss     dec     hex filename
    1783148  161137   16920 1961205  1decf5 fs/btrfs/btrfs.ko
After this change:
    $ size fs/btrfs/btrfs.ko
       text    data     bss     dec     hex filename
    1782126  161045   16920 1960091  1de89b fs/btrfs/btrfs.ko
Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

The ctree module is about the implementation of the btree data structure and not a place holder for generic filesystem things like the csum algorithm details. Move the functions related to the csum algorithm details away from ctree.c and into fs.c, which is a far better place for them. Also fix missing punctuation in comments and change one multiline comment to a single line comment since everything fits in under 80 characters. For some reason this also slightly reduces the module's size.
Before this change:
    $ size fs/btrfs/btrfs.ko
       text    data     bss     dec     hex filename
    1782126  161045   16920 1960091  1de89b fs/btrfs/btrfs.ko
After this change:
    $ size fs/btrfs/btrfs.ko
       text    data     bss     dec     hex filename
    1782094  161045   16920 1960059  1de87b fs/btrfs/btrfs.ko
Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

The declarations for the exclusive operation functions are located at fs.h but their definitions are in ioctl.c, which doesn't make much sense since (most of them) are used in several files other than ioctl.c. Since they are used in several files and they are generic enough, move them out of ioctl.c and into fs.c, even the ones that are currently only used at ioctl.c, for the sake of having them all in the same C file. This also reduces the module's size.
Before this change:
    $ size fs/btrfs/btrfs.ko
       text    data     bss     dec     hex filename
    1782094  161045   16920 1960059  1de87b fs/btrfs/btrfs.ko
After this change:
    $ size fs/btrfs/btrfs.ko
       text    data     bss     dec     hex filename
    1781492  161037   16920 1959449  1de619 fs/btrfs/btrfs.ko
Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

It's a generic helper not specific to ioctls and used in several places, so move it out from ioctl.c and into fs.c. While at it change its return type from int to bool and declare the loop variable in the loop itself. This also slightly reduces the module's size.
Before this change:
    $ size fs/btrfs/btrfs.ko
       text    data     bss     dec     hex filename
    1781492  161037   16920 1959449  1de619 fs/btrfs/btrfs.ko
After this change:
    $ size fs/btrfs/btrfs.ko
       text    data     bss     dec     hex filename
    1781340  161037   16920 1959297  1de581 fs/btrfs/btrfs.ko
Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

The folio ordered helper macros are defined at ctree.h but this is not the best place since ctree.{h,c} is all about the btree data structure implementation and not a generic module. So move these macros into the fs.h header. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

Currently BTRFS_BYTES_TO_BLKS() is defined in ctree.h but it's not related at all to the btree data structure, so move it into fs.h. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

Currently btrfs_alloc_write_mask() is defined in ctree.h but it's not related at all to the btree data structure, so move it into fs.h. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

We have 3 functions that have their prototypes declared in ctree.h but they are defined at extent-tree.c and they are unrelated to the btree data structure. Move the prototypes out of ctree.h and into extent-tree.h. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

It's pointless to have a comment above the prototype declarations of btrfs_ctree_init() and btrfs_ctree_exit() mentioning that they are declared in ctree.c. This is from the old days when ctree.h was used to place anything that didn't fit in any other file. So remove it. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

set squota incompat bit before committing the transaction that enables the feature
With the config CONFIG_BTRFS_ASSERT enabled, an assertion failure occurs regarding the simple quota feature.
    [ 5.596534] assertion failed: btrfs_fs_incompat(fs_info, SIMPLE_QUOTA), in fs/btrfs/qgroup.c:365
    [ 5.597098] ------------[ cut here ]------------
    [ 5.597371] kernel BUG at fs/btrfs/qgroup.c:365!
    [ 5.597946] CPU: 1 UID: 0 PID: 268 Comm: mount Not tainted 6.13.0-rc2-00031-gf92f4749861b #146
    [ 5.598450] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
    [ 5.599008] RIP: 0010:btrfs_read_qgroup_config+0x74d/0x7a0
    [ 5.604303] <TASK>
    [ 5.605230] ? btrfs_read_qgroup_config+0x74d/0x7a0
    [ 5.605538] ? exc_invalid_op+0x56/0x70
    [ 5.605775] ? btrfs_read_qgroup_config+0x74d/0x7a0
    [ 5.606066] ? asm_exc_invalid_op+0x1f/0x30
    [ 5.606441] ? btrfs_read_qgroup_config+0x74d/0x7a0
    [ 5.606741] ? btrfs_read_qgroup_config+0x74d/0x7a0
    [ 5.607038] ? try_to_wake_up+0x317/0x760
    [ 5.607286] open_ctree+0xd9c/0x1710
    [ 5.607509] btrfs_get_tree+0x58a/0x7e0
    [ 5.608002] vfs_get_tree+0x2e/0x100
    [ 5.608224] fc_mount+0x16/0x60
    [ 5.608420] btrfs_get_tree+0x2f8/0x7e0
    [ 5.608897] vfs_get_tree+0x2e/0x100
    [ 5.609121] path_mount+0x4c8/0xbc0
    [ 5.609538] __x64_sys_mount+0x10d/0x150
The issue can be easily reproduced using the following reproducer:
    root@q:linux# cat repro.sh
    set -e
    mkfs.btrfs -f /dev/sdb > /dev/null
    mount /dev/sdb /mnt/btrfs
    btrfs quota enable -s /mnt/btrfs
    umount /mnt/btrfs
    mount /dev/sdb /mnt/btrfs
The issue is that when enabling quotas, at btrfs_quota_enable(), we set BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE at fs_info->qgroup_flags and persist it in the quota root in the item with the key BTRFS_QGROUP_STATUS_KEY, but we only set the incompat bit BTRFS_FEATURE_INCOMPAT_SIMPLE_QUOTA after we commit the transaction used to enable simple quotas.
This means that if after that transaction commit we unmount the filesystem without starting and committing any other transaction, or we have a power failure, the next time we mount the filesystem we will find the flag BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE set in the item with the key BTRFS_QGROUP_STATUS_KEY but we will not find the incompat bit BTRFS_FEATURE_INCOMPAT_SIMPLE_QUOTA set in the superblock, triggering an assertion failure at:
    btrfs_read_qgroup_config() -> qgroup_read_enable_gen()
To fix this issue, set the BTRFS_FEATURE_INCOMPAT_SIMPLE_QUOTA flag immediately after setting the BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE. This ensures that both flags are flushed to disk within the same transaction.
Fixes: 182940f ("btrfs: qgroup: add new quota mode for simple quotas") Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Julian Sun <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

At btrfs_is_empty_uuid() we have our custom code to check if a uuid is empty, however there is a kernel uuid library that has a function named uuid_is_null() which does the same and is probably more efficient. So change btrfs_is_empty_uuid() to use uuid_is_null(), which is almost a direct replacement; it just wraps the necessary casting since our uuid types are u8 arrays while the uuid kernel library uses the uuid_t type, which is just a typedef of a u8 array of 16 elements as well. Also, since the function is now so trivial, make it a static inline function in fs.h. Suggested-by: Johannes Thumshirn <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
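As a minimal illustration of what "empty" means here, the check amounts to comparing a 16-byte array against all zeroes. The Python function below is a hypothetical model of that comparison, not the kernel helper.

```python
UUID_SIZE = 16  # UUIDs are 16-byte arrays, as the message above notes


def is_empty_uuid(uuid: bytes) -> bool:
    """Return True when all 16 bytes of the UUID are zero."""
    if len(uuid) != UUID_SIZE:
        raise ValueError("expected a 16-byte UUID")
    return uuid == bytes(UUID_SIZE)
```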

The following sysfs entries read super block members directly, which can have a different endianness and therefore yield wrong values:

- sys/fs/btrfs/<uuid>/nodesize
- sys/fs/btrfs/<uuid>/sectorsize
- sys/fs/btrfs/<uuid>/clone_alignment

Thankfully those values (nodesize and sectorsize) are always aligned inside the btrfs_super_block, so reading them won't trigger unaligned read errors, just endian problems.

Fix them by using the native cached members instead.

Fixes: df93589 ("btrfs: export more from FS_INFO to sysfs")
Reviewed-by: Naohiro Aota <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
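The effect of reading an on-disk field without conversion can be shown with a short sketch. It assumes a 32-bit little-endian field (btrfs stores its on-disk structures little-endian) and an example nodesize of 16384; the helper names are hypothetical.

```python
import struct


def read_le32(raw: bytes) -> int:
    """Decode a 32-bit little-endian on-disk field, independent of the host."""
    return struct.unpack("<I", raw)[0]


def read_unconverted_on_big_endian(raw: bytes) -> int:
    """What a big-endian CPU sees if it loads the same bytes without conversion."""
    return struct.unpack(">I", raw)[0]


# Example value only: a nodesize of 16384 as it would be laid out on disk.
nodesize_on_disk = struct.pack("<I", 16384)
```

Decoding `nodesize_on_disk` with `read_le32()` gives 16384 on any host, whereas the unconverted big-endian read interprets the same four bytes as 4194304.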

EDITME: Imported from [email protected]
Please review before sending.

Here's another set of fixes for the delete path on RAID stripe-tree backed filesystems.

Josef's CI system started tripping over a bad key order due to the usage of btrfs_set_item_key_safe() in btrfs_partially_delete_raid_extent(), and while investigating what is happening there I found more bugs and unhandled corner cases, which resulted in more fixes and test cases.

Unfortunately I couldn't fix the bad key order problem and had to resort to re-creating the item in btrfs_partially_delete_raid_extent() and inserting the new one after deleting the old.

Fstests btrfs/06* are extremely good at exhibiting these failures and btrfs/060 has been extensively run while developing this series.

A full CI run is underway at the moment:
https://github.com/btrfs/linux/actions/runs/12291668397

Johannes Thumshirn (14):
  btrfs: don't try to delete RAID stripe-extents if we don't need to
  btrfs: assert RAID stripe-extent length is always greater than 0
  btrfs: fix search when deleting a RAID stripe-extent
  btrfs: fix front delete range calculation for RAID stripe extents
  btrfs: fix tail delete of RAID stripe-extents
  btrfs: fix deletion of a range spanning parts two RAID stripe extents
  btrfs: implement hole punching for RAID stripe extents
  btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents
  btrfs: selftests: check for correct return value of failed lookup
  btrfs: selftests: don't split RAID extents in half
  btrfs: selftests: test RAID stripe-tree deletion spanning two items
  btrfs: selftests: add selftest for punching holes into the RAID stripe extents
  btrfs: selftests: add test for punching a hole into 3 RAID stripe-extents
  btrfs: selftests: add a selftest for deleting two out of three extents

 fs/btrfs/ctree.c                        |   1 +
 fs/btrfs/raid-stripe-tree.c             | 154 +++++-
 fs/btrfs/tests/raid-stripe-tree-tests.c | 653 +++++++++++++++++++++++-
 3 files changed, 776 insertions(+), 32 deletions(-)

--
2.43.0

--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 2,
    "change-id": "20241218-rst-delete-fixes-f2659047f627",
    "prefixes": [],
    "from-thread": "[email protected]"
  }
}

Don't try to delete RAID stripe-extents if we don't need to. Signed-off-by: Johannes Thumshirn <[email protected]>

When modifying a RAID stripe-extent, ASSERT() that the length of the new RAID stripe-extent is always greater than 0. Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Filipe Manana <[email protected]>

When searching for a RAID stripe-extent for deletion, use an offset of -1 to always get the "left" slot in the btree and correctly handle the slot selection. Signed-off-by: Johannes Thumshirn <[email protected]>
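Why an offset of -1 selects the "left" slot can be seen from the key ordering alone. The sketch below models a leaf as a sorted list of `(objectid, type, offset)` tuples and uses `bisect` in place of the btree search. It is a model of the ordering, not the kernel code path; `left_slot()` is a hypothetical helper, and 230 is the key type printed for RAID stripe items in the kernel log quoted in this series.

```python
import bisect

U64_MAX = 2**64 - 1
RAID_STRIPE_KEY = 230  # key type as printed in the kernel log in this series


def left_slot(keys, start, offset=U64_MAX):
    """Return the slot left of where (start, type, offset) would be inserted.

    With offset == U64_MAX the probe sorts after every real key that has
    objectid == start, so the slot to its left is the last item whose
    objectid is <= start. Returns None if there is no such item.
    """
    probe = (start, RAID_STRIPE_KEY, offset)
    slot = bisect.bisect_left(keys, probe)
    if slot == 0:
        return None
    return slot - 1
```

For two 64K extents at logical 0 and 65536, probing for `start=65536` with the default offset returns slot 1, the extent that actually starts there. Probing with `offset=0` sorts before that extent's key and stepping left lands on slot 0, which is the wrong item in this model.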

When deleting the front of a RAID stripe-extent, the delete code miscalculates how much to pad the front of the remaining extent part. Fix the calculation so we always get the sizes we expect. Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Filipe Manana <[email protected]>
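The expected sizes for a front delete can be written down as a small model. This is a sketch under assumptions: `StripeExtent` and `delete_front()` are hypothetical names, and the extent is reduced to a logical start, a length and one physical address per device stripe.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StripeExtent:
    start: int            # logical start of the extent
    length: int           # length of the extent in bytes
    physical: List[int]   # one physical address per device stripe


def delete_front(extent: StripeExtent, delete_end: int) -> StripeExtent:
    """Drop [extent.start, delete_end) and return the part that remains."""
    diff = delete_end - extent.start
    if not 0 < diff < extent.length:
        raise ValueError("a front delete removes part of the extent, not all of it")
    return StripeExtent(
        start=extent.start + diff,                      # start moves forward
        length=extent.length - diff,                    # length shrinks by the same amount
        physical=[p + diff for p in extent.physical],   # stripes advance as well
    )
```

Deleting the first 16K of a 64K extent should leave a 48K extent that starts 16K later, with every physical address advanced by 16K.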

Fix the tail delete of RAID stripe-extents for the case where, after the tail of the extent has been deleted, there is still a range left to delete. Signed-off-by: Johannes Thumshirn <[email protected]>

When a user requests the deletion of a range that spans multiple stripe extents and btrfs_search_slot() returns us the second RAID stripe extent, we need to pick the previous item and truncate it. If there's still a range left to delete, move on to the next item.

The following diagram illustrates the operation:

  |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---|
         |--- keep ---|--- drop ---|

While at it, comment the trivial case of a whole item delete as well.

Signed-off-by: Johannes Thumshirn <[email protected]>
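The expected outcome of a delete that spans two items can be modelled with plain intervals. The sketch below is an assumption-laden illustration, not the kernel loop: `delete_range()` is a hypothetical helper working on a sorted list of `(start, length)` tuples, and it deliberately leaves out the case where the range lies strictly inside one extent.

```python
def delete_range(extents, start, length):
    """Delete [start, start + length) from a sorted list of (start, length)."""
    end = start + length
    result = []
    for ext_start, ext_len in extents:
        ext_end = ext_start + ext_len
        if ext_end <= start or ext_start >= end:
            result.append((ext_start, ext_len))             # not touched at all
        elif start <= ext_start and ext_end <= end:
            continue                                        # whole item delete
        elif ext_start < start and ext_end <= end:
            result.append((ext_start, start - ext_start))   # truncate: keep the front
        elif start <= ext_start and end < ext_end:
            result.append((end, ext_end - end))             # drop the front, keep the tail
        else:
            raise NotImplementedError("range strictly inside one extent is not modelled")
    return result
```

With two adjacent 64K extents and a delete of 32K starting at 48K, the first extent is truncated to 48K and the second one starts 16K later with 48K remaining.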

If the stripe extent we want to delete starts before the range we want to delete and ends after the range we want to delete, we're punching a hole in the stripe extent:

  |--- RAID Stripe Extent ---|
  | keep |--- drop ---| keep |

This means we need to a) truncate the existing item and b) create a second item for the remaining range.

Signed-off-by: Johannes Thumshirn <[email protected]>
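The two steps can be expressed as interval arithmetic. The following is a minimal sketch with a hypothetical `punch_hole()` helper. It uses the sizes from the selftest described later in this series (a 1M extent with a 64K hole at an offset of 32K) and assumes, for illustration, that the extent starts at logical 0.

```python
def punch_hole(ext_start, ext_len, hole_start, hole_len):
    """Split one extent around a hole that lies strictly inside it.

    Returns (left, right) as (start, length) tuples.
    """
    ext_end = ext_start + ext_len
    hole_end = hole_start + hole_len
    if not (ext_start < hole_start and hole_end < ext_end):
        raise ValueError("hole must start after and end before the extent")
    left = (ext_start, hole_start - ext_start)   # a) truncated existing item
    right = (hole_end, ext_end - hole_end)       # b) new item for the remaining range
    return left, right


SZ_32K = 32 * 1024
SZ_64K = 64 * 1024
SZ_1M = 1024 * 1024

left, right = punch_hole(0, SZ_1M, SZ_32K, SZ_64K)
```

Here `left` covers the first 32K and `right` starts at 96K and covers the remaining 928K, so the lengths of both parts plus the hole add up to the original 1M.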

Don't use btrfs_set_item_key_safe() to modify the keys in the RAID stripe-tree as this can lead to corruption of the tree, which is caught by the checks in btrfs_set_item_key_safe():

 BTRFS critical (device nvme1n1): slot 201 key (5679448064 230 32768) new key (5680439296 230 1028096)
 ------------[ cut here ]------------
 kernel BUG at fs/btrfs/ctree.c:2672!
 Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
 CPU: 1 UID: 0 PID: 1055 Comm: fsstress Not tainted 6.13.0-rc1+ #1464
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
 RIP: 0010:btrfs_set_item_key_safe+0xf7/0x270
 Code: <snip>
 RSP: 0018:ffffc90001337ab0 EFLAGS: 00010287
 RAX: 0000000000000000 RBX: ffff8881115fd000 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000ffffffff
 RBP: ffff888110ed6f50 R08: 00000000ffffefff R09: ffffffff8244c500
 R10: 00000000ffffefff R11: 00000000ffffffff R12: ffff888100586000
 R13: 00000000000000c9 R14: ffffc90001337b1f R15: ffff888110f23b58
 FS: 00007f7d75c72740(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fa811652c60 CR3: 0000000111398001 CR4: 0000000000370eb0
 Call Trace:
  <TASK>
  ? __die_body.cold+0x14/0x1a
  ? die+0x2e/0x50
  ? do_trap+0xca/0x110
  ? do_error_trap+0x65/0x80
  ? btrfs_set_item_key_safe+0xf7/0x270
  ? exc_invalid_op+0x50/0x70
  ? btrfs_set_item_key_safe+0xf7/0x270
  ? asm_exc_invalid_op+0x1a/0x20
  ? btrfs_set_item_key_safe+0xf7/0x270
  btrfs_partially_delete_raid_extent+0xc4/0xe0
  btrfs_delete_raid_extent+0x227/0x240
  __btrfs_free_extent.isra.0+0x57f/0x9c0
  ? exc_coproc_segment_overrun+0x40/0x40
  __btrfs_run_delayed_refs+0x2fa/0xe80
  btrfs_run_delayed_refs+0x81/0xe0
  btrfs_commit_transaction+0x2dd/0xbe0
  ? preempt_count_add+0x52/0xb0
  btrfs_sync_file+0x375/0x4c0
  do_fsync+0x39/0x70
  __x64_sys_fsync+0x13/0x20
  do_syscall_64+0x54/0x110
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f7d7550ef90
 Code: <snip>
 RSP: 002b:00007ffd70237248 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f7d7550ef90
 RDX: 000000000000013a RSI: 000000000040eb28 RDI: 0000000000000004
 RBP: 000000000000001b R08: 0000000000000078 R09: 00007ffd7023725c
 R10: 00007f7d75400390 R11: 0000000000000202 R12: 028f5c28f5c28f5c
 R13: 8f5c28f5c28f5c29 R14: 000000000040b520 R15: 00007f7d75c726c8
  </TASK>

Instead copy the item, adjust the key and per-device physical addresses and re-insert it into the tree.

Signed-off-by: Johannes Thumshirn <[email protected]>
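The difference between changing a key in place and re-inserting the item can be sketched with a sorted list. This is a model only: the helpers are hypothetical, the old and new keys are taken from the log above, and the neighbouring key is an invented example of one way an in-place change could violate the ordering. It does not claim to reproduce the actual failure.

```python
import bisect


def is_sorted(keys):
    """True when the keys are in strictly ascending order."""
    return all(a < b for a, b in zip(keys, keys[1:]))


def set_key_in_place(keys, slot, new_key):
    """Overwrite the key at slot; refuse if that breaks the ordering."""
    out = list(keys)
    out[slot] = new_key
    if not is_sorted(out):
        raise ValueError("bad key order")
    return out


def reinsert(keys, slot, new_key):
    """Delete the old item, then insert the new key at its sorted position."""
    out = keys[:slot] + keys[slot + 1:]
    bisect.insort(out, new_key)
    return out


OLD_KEY = (5679448064, 230, 32768)       # from the log above
NEW_KEY = (5680439296, 230, 1028096)     # from the log above
NEIGHBOUR = (5679480832, 230, 65536)     # hypothetical next item, for illustration
```

With `[OLD_KEY, NEIGHBOUR]` as the leaf, `set_key_in_place()` rejects the change because the new key would sort after its neighbour, while `reinsert()` places the new key at the correct position and the list stays ordered.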

Commit 5e72aab ("btrfs: return ENODATA in case RST lookup fails") changed btrfs_get_raid_extent_offset()'s return value to ENODATA in case the RAID stripe-tree lookup failed. Adjust the test cases which check for absence of a given range to check for ENODATA as return value in this case. Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Filipe Manana <[email protected]>

The selftests for partially deleting the start or tail of RAID stripe-extents split these extents in half. This can hide errors in the calculation, so don't split the RAID stripe-extents in half but delete the first or last 16K of the 64K extents. Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Filipe Manana <[email protected]>
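A short worked example shows why splitting in half can hide a miscalculation. The bug below is hypothetical and chosen only for illustration: it returns the deleted length where the remaining length is expected.

```python
SZ_16K = 16 * 1024
SZ_32K = 32 * 1024
SZ_64K = 64 * 1024


def remaining_correct(extent_len, deleted_len):
    """Length left over after deleting deleted_len bytes from one end."""
    return extent_len - deleted_len


def remaining_buggy(extent_len, deleted_len):
    """Hypothetical bug: confuses the deleted length with the remaining one."""
    return deleted_len
```

For a 64K extent split in half, both functions return 32K and a test cannot tell them apart. Deleting 16K instead gives 48K for the correct version and 16K for the buggy one, so the mistake becomes visible.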

Add a selftest for RAID stripe-tree deletion with a delete range spanning two items, so that we're punching a hole into two adjacent RAID stripe extents, truncating the first and "moving" the second to the right.

The following diagram illustrates the operation:

  |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---|
  |----- keep -----|--- drop ---|----- keep ----|

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>

Add a selftest for punching a hole into a RAID stripe extent. The test creates a 1M extent and punches a 64K hole at an offset of 32K from the start of the extent. Afterwards it verifies the start and length of both resulting new extents, "left" and "right", as well as the absence of the hole. Signed-off-by: Johannes Thumshirn <[email protected]>

Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Filipe Manana <[email protected]>

Add a selftest creating three extents and then deleting two out of the three extents. Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Filipe Manana <[email protected]>

morbidrsa force-pushed the rst-delete branch from b91e3d0 to 9125fb5 on December 4, 2024 07:42

lvtao-sec and others added 29 commits December 4, 2024 09:19

scsi: mpi3mr: Update driver version to 8.12.0.3.50

0deb37c

Update driver version to 8.12.0.3.50. Signed-off-by: Ranjan Kumar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Update version to 10.02.09.400-k

35002a8

Signed-off-by: Nilesh Javali <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Himanshu Madhani <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

morbidrsa and others added 29 commits December 18, 2024 02:32

btrfs: don't try to delete RAID stripe-extents if we don't need to

949c9d3

btrfs: assert RAID stripe-extent length is always greater than 0

79e6334

btrfs: fix search when deleting a RAID stripe-extent

d5fc4f7

btrfs: fix tail delete of RAID stripe-extents

a74d233

btrfs: selftests: add test for punching a hole into 3 RAID stripe-extents

5ea820a

btrfs: selftests: add a selftest for deleting two out of three extents

84ab2ff

morbidrsa force-pushed the rst-delete branch from 8b38b12 to 84ab2ff on December 18, 2024 11:28
