monitor commands causing POR #70

Open
ziegi opened this issue Jan 10, 2024 · 2 comments

Comments

ziegi commented Jan 10, 2024

I am running diskscan 0.19 (I also tried master and 0.20)
on Debian 10 with kernel 5.8 and Debian 12 with kernel 6.5,
accessing SATA disks (6-16 TB Seagate, WD, Toshiba)
attached to an LSI SAS adapter through the Linux mpt3sas driver.

Each time one of the following functions in lib/diskscan.c (and maybe others?)

static void disk_ata_monitor_start(disk_t *disk)
static void disk_ata_monitor(disk_t *disk)

is executed, the drive performs a POR (power-on reset) because a command times out:

kernel: sd 0:0:1:0: attempting task abort!scmd(0x00000000bfee609e), outstanding for 62048 ms & timeout 60000 ms
kernel: sd 0:0:1:0: [sdb] tag#3615 CDB: ATA command pass through(12)/Blank a1 0c 0e d0 01 00 4f c2 00 b0 00 00
kernel: scsi target0:0:1: handle(0x001a), sas_address(0x300605b012dd2901), phy(1)
kernel: scsi target0:0:1: enclosure logical id(0x300605b012112900), slot(8) 
kernel: scsi target0:0:1: enclosure level(0x0000), connector name( C2  )
kernel: sd 0:0:1:0: task abort: SUCCESS scmd(0x00000000bfee609e)
kernel: mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
kernel: sd 0:0:1:0: Power-on or device reset occurred

I tried increasing the timeouts, but with no success.
So I am using the following workaround to keep a drive POR from being counted as a read error:

--- diskscan-0.20/lib/diskscan.c	2017-08-25 21:24:14.000000000 +0200
+++ ../diskscan-0.20/lib/diskscan.c	2024-01-10 11:30:21.933342563 +0100
@@ -498,7 +498,8 @@
 	data_log(&disk->data_log, offset/disk->sector_size, data_size/disk->sector_size, &io_res, t);
 
 	// Handle error or incomplete data
-	if (io_res.data != DATA_FULL || io_res.error != ERROR_NONE) {
+	if ((io_res.data != DATA_FULL || io_res.error != ERROR_NONE) 
+	    && !(errno == 0 && io_res.info.sense_key == 0x06 && io_res.info.asc == 0x29 && io_res.info.ascq == 0x00) /* ignore POR */) {
 		int s_errno = errno;
 		ERROR("Error when reading at offset %" PRIu64 " size %d read %zd, errno=%d: %s", offset, data_size, ret, errno, strerror(errno));
 		ERROR("Details: error=%s data=%s %02X/%02X/%02X", error_to_str(io_res.error), data_to_str(io_res.data),

I suspect there is a better solution that involves changing the ATA monitor commands, but unfortunately I do not know how.

baruch (Owner) commented Feb 19, 2024

If the drives hit a timeout and reset, that is not something that should be skipped and ignored.

zougloub commented Nov 4, 2024

Hi, I figured I'd try this software. I'm running an LSI SAS HBA and had been testing drives by doing reads and writes on them. diskscan immediately doesn't like the disk, yet I can read from and write to it fine using direct I/O. It looks like diskscan is doing something "special" that the HBA doesn't like:

[410983.728031] sd 4:0:1:0: attempting task abort!scmd(0x0000000013012edb), outstanding for 61716 ms & timeout 60000 ms
[410983.728038] sd 4:0:1:0: [sdc] tag#9150 CDB: ATA command pass through(12)/Blank a1 0c 0e d0 01 00 4f c2 00 b0 00 00
[410983.728040] scsi target4:0:1: handle(0x000a), sas_address(0x4433221101000000), phy(1)
[410983.728043] scsi target4:0:1: enclosure logical id(0x54cd98f05e438500), slot(6) 
[410983.728045] scsi target4:0:1: enclosure level(0x0001), connector name(     )
[410983.783323] sd 4:0:1:0: task abort: SUCCESS scmd(0x0000000013012edb)
[410984.150460] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[410984.411106] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[410984.973536] sd 4:0:1:0: Power-on or device reset occurred

I don't think this is a disk problem in this particular case.


3 participants