I’ve just discovered some I/O read errors in a ZFS pool:
[root@main-server ~]# zpool status
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        rpool                                           ONLINE       0     0     0
          mirror                                        ONLINE       0     0     0
            gptid/680dc84c-e5e0-11df-aa34-406186f3d8c4  ONLINE       5     0     0
            gptid/6ae63950-e5e0-11df-aa34-406186f3d8c4  ONLINE      13     0     0

errors: No known data errors
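On a pool with many disks it is easy to miss a non-zero counter in that table. Here is a rough sketch of a filter that prints only the devices with READ, WRITE or CKSUM errors. The function name zpool_error_devices is made up for this post, and it assumes the table layout shown above (a header row starting with NAME and STATE, ending at the "errors:" line).

```shell
# Hypothetical helper: reads `zpool status` output on stdin and prints every
# device line whose READ, WRITE or CKSUM counter is not "0".
zpool_error_devices() {
    awk '
        # The device table starts right after the "NAME STATE ..." header row.
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }
        # The "errors:" line marks the end of the table.
        $1 == "errors:" { in_table = 0 }
        in_table && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}

# Usage (on the server itself):
#   zpool status | zpool_error_devices
```

With the status output above, this would list the two gptid devices and skip the rpool and mirror rows, whose counters are all zero.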
Now I want to test both disks to see if something is wrong with them.
I’m going to use smartmontools.
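The pool lists its members by gptid label, while smartctl wants a disk device such as /dev/ad4. On FreeBSD, glabel status shows which provider sits behind each label. A minimal sketch of that lookup follows. The helper name label_to_disk is made up for this post, and it assumes the label sits on a GPT partition whose name looks like ad4p2, so that dropping the trailing pN gives the disk.

```shell
# Hypothetical helper: reads `glabel status` output on stdin and prints the
# disk behind the given label. Assumes the Components column holds a GPT
# partition name such as "ad4p2".
label_to_disk() {
    awk -v label="$1" '
        $1 == label {
            disk = $3
            sub(/p[0-9]+$/, "", disk)   # drop the GPT partition suffix
            print "/dev/" disk
        }
    '
}

# Usage (on the server itself):
#   glabel status | label_to_disk gptid/680dc84c-e5e0-11df-aa34-406186f3d8c4
```

If the disks are partitioned differently, the suffix pattern would need adjusting.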
To run a quick test, type smartctl -t short /dev/adx, where x is the number of the hard drive that you want to test:
[root@main-server ~]# smartctl -t short /dev/ad4
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Thu Dec 15 10:44:09 2011

Use smartctl -X to abort test.
To see the log of the test, run smartctl -l selftest /dev/adx:
[root@main-server ~]# smartctl -l selftest /dev/ad4
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     14457         -
Now let’s run a more thorough (and much longer) test with smartctl -t long /dev/adx:
[root@main-server ~]# smartctl -t long /dev/ad4
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 153 minutes for test to complete.
Test will complete after Thu Dec 15 13:19:20 2011
You can see the log even if the test is still running:
[root@main-server ~]# smartctl -l selftest /dev/ad4
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 60%     14458         -
# 2  Short offline       Completed without error       00%     14457         -
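Since entry # 1 is always the most recent test, a script can look at that one line to decide whether the test is still running. The following is only a sketch: the function name selftest_state and the four state words it prints are invented for this post, and it matches only the two status strings that appear in the logs above. Anything else is reported as "check-log" so you go and read the full log yourself.

```shell
# Hypothetical helper: reads `smartctl -l selftest` output on stdin and
# classifies the most recent entry (the line that starts with "# 1").
selftest_state() {
    _latest=$(grep -E '^# *1 ' | head -n 1)
    case "$_latest" in
        '')                          echo "no-test-logged" ;;
        *"in progress"*)             echo "running" ;;
        *"Completed without error"*) echo "passed" ;;
        *)                           echo "check-log" ;;
    esac
}

# Usage (on the server itself):
#   smartctl -l selftest /dev/ad4 | selftest_state
```

For the log above, where the extended test is still at 60% remaining, this prints "running".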
