I’ve just discovered some I/O read errors in a ZFS pool:
[root@main-server ~]# zpool status
pool: rpool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: none requested
config:
NAME                                            STATE   READ WRITE CKSUM
rpool                                           ONLINE     0     0     0
  mirror                                        ONLINE     0     0     0
    gptid/680dc84c-e5e0-11df-aa34-406186f3d8c4  ONLINE     5     0     0
    gptid/6ae63950-e5e0-11df-aa34-406186f3d8c4  ONLINE    13     0     0
errors: No known data errors
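A small filter can pick the affected devices out of that output. This is only a sketch and not part of the original session: failing_vdevs is a made-up helper name, and it assumes the config table keeps the NAME STATE READ WRITE CKSUM column order shown above.

```shell
#!/bin/sh
# Print every vdev whose READ, WRITE or CKSUM counter is non-zero.
# Reads `zpool status` output on stdin.
# failing_vdevs is a hypothetical helper name, not a ZFS command.
failing_vdevs() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" line follows the table.
        $1 == "errors:"               { in_config = 0 }
        # Inside the table, report rows with any non-zero counter.
        in_config && NF >= 5 && ($3 != 0 || $4 != 0 || $5 != 0) {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Usage (on the server itself):
#   zpool status rpool | failing_vdevs
```

With the pool above this should list the two gptid entries. On FreeBSD, glabel status should then show which adx device each gptid label belongs to.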
Now I want to test both disks to see whether something is wrong with them.
I’m going to use smartmontools.
To run a quick test, type smartctl -t short /dev/adx, where x is the number of the hard drive that you want to test:
[root@main-server ~]# smartctl -t short /dev/ad4
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Thu Dec 15 10:44:09 2011
Use smartctl -X to abort test.
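Since both disks of the mirror need testing, the same command can be wrapped in a loop. A minimal sketch under a few assumptions: start_short_tests and the SMARTCTL override are names invented here for illustration, and /dev/ad6 in the usage line is only a placeholder for the second disk, whose device name is not shown in this post.

```shell
#!/bin/sh
# Start a short SMART self-test on every device passed as an argument.
# start_short_tests is a hypothetical helper; SMARTCTL may be set to
# another command (for example "echo") to do a dry run.
start_short_tests() {
    for dev in "$@"; do
        echo "== $dev =="
        ${SMARTCTL:-smartctl} -t short "$dev" ||
            echo "could not start test on $dev" >&2
    done
}

# Usage:
#   start_short_tests /dev/ad4 /dev/ad6   # /dev/ad6 is a placeholder
```

The drives run their self-tests internally, so starting them one after another does not mean waiting for each to finish.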
To see the log of the test, run smartctl -l selftest /dev/adx:
[root@main-server ~]# smartctl -l selftest /dev/ad4
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     14457         -
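For scripting, the most recent entry (# 1) can be checked automatically. The following is a sketch that assumes the log layout shown above; latest_selftest_ok is a hypothetical helper name, not something smartmontools provides.

```shell
#!/bin/sh
# Read `smartctl -l selftest` output on stdin, print the newest log
# entry ("# 1") and succeed only if it says "Completed without error".
# latest_selftest_ok is a hypothetical helper name.
latest_selftest_ok() {
    awk '
        /^# *1[ \t]/ {
            found = 1
            print
            ok = /Completed without error/   # 1 if the line matches, else 0
        }
        END { exit !(found && ok) }
    '
}

# Usage:
#   smartctl -l selftest /dev/ad4 | latest_selftest_ok && echo "ad4 looks fine"
```

A clean self-test does not rule out cabling or controller problems, so the ZFS counters are still worth watching afterwards.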
Now let’s run a more thorough (and much longer) test with smartctl -t long /dev/adx:
[root@main-server ~]# smartctl -t long /dev/ad4
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 153 minutes for test to complete.
Test will complete after Thu Dec 15 13:19:20 2011
You can check the log even while the test is still running; the Remaining column shows how much of the test is still to go:
[root@main-server ~]# smartctl -l selftest /dev/ad4
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 60%     14458         -
# 2  Short offline       Completed without error       00%     14457         -
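Instead of re-running the command by hand for two and a half hours, the log can be polled until entry # 1 no longer reports "in progress". This is a rough sketch: wait_for_selftest, the SMARTCTL override and the 300-second default interval are all choices made here for illustration, and it assumes the drive reports the running test in the log the way ad4 does above.

```shell
#!/bin/sh
# Poll the self-test log of a device until the newest entry ("# 1")
# is no longer "in progress", then print the final log.
# wait_for_selftest is a hypothetical helper; $2 is the polling
# interval in seconds (default 300, an arbitrary choice).
wait_for_selftest() {
    dev=$1
    interval=${2:-300}
    while ${SMARTCTL:-smartctl} -l selftest "$dev" |
            grep -q '^# *1[[:space:]].*in progress'; do
        sleep "$interval"
    done
    ${SMARTCTL:-smartctl} -l selftest "$dev"
}

# Usage:
#   smartctl -t long /dev/ad4 && wait_for_selftest /dev/ad4
```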