First scrub on July 19 2020
zpool status
  pool: Tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 24.8M in 0 days 00:00:01 with 0 errors on Sun Jul 19 00:00:32 2020
config:

        NAME                      STATE     READ WRITE CKSUM
        SafeHaven                 ONLINE       0     0     0
          mirror-0                ONLINE       0     0     0
            gptid/blip_disk1.eli  ONLINE       0     0     3
            gptid/blip_disk2.eli  ONLINE       0     0     0

errors: No known data errors
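The likely culprit is gptid/blip_disk1.eli, the only device with a non-zero error counter (3 checksum errors). Following the suggested action, I cleared its errors with: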
sudo zpool clear Tank gptid/blip_disk1.eli
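Picking out the culprit can also be done mechanically by filtering the config table of zpool status for rows with a non-zero READ, WRITE or CKSUM counter. Below is a minimal sketch, assuming the standard five-column layout shown above; the function name zpool_errored_devices is my own invention, not something ZFS provides.

```shell
#!/bin/sh
# Hypothetical helper: read "zpool status" output on stdin and print every
# row of the config table whose READ, WRITE or CKSUM counter is not zero,
# as "NAME READ WRITE CKSUM".
# Assumes the usual five columns: NAME STATE READ WRITE CKSUM.
zpool_errored_devices() {
    awk '
        # The column header marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
        # The "errors:" line comes after the end of the table.
        $1 == "errors:" { in_cfg = 0 }
        # Counters are compared as strings, so values like "1.2K" count too.
        in_cfg && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, $3, $4, $5
        }
    '
}

# Usage: zpool status Tank | zpool_errored_devices
```

For the output above it should print the single line gptid/blip_disk1.eli 0 0 3.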
Another scrub on Aug 9 2020
This scrub also found something: it repaired 12K of data, and zpool status gave the following.
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 12K in 0 days 06:29:41 with 0 errors on Sun Aug 9 06:53:43 2020
config:

        NAME                      STATE     READ WRITE CKSUM
        SafeHaven                 ONLINE       0     0     0
          mirror-0                ONLINE       0     0     0
            gptid/blip_disk1.eli  ONLINE       0     0     3
            gptid/blip_disk2.eli  ONLINE       0     0     0

errors: No known data errors
My next steps are to:
 - create a snapshot,
 - refresh my backup,
 - schedule a long SMART test, and
 - (if time allows) run a memtest.
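The snapshot and the long SMART test are easy to script. A rough sketch, assuming the pool is called Tank and the suspect disk is /dev/ada1; run, pre_check and the snapshot name are made up for illustration, and the backup refresh and memtest are not covered. With DRY_RUN=1 it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helpers sketching two of the steps above. Assumes the pool is
# "Tank" and the suspect disk is /dev/ada1; real runs need root (sudo).
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pre_check() {
    pool=$1
    disk=$2
    # Recursive snapshot of the pool and all its datasets, tagged with today's date.
    run zfs snapshot -r "${pool}@pre-smart-$(date +%Y-%m-%d)" || return 1
    # Start a long (extended) SMART self-test; the drive runs it in the
    # background and the result later shows up in the self-test log.
    run smartctl -t long "$disk"
}

# Usage: DRY_RUN=1 pre_check Tank /dev/ada1
```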
 
Note: I know that some people just love recommending a memtest. However, given this issue, a memory problem seems statistically very unlikely: proper memory (and server-grade memory is proper memory) should pass quality checks after manufacturing, and it very rarely goes bad.
If the SMART tests pass, I will call it a day and keep observing the system. If a SMART test reports errors, or if the error happens again on the same drive, I will contact the retailer, as the drive is well within warranty.
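The second condition, an error recurring on the same drive, is easy to check if the output of zpool status is saved after each scrub. A sketch, assuming the same table layout as above; recurring_cksum_devices and the file names are hypothetical. Since zpool clear resets the error counters, a non-zero CKSUM count in both files means new checksum errors appeared after the clear, provided the errors were cleared in between.

```shell
#!/bin/sh
# Hypothetical helper: compare two saved "zpool status" outputs and print
# each device that shows a non-zero CKSUM count in BOTH of them.
recurring_cksum_devices() {
    awk '
        # Reset the table state at the start of each input file.
        FNR == 1 { in_cfg = 0; file++ }
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
        $1 == "errors:" { in_cfg = 0 }
        in_cfg && NF >= 5 && $5 != "0" {
            if (file == 1) {
                first[$1] = 1          # checksum errors in the older output
            } else if (($1 in first) && !seen[$1]++) {
                print $1               # checksum errors again in the newer one
            }
        }
    ' "$1" "$2"
}

# Usage:
#   zpool status Tank > status-2020-07-19.txt   # after the first scrub
#   zpool status Tank > status-2020-08-09.txt   # after the second scrub
#   recurring_cksum_devices status-2020-07-19.txt status-2020-08-09.txt
```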
Drive S.M.A.R.T. status
Checking the drive's SMART status with
sudo smartctl -a /dev/ada1
revealed no apparent errors on the disk. All previous SMART self-tests completed without errors:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1404         -
# 2  Extended offline    Completed without error       00%      1329         -
# 3  Short offline       Completed without error       00%      1164         -
# 4  Short offline       Completed without error       00%       996         -
# 5  Short offline       Completed without error       00%       832         -
# 6  Short offline       Completed without error       00%       664         -
# 7  Short offline       Completed without error       00%       433         -
# 8  Short offline       Completed without error       00%       265         -
# 9  Extended offline    Completed without error       00%       190         -
#10  Extended offline    Completed without error       00%        18         -
#11  Short offline       Completed without error       00%         0         -
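Reading the self-test log by eye works for eleven entries, but a small filter makes the check repeatable once the planned long test has finished. A sketch, assuming the log layout shown above; selftest_failures is a hypothetical helper name. It prints every entry whose status is not "Completed without error" and sets its exit status accordingly.

```shell
#!/bin/sh
# Hypothetical helper: read a SMART self-test log (as printed by
# "smartctl -a" or "smartctl -l selftest") on stdin and print every entry
# whose status is not "Completed without error".
# Exit status: 0 = all entries passed, 1 = at least one did not,
# 2 = no self-test entries were found in the input.
selftest_failures() {
    awk '
        # Log entries look like "# 1  Short offline ..." or "#10  Extended ...".
        /^# *[0-9]+ / {
            n++
            if ($0 !~ /Completed without error/) { bad++; print }
        }
        END {
            if (n == 0) exit 2
            exit (bad > 0)
        }
    '
}

# Usage (as root):
#   smartctl -l selftest /dev/ada1 | selftest_failures && echo "all self-tests passed"
```

Aborted or interrupted tests are flagged as well, which errs on the side of caution.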