First scrub on July 19 2020
zpool status
pool: Tank
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: resilvered 24.8M in 0 days 00:00:01 with 0 errors on Sun Jul 19 00:00:32 2020
config:
NAME STATE READ WRITE CKSUM
SafeHaven ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
gptid/blip_disk1.eli ONLINE 0 0 3
gptid/blip_disk2.eli ONLINE 0 0 0
errors: No known data errors
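With more than a couple of disks, the one non-zero counter is easy to overlook in the table. The following is a minimal sketch, not part of the original setup, that filters `zpool status` text down to the devices with a non-zero READ, WRITE or CKSUM value. The function name `nonzero_counters` is made up for illustration, and the filter assumes the usual five-column layout shown above.

```shell
# Sketch: list vdevs whose READ, WRITE or CKSUM counter is not "0".
# Reads `zpool status` text on stdin, so it also works on saved output.
# nonzero_counters is a hypothetical helper name, not a ZFS command.
nonzero_counters() {
  awk '
    # Device rows have a state word in column 2 and at least five columns;
    # this skips the "state: ONLINE" line and the NAME/STATE header.
    NF >= 5 &&
    $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
    ($3 != "0" || $4 != "0" || $5 != "0") {
      print $1, "read=" $3, "write=" $4, "cksum=" $5
    }'
}
```

It could be used as `sudo zpool status Tank | nonzero_counters`. For the output above it would print only the `gptid/blip_disk1.eli` row.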
The likely culprit
sudo zpool clear Tank gptid/blip_disk1.eli
Another scrub on Aug 9 2020
The scrub this time also caught some things, and zpool status gave the following.
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: scrub repaired 12K in 0 days 06:29:41 with 0 errors on Sun Aug 9 06:53:43 2020
config:
NAME STATE READ WRITE CKSUM
SafeHaven ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
gptid/blip_disk1.eli ONLINE 0 0 3
gptid/blip_disk2.eli ONLINE 0 0 0
errors: No known data errors
- create a snapshot,
- refresh my backup,
- schedule a long SMART test and
- (if time allows) run a memtest.
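The first and third steps of the list above can be sketched as shell commands. This is only an illustration under assumptions: the snapshot name is made up, `Tank` is the pool name used with `zpool clear` earlier, and `/dev/ada1` is assumed to be the affected disk. The backup refresh depends entirely on the backup setup and the memtest needs a reboot, so neither is shown. By default the sketch only prints the commands; set `DRY_RUN=0` (and run as root) to execute them.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# next_steps is a hypothetical helper name used only for this sketch.
next_steps() {
  pool="Tank"                             # pool name from `zpool clear` above
  disk="/dev/ada1"                        # assumption: the affected disk
  snap="${pool}@after-scrub-2020-08-09"   # hypothetical snapshot name

  # Recursive snapshot of the pool and all child datasets.
  run zfs snapshot -r "$snap"

  # (Refresh the backup here; the method depends on the backup setup.)

  # Start an extended (long) SMART self-test; it runs in the background
  # on the drive and the result appears in the self-test log later.
  run smartctl -t long "$disk"
}
```

Calling `next_steps` without setting `DRY_RUN` prints the two commands prefixed with `+`, which makes it easy to review them before running anything.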
Note: I know that some people love to recommend running a memtest. However, statistically it is extremely unlikely that this is a memory issue: proper memory, which server-grade memory is, should have passed quality checks after manufacturing, and it rarely goes bad.
If the SMART tests pass, I will call it a day and keep observing the system. If the SMART test reports errors, or if the error occurs again on the same drive, I will contact the retailer, as the drive is still well within warranty.
Drive S.M.A.R.T. status
Checking the drive's SMART status with
sudo smartctl -a /dev/ada1
revealed no apparent errors with the disk. All previous SMART tests completed without errors.
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 1404 -
# 2 Extended offline Completed without error 00% 1329 -
# 3 Short offline Completed without error 00% 1164 -
# 4 Short offline Completed without error 00% 996 -
# 5 Short offline Completed without error 00% 832 -
# 6 Short offline Completed without error 00% 664 -
# 7 Short offline Completed without error 00% 433 -
# 8 Short offline Completed without error 00% 265 -
# 9 Extended offline Completed without error 00% 190 -
#10 Extended offline Completed without error 00% 18 -
#11 Short offline Completed without error 00% 0 -
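Instead of reading the log by eye, the entries can be counted with a small filter. The following is a rough sketch, assuming the log layout shown above, where each entry starts with `#` followed by the test number. The function name `selftest_summary` is made up for illustration.

```shell
# Sketch: summarise a SMART self-test log read from stdin.
# Counts entries ("# 1 ...", "#10 ...") and how many of them do not
# report "Completed without error". Exit status is 1 if any such entry
# exists, 0 otherwise. selftest_summary is a hypothetical helper name.
selftest_summary() {
  awk '
    /^# *[0-9]+ / {
      total++
      if ($0 !~ /Completed without error/) failed++
    }
    END {
      printf "%d tests, %d not completed cleanly\n", total, failed
      exit (failed > 0)
    }'
}
```

It could be fed with `sudo smartctl -l selftest /dev/ada1 | selftest_summary`. Note that a test that is still running or was interrupted is also counted as not completed cleanly, so a non-zero count is a prompt to look at the log and not proof of a failing drive.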