Replacing a (silently) failing disk in a ZFS pool
by Emile `iMil' Heitor - 2019-07-02
Maybe I can’t read, but I have the feeling that official documentation explains every single corner case for a given tool, except the one you will actually need. Today’s struggle: replacing a disk within a FreeBSD ZFS pool.
What? There’s a shitton of docs on this topic! Are you stupid?
I don’t know, maybe. Yet none of them covered the process in a simple, straightforward and complete manner. Here’s the story:
Since yesterday I had felt that my personal FreeBSD NAS was sluggish, and this morning I saw these horrible messages pop up on my syslog console:
Jul 2 12:49:53 <kern.crit> newcoruscant kernel: ahcich1: Timeout on slot 8 port 0
Jul 2 12:49:53 <kern.crit> newcoruscant kernel: ahcich1: is 00000000 cs 00000000 ss 00000300 rs 00000300 tfd 40 serr 00000000 cmd 0000c917
Jul 2 12:49:53 <kern.crit> newcoruscant kernel: (ada1:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 50 25 e9 40 3b 00 00 00 00 00
Jul 2 12:49:53 <kern.crit> newcoruscant kernel: (ada1:ahcich1:0:0:0): CAM status: Command timeout
Jul 2 12:49:53 <kern.crit> newcoruscant kernel: (ada1:ahcich1:0:0:0): Retrying command
Jul 2 12:51:02 <kern.crit> newcoruscant kernel: cant/memory/memory-inactive: ds[0] = 52350976.000000
Jul 2 12:51:02 <kern.crit> newcoruscant kernel: ahcich1: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Yeah… that bad.
The first thing that struck me was that ZFS seemed perfectly fine with it:
root@newcoruscant:~ # zpool status
pool: zroot
state: ONLINE
scan: scrub repaired 0 in 2h26m with 0 errors on Tue Jun 25 12:08:56 2019
config:
NAME STATE READ WRITE CKSUM
zroot ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
ada0p4 ONLINE 0 0 0
ada1p4 ONLINE 0 0 0
ada2p4 ONLINE 0 0 0
errors: No known data errors
But the input/output error thrown by smartctl -a /dev/ada1 made things clear: I needed to replace this disk, quickly!
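Before swapping anything, it is worth checking every member of the pool, not just the noisy one. A small sketch, assuming smartmontools is installed; the smart_verdict helper is hypothetical and only parses smartctl’s text output, so it also copes with a disk too dead to answer:

```shell
# Hypothetical helper: extract the one-line verdict from `smartctl -H`
# output read on stdin. Pure text parsing, no hardware needed here.
smart_verdict() {
  awk -F': *' '/overall-health self-assessment/ { print $2; ok=1 }
               END { if (!ok) print "NO VERDICT (device not answering?)" }'
}

# Check every ada disk in the box (adjust the glob for your controller).
for d in /dev/ada?; do
  printf '%s: ' "$d"
  smartctl -H "$d" 2>&1 | smart_verdict
done
```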
Thanks to past-me, there was already a disk ready for this task at ada3, so, after faithfully reading the zpool administration guide, and in particular Replacing a Functioning Device, I entered:
# zpool replace zroot ada1p4 ada3p4
Except it didn’t run as expected:
cannot open 'ada3p4': no such GEOM provider
must be a full path or shorthand device name
What a fantastic and explicit error message just to say that ada3 doesn’t have a corresponding partition table.
I am no FreeBSD guru and only a very occasional user, so no, I am not used to GEOM, gpart, GELI, etc. Finally, this very well written Stack Exchange post showed me how to replicate the correct partition table to the new disk:
# gpart backup ada0|gpart restore -F ada3
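For the curious, gpart backup dumps a small text description of the partition table, which is exactly what gpart restore replays onto the target disk. A sketch with made-up sizes (only the scheme header and the boot/zfs indices match this machine; every number here is illustrative):

```shell
# Made-up example of a `gpart backup` dump: "GPT <entries>" on the first
# line, then one "index type start size" line per partition. The sizes
# and starts below are invented, not this machine's real layout.
dump=$(cat <<'EOF'
GPT 128
1 efi 40 409600
2 freebsd-boot 409640 1024
3 freebsd-swap 410664 4194304
4 freebsd-zfs 4604968 967545280
EOF
)
printf '%s\n' "$dump"
# Piping such a dump into `gpart restore -F ada3` recreates the same
# table, after which ada3p4 exists and zpool replace can use it.
```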
Now zpool replace zroot ada1p4 ada3p4 would work! I also did not forget to replicate the bootcode to the new disk, as instructed by both the documentation and zpool itself (Warning! freebsd-boot is on ada3p2):
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 ada3
partcode written to ada3p2
bootcode written to ada3
And at last the resilvering was taking place:
root@newcoruscant:~ # zpool status
pool: zroot
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Jul 2 11:21:24 2019
3.91M scanned out of 1.84T at 38.5K/s, (scan is slow, no estimated time)
1.30M resilvered, 0.00% done
config:
NAME STATE READ WRITE CKSUM
zroot ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
ada0p4 ONLINE 0 0 0
replacing-1 ONLINE 0 0 0
ada1p4 ONLINE 0 0 0
ada3p4 ONLINE 0 0 0
ada2p4 ONLINE 0 0 0
errors: No known data errors
But… at less than 40K/s! It turns out that, logically enough, the failing disk and its timeouts were slowing the resilvering down, so I learned that to avoid this kind of situation, you should offline the failing disk from the zpool:
# zpool offline zroot ada1p4
And then:
$ sudo zpool status
pool: zroot
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Jul 2 16:01:22 2019
514G scanned out of 1.84T at 167M/s, 2h20m to go
170G resilvered, 27.22% done
config:
NAME STATE READ WRITE CKSUM
zroot DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
ada0p4 ONLINE 0 0 0
replacing-1 DEGRADED 0 0 8
15084350875675872541 OFFLINE 0 0 0 was /dev/ada1p4
ada3p4 ONLINE 0 0 0
ada2p4 ONLINE 0 0 0
errors: No known data errors
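If, like me, you keep re-running zpool status just to watch the percentage, a tiny parser can log it for you instead. A sketch; resilver_pct is a hypothetical helper that only parses the status text, so feed it the output of zpool status <pool>:

```shell
# Hypothetical helper: print the "% done" figure from a `zpool status`
# dump (the "170G resilvered, 27.22% done" line), and print nothing
# once the resilver has finished (that line disappears).
resilver_pct() {
  awk '/resilvered, / { pct = $3; sub(/%.*/, "", pct); print pct }'
}

# Usage idea (needs a real pool):  zpool status zroot | resilver_pct
```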
Much better. At the end of the resilvering, everything was working correctly:
$ sudo zpool status
pool: zroot
state: ONLINE
scan: resilvered 628G in 2h52m with 0 errors on Tue Jul 2 18:53:48 2019
config:
NAME STATE READ WRITE CKSUM
zroot ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
ada0p4 ONLINE 0 0 0
ada3p4 ONLINE 0 0 0
ada2p4 ONLINE 0 0 0
errors: No known data errors
I read that you should zpool remove the failing disk at the end of this operation, but when I tried to do so:
root@newcoruscant:~ # zpool remove zroot ada1p4
cannot remove ada1p4: no such device in pool
root@newcoruscant:~ # zpool remove zroot 15084350875675872541
cannot remove 15084350875675872541: no such device in pool
So I guess zpool replace already detached the old device by itself once the resilvering finished.
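For the next failure, the whole dance fits in four commands. A recap sketch using this machine’s names (zroot, ada0 as layout source, ada1 failing, ada3 as replacement, boot partition at index 2; adjust everything for your box). For safety it defaults to printing the commands rather than running them:

```shell
#!/bin/sh
# Dry-run by default: set DRYRUN=0 to actually execute the commands.
DRYRUN=${DRYRUN:-1}
POOL=zroot SRC=ada0 BAD=ada1 NEW=ada3 BOOTIDX=2

run() { if [ "$DRYRUN" = 1 ]; then echo "$*"; else "$@"; fi; }

run zpool offline "$POOL" "${BAD}p4"     # offline the dying disk first, or it drags the resilver down
run sh -c "gpart backup $SRC | gpart restore -F $NEW"              # clone the partition table
run gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i "$BOOTIDX" "$NEW"  # boot blocks
run zpool replace "$POOL" "${BAD}p4" "${NEW}p4"                    # start the resilver
```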
Now it’s time to buy and add a new spare for the next disk that fails…