A week ago, I migrated my gateway to a standalone machine running FreeBSD 13.1. The whole process went flawlessly, and soon enough it was forwarding packets to and from my network.
Quite happy with the result, I didn’t expect it to crash less than 2 days after its first hour in production. At first I thought it could be the temperature, the graphics card, the memory… until it crashed again a couple of days later and I saw this:
(ada0:ata2:0:0:0): WRITE_DMA48. ACB: 35 00 b0 48 01 40 11 00 00 00 10 00
(ada0:ata2:0:0:0): CAM status: Command timeout
(ada0:ata2:0:0:0): Retrying command, 3 more tries remain
I’m quite used to disc errors, but on mechanical discs. This one is an SSD, with no RAID or the like: a simple, straight FreeBSD installation on a single-disc system.
As often when facing this kind of weird behavior, after an extensive search through problem reports and blog posts, I called on my Twitter community for help; they are usually super helpful and at least give me some good pointers and new ideas. The reply that best matched this issue was Sam Wiltshire’s. He happens to have the same disc brand… and the same error messages, except that in his case they don’t stop his zpool, whereas on my machine they do, freezing the operating system in the process.
If your zpool lives on one of these, ada0: <KINGSTON SA400S37240G SBFKB1E1>, you might be up for a disc upgrade. I just finished doing it, and here is how it went.
I’m not comfortable with ZFS, probably because its concepts don’t map directly to the classic filesystems we are used to manipulating, and also because its documentation is, IMHO, an immense spaghetti bowl where you’re supposed to understand those very concepts before doing anything.
Let me give you an example. A lot of blog posts tell you to import a pool. What exactly does import do? Let’s have a look at the manual page:
DESCRIPTION
     zpool export [-a] [-f] pool...
             Exports the given pools from the system. All devices are marked as exported, but are still considered in use by other subsystems. The devices can be moved between systems (even those of different endianness) and imported as long as a sufficient number of devices are present.

     zpool import -a [-DflmN] [-F [-n] [-T] [-X]] [-c cachefile|-d dir|device] [-o mntopts] [-o property=value]... [-R root] [-s]
             Imports all pools found in the search directories.
It turns out that “exporting” a pool is pretty much “disconnecting” it, and “importing” it “connects” it, possibly mounting some datasets as well unless -N is added to the import command. But not everything gets mounted. Because… stuff. More precisely, the boot environment, which contains the root filesystem, is not mounted when running zpool import -R /mnt, but other datasets (usr/ports, etc.) are. Emanon on Twitter found that this particular dataset had canmount=noauto set, which explains why import does not mount it, but not why it is excluded in the first place.
That being said, let’s dive into the actual disc migration.
The only resource I found on the topic was this blog post from 2014. Once again I asked my Twitter followers whether they thought it was still accurate, and they unanimously said they didn’t see anything wrong with it. They were right: only a subtle change was needed regarding the infamous boot environment. This walkthrough is mainly based on that post, with unnecessary pieces removed and commands corrected.
So here we go. The first disc, ada0, the failing Kingston, has the following partitioning:
~# gpart show ada0
=>        40  468862048  ada0  GPT  (224G)
          40       1024     1  freebsd-boot  (512K)
        1064        984        - free -  (492K)
        2048    4194304     2  freebsd-swap  (2.0G)
     4196352  464664576     3  freebsd-zfs  (222G)
   468860928       1160        - free -  (580K)
The new disc is ada1: <PNY 120GB SATA SSD V0218A0> ACS-2 ATA SATA 3.x device. It’s smaller, so I didn’t copy the partition scheme from the old disc and created the partitions by hand instead:
~# gpart create -s gpt ada1
~# gpart add -t freebsd-boot -s 512K ada1
~# gpart add -t freebsd-swap -s 2G -l swap1 ada1
~# gpart add -t freebsd-zfs -s 109G -l zfs1 ada1
Which gives us the following scheme:
~# gpart show ada1
=>        40  234441568  ada1  GPT  (112G)
          40       1024     1  freebsd-boot  (512K)
        1064    4194304     2  freebsd-swap  (2.0G)
     4195368  228589568     3  freebsd-zfs  (109G)
   232784936    1656672        - free -  (809M)
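gpart show reports offsets and sizes in sectors, so the table above can be double-checked with a bit of shell arithmetic, along with the question that matters when moving to a smaller disc: does the data actually in use fit? A sketch assuming 512-byte sectors (consistent with 1024 sectors showing as 512K) and an arbitrary 20% safety margin; on the live system, zfs list -Hp -o used zroot (no header, exact bytes) gives the used space, snapshots and child datasets included. The helper names are hypothetical.

```sh
#!/bin/sh
# Bytes per sector: assumed 512, check the "sectorsize" line of diskinfo -v ada1.
SECTOR_SIZE=${SECTOR_SIZE:-512}

# Sector count -> MiB, rounded to the nearest MiB.
sectors_to_mib() {
    echo $(( ($1 * SECTOR_SIZE + 524288) / 1048576 ))
}

# A partition ends where the next entry starts: start + size.
partition_end() {
    echo $(( $1 + $2 ))
}

# Succeeds if USED bytes fit in a PART-byte partition while keeping HEADROOM
# percent of it free. The 20% default is an arbitrary margin for ZFS overhead
# and growth, not a ZFS requirement.
fits_on_new_disk() {
    used=$1
    part=$2
    headroom=${3:-20}
    [ "$used" -le $(( part * (100 - headroom) / 100 )) ]
}

ZFS_SECTORS=228589568   # size of ada1p3 in the table above
PART_BYTES=$(( ZFS_SECTORS * SECTOR_SIZE ))

sectors_to_mib "$ZFS_SECTORS"        # freebsd-zfs partition
sectors_to_mib 1656672               # trailing free space
partition_end 4195368 "$ZFS_SECTORS" # start of that free space

# On the live system: USED_BYTES=$(zfs list -Hp -o used zroot)
USED_BYTES=${USED_BYTES:-0}
if fits_on_new_disk "$USED_BYTES" "$PART_BYTES"; then
    echo "fits"
else
    echo "does not fit"
fi
```

With the defaults it prints 111616 (111616 / 1024 = 109, hence the 109G), 809 and 232784936, matching the table, then "fits" since USED_BYTES defaults to 0.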
To be able to boot, we must populate the first partition, freebsd-boot, with boot code, which is done with this well-known command:
~# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
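The -i 1 argument tells gpart which partition receives gptzfsboot, so it has to be the index of the freebsd-boot partition (1 here), otherwise the boot code is written to the wrong partition. A small hypothetical helper, boot_index (not a FreeBSD tool), pulls that index out of gpart show output so it can be checked before writing anything; it assumes the column layout of the ada1 table shown above.

```sh
#!/bin/sh
# Hypothetical helper: print the index (third column) of the first
# freebsd-boot partition in "gpart show" output read from stdin.
boot_index() {
    awk '$4 == "freebsd-boot" { print $3; exit }'
}

# On the real machine: gpart show ada1 | boot_index
# Here, the first lines of the ada1 table are fed in directly:
printf '%s\n' \
    '=>        40  234441568  ada1  GPT  (112G)' \
    '          40       1024     1  freebsd-boot  (512K)' \
    '        1064    4194304     2  freebsd-swap  (2.0G)' | boot_index
```

It prints 1, the value passed to -i; the header and "- free -" lines never have freebsd-boot in their fourth column, so they are ignored.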
And now the real thing. First, let’s create a new ZFS pool whose name differs from the existing one:
~# zpool create newpool gpt/zfs1
From there you can check your pool using the zpool list command.
Create a recursive snapshot of the pool you want to migrate (here zroot):
~# zfs snapshot -r zroot@backup
And transfer it to the new pool:
~# zfs send -vR zroot@backup | zfs receive -vFd newpool
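For reference, here are the snapshot and transfer steps as a small script, with what each flag does according to the zfs-snapshot(8), zfs-send(8) and zfs-receive(8) manual pages. It is a sketch assuming the names used in this post (zroot, newpool, backup); since it modifies pools, it only prints the commands unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Assumed names, as in this post; override through the environment.
SRC=${SRC:-zroot}
DST=${DST:-newpool}
SNAP=${SNAP:-backup}

# snapshot -r: snapshot the pool's root dataset and all its descendants
# atomically, under the same snapshot name.
snapshot_cmd="zfs snapshot -r $SRC@$SNAP"

# send -R: replication stream (descendant datasets, their snapshots and
#   properties, canmount and mountpoint included); -v: show progress.
# receive -F: force the receive into the already existing root dataset of $DST;
#   -d: drop the source pool name, so zroot/usr@backup lands in newpool/usr;
#   -v: verbose.
transfer_cmd="zfs send -vR $SRC@$SNAP | zfs receive -vFd $DST"

# Print the commands by default; DRY_RUN=0 runs them (through sh -c, so that
# the pipe in transfer_cmd works), stopping at the first failure.
for cmd in "$snapshot_cmd" "$transfer_cmd"; do
    if [ "${DRY_RUN:-1}" = 0 ]; then
        sh -c "$cmd" || exit 1
    else
        echo "$cmd"
    fi
done
```

Because -R carries the properties over, the copied boot environment keeps its canmount=noauto setting on the new pool.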
Make the new pool bootable by pointing its bootfs property at the boot environment:
~# zpool set bootfs=newpool/ROOT/default newpool
And set the boot environment’s mount point:
~# zfs set mountpoint=/ newpool/ROOT/default
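Before rebooting, it doesn’t hurt to read both properties back. A sketch assuming the pool and boot environment names used above; check_boot_props is a hypothetical helper, while zpool get and zfs get with -H (no header) and -o value (value column only) are standard. The live query only runs where zpool is installed.

```sh
#!/bin/sh
POOL=${POOL:-newpool}
BE=${BE:-$POOL/ROOT/default}

# Hypothetical helper: succeed if bootfs points at the expected boot
# environment and that boot environment has its mountpoint set to /.
check_boot_props() {
    bootfs=$1
    mountpoint=$2
    expected_be=$3
    [ "$bootfs" = "$expected_be" ] && [ "$mountpoint" = "/" ]
}

# Only query real properties where the ZFS tools exist.
if command -v zpool >/dev/null 2>&1; then
    if check_boot_props "$(zpool get -H -o value bootfs "$POOL")" \
            "$(zfs get -H -o value mountpoint "$BE")" "$BE"; then
        echo "bootfs and mountpoint look right"
    else
        echo "bootfs or mountpoint not set as expected" >&2
    fi
fi
```

An unset bootfs is reported by zpool get as "-", which this check treats as a failure.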
I didn’t touch the zpool.cache file because I was told it was probably not necessary these days, but after rebooting on the new drive, I saw this message during boot:
ZFS WARNING: unable to attach to ada0p3 cannot import 'zroot'
This happens because the zpool.cache file still referenced the previous pool. Fortunately, this is easily fixed by regenerating the file, as explained in this blog post:
~# zpool set cachefile=/etc/zfs/zpool.cache newpool
And there we go: migration complete, a couple of concepts better understood, yet plenty of mysteries left under the hood.