Migrating A ZPool To A Smaller Disk

A week ago, I migrated my gateway to a standalone machine running FreeBSD 13.1. The whole process went flawlessly, and soon enough it was forwarding packets to and from my network.

Quite happy with the result, I didn’t expect it to crash less than 2 days into production. At first I thought it could have been the temperature, the graphics card, the memory… until it crashed again a couple of days later and I saw this:

(ada0:ata2:0:0:0): WRITE_DMA48. ACB: 35 00 b0 48 01 40 11 00 00 00 10 00
(ada0:ata2:0:0:0): CAM status: Command timeout
(ada0:ata2:0:0:0): Retrying command, 3 more tries remain

I’m quite used to disk errors, but with mechanical disks; this is an SSD with no RAID or the like, a simple, straightforward FreeBSD installation on a single-disk system.
As often when facing this kind of weird behavior, after an extensive search through problem reports and blog posts, I called my Twitter community for help; they are usually super helpful and at least give me good pointers and new ideas. The reply that matched this issue best was Sam Wiltshire’s: he happens to have the same disk brand… and the same error messages, except they don’t make his zpool stop, which they do on my machine, freezing the operating system in the process.

If your zpool sits on one of those ada0: <KINGSTON SA400S37240G SBFKB1E1>, you might be up for a disk upgrade. I just finished doing it, and I’ll tell you how it went.

I’m not comfortable with ZFS, probably because its concepts don’t map directly to the classic filesystems we are used to manipulating. And also because its documentation is, IMHO, an immense spaghetti bowl where you’re supposed to understand those very concepts before doing anything.

Let me give you an example. A lot of blog posts tell you to export and import a pool. What exactly do export and import do? Let’s have a look at zpool-export(8):

DESCRIPTION
     zpool export [-a] [-f] pool...
             Exports the given pools from the system.  All devices are marked
             as exported, but are still considered in use by other subsystems.
             The devices can be moved between systems (even those of different
             endianness) and imported as long as a sufficient number of
             devices are present.

OK, maybe zpool-import(8) then:

     zpool import -a [-DflmN] [-F [-n] [-T] [-X]] [-c cachefile|-d dir|device]
             [-o mntopts] [-o property=value]... [-R root] [-s]
             Imports all pools found in the search directories.

I see…
Turns out “exporting” a pool is pretty much “disconnecting” it, and “importing” it “connects” it, possibly mounting some datasets along the way if -N is not added to the import command. But not everything is mounted. Because… stuff. More precisely, the boot environment, containing the root filesystem, is not mounted when doing zpool import -R /mnt, but the other datasets (home, usr/ports, etc.) are. Emanon on Twitter found that this particular dataset has canmount=noauto set, which explains why it is not mounted by import, but not why it is excluded in the first place.
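If you want to check this on your own system, the property can be queried directly; the dataset name below is the one a default FreeBSD install uses for its boot environment, adjust it to your layout:

~# zfs get canmount,mountpoint zroot/ROOT/default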

That being said, let’s dive into the actual disk migration.
The only resource I found on the topic was this blog post from 2014. Again, I asked my Twitter followers whether they thought it was still accurate, and they unanimously said they didn’t see anything wrong with it; and they were right, save for a subtle change regarding the infamous boot environment. This walkthrough is mainly inspired by the blog post I mentioned, with unnecessary pieces removed and commands corrected.

So here we go. The first disk, ada0, the failing Kingston, has the following partitioning:

~# gpart show ada0
=>       40  468862048  ada0  GPT  (224G)
         40       1024     1  freebsd-boot  (512K)
       1064        984        - free -  (492K)
       2048    4194304     2  freebsd-swap  (2.0G)
    4196352  464664576     3  freebsd-zfs  (222G)
  468860928       1160        - free -  (580K)

The new disk is ada1: <PNY 120GB SATA SSD V0218A0> ACS-2 ATA SATA 3.x device. It’s smaller, so I didn’t copy the partition scheme from ada0; instead I created the partitions by hand (if the disk is blank, create a GPT scheme first with gpart create -s gpt ada1):

~# gpart add -t freebsd-boot -s 512K ada1
~# gpart add -t freebsd-swap -s 2G -l swap1 ada1
~# gpart add -t freebsd-zfs -s 109G -l zfs1 ada1

Which gives us the following scheme:

~# gpart show ada1
=>       40  234441568  ada1  GPT  (112G)
         40       1024     1  freebsd-boot  (512K)
       1064    4194304     2  freebsd-swap  (2.0G)
    4195368  228589568     3  freebsd-zfs  (109G)
  232784936    1656672        - free -  (809M)

In order to be able to boot, we must populate the first partition, freebsd-boot, with a bootloader. This is done with the well-known command:

~# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1

And now the real thing. First, let’s create a new ZFS pool with a different name from the existing one:

~# zpool create newpool gpt/zfs1

From there you can check your pool using the zpool list command.
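For instance, right after creation the output should look roughly like this (the figures below are made up for illustration, only the ONLINE health status really matters at this point):

~# zpool list newpool
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
newpool   108G   420K   108G        -         -     0%     0%  1.00x    ONLINE  -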

Create a snapshot of the pool (here zroot) you want to migrate:

~# zfs snapshot -r zroot@backup
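You can verify that the snapshot was indeed taken recursively, i.e. that every dataset of the pool now has a @backup snapshot:

~# zfs list -t snapshot -r zroot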

And transfer it to the new pool:

~# zfs send -vR zroot@backup | zfs receive -vFd newpool
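Once the transfer is done, the datasets should all show up under the new pool; a quick sanity check:

~# zfs list -r newpool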

Make the new pool bootable:

~# zpool set bootfs=newpool/ROOT/default newpool

And declare the boot environment mount point:

~# zfs set mountpoint=/ newpool/ROOT/default
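Before rebooting, it doesn’t hurt to double-check that both settings took:

~# zpool get bootfs newpool
~# zfs get canmount,mountpoint newpool/ROOT/default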

I didn’t touch the zpool.cache file because I was told it was probably not necessary these days, but then, after rebooting on the new drive, I saw this message while booting:

ZFS WARNING: unable to attach to ada0p3
cannot import 'zroot'

This occurs because the zpool.cache file still had a reference to the previous pool. Fortunately, this is easily fixed by rebuilding the file, as explained in this blog post:

~# zpool set cachefile=/etc/zfs/zpool.cache newpool
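You can confirm the property is now set, so the cache gets rebuilt with a reference to the pool you actually boot from:

~# zpool get cachefile newpool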

And there we go: migration complete and a couple of concepts better understood, yet lots of mysteries remain under the hood.