One problem for the split-root setup (if you want to separate out the /usr filesystem) is that OpenIndiana ships /sbin/sh as a symlink to ../usr/bin/i86/ksh93. Absence of the system shell (due to a not-yet-mounted /usr) causes init to loop and fail early in OS boot.
When doing the split you must copy the ksh93 binary and the libraries it depends on from the /usr namespace into the root dataset (into /lib accordingly), and fix the /sbin/sh symlink. The specific steps are detailed below, and may have to be repeated after system updates (in case the shell or its libraries are updated in some incompatible fashion).
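As a rough sketch (not a verbatim recipe — the exact library list varies between builds, and the copy_shell_to_root helper with its root-path argument is illustrative), the copying could look like this:

```shell
#!/bin/sh
# Illustrative sketch: copy ksh93 and its /usr-resident libraries into the
# root dataset, then re-point /sbin/sh. The target root is a parameter so
# the fix can be applied to an alternate-mounted BE as well as the live one.
copy_shell_to_root() {
    root="$1"   # e.g. "" for the live root, or /a for an alternate root
    cp -p "$root/usr/bin/i86/ksh93" "$root/sbin/ksh93" || return 1
    # Copy the shared libraries the shell needs from under /usr into /lib
    # (the runtime linker searches /lib before /usr/lib, so the copies are
    # found even when /usr is not yet mounted):
    ldd "$root/usr/bin/i86/ksh93" 2>/dev/null |
        awk '$3 ~ /^\/usr\/lib/ {print $3}' |
        while read -r lib; do
            cp -p "$lib" "$root/lib/$(basename "$lib")"
        done
    # Replace the dangling symlink with one pointing at the local copy:
    rm -f "$root/sbin/sh" && ln -s ksh93 "$root/sbin/sh"
}
# Example invocation for an alternate root mounted at /a:
#   copy_shell_to_root /a
```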
Another (rather cosmetic) issue is that many other programs are absent in the minimized root without /usr, ranging from the svc* SMF-management commands to vi and so on. I find it convenient to also copy bash and some of the above commands into /sbin, though this is not strictly required for system operation – it just makes repairs easier.
A much more serious consequence of the absence of programs from /usr is that some SMF method scripts which initialize the system up to the "single-user milestone", including both the default and nwam implementations of svc:/network/physical, rely on some programs from /usr. The rationale is that network-booted miniroot images carry the needed files, and disk-based roots are expected to be "monolithic". It is possible to fix some of those methods (except NWAM in the default setup, at least), but a more reliable and less invasive solution is to mount the local ZFS components of the root filesystem hierarchy (and thus guarantee availability of a proper /usr) before other methods are executed. This is detailed below as the svc:/system/filesystem/root-zfs:default service with the fs-root-zfs script as its method.
NOTE for readers of earlier versions of this document: this script builds on my earlier customizations of the previously existing filesystem methods; now these legacy scripts don't need many modifications (I added just the needed checks for whether a filesystem has already been mounted).
Moving /var/tmp into a shared dataset did not work for me, at least some time in the past (before the new fs-root-zfs service): some existing services start before filesystem/minimal completes (which mounts such datasets), and either the /var/tmp dataset cannot be mounted over a non-empty mountpoint, or (if -O is used for an overlay mount) some programs can't find the temporary files which they expect.
It is possible that with the introduction of
fs-root-zfs this would work correctly, but this is not thoroughly tested yet.
Likewise, separation of the /root home directory did not work well: in case of system repairs it might not be mounted at all, and things get interesting. It may suffice to mount a sub-directory under /root from a dataset in the shared hierarchy and store larger files there, or just make an rpool/export/home/root dataset and symlink to it from under /root (with the latter being individual to each BE).
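For instance, the symlink variant could be set up along these lines (the dataset path follows the example above; the link name is an arbitrary illustration, and the symlink itself must be re-created in each new BE):

```
# Create a shared dataset outside the BEs...
zfs create rpool/export/home/root
# ...and link to it from the BE-local /root:
ln -s /export/home/root /root/shared
```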
Cloning BEs with beadm currently does not replicate the original datasets' "local" ZFS attributes, such as (ref)reservation. If you use pkg image-update to create a new BE and update the OS image inside it, you're in for a surprise: newly written data won't be compressed as you expected it to be – it will inherit compression settings from rpool/ROOT (uncompressed or LZ4 are likely candidates). While fixing this behaviour in beadm is a worthy RFE as well (issue numbers #4355 for pkg and #3569 for zfs), currently you should work around this by creating the new BE manually, re-applying the (compression) settings to the non-boot datasets (such as /usr), mounting the new BE, and providing the mountpoint to pkg commands. An example is detailed below.
Note that the bootable dataset itself (such as rpool/ROOT/oi_151a8) must remain with settings compatible with your GRUB's bootfs support (uncompressed until recently; lz4 is also accepted by newer GRUB builds).
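A possible manual sequence could look like this (a sketch only: the snapshot name, the compression value and the child-dataset layout are illustrative assumptions, not prescribed by the system):

```
# Snapshot and clone the current BE with its child datasets
# (zfs clone is not recursive, so clone each child explicitly):
zfs snapshot -r rpool/ROOT/openindiana@migrate
zfs clone rpool/ROOT/openindiana@migrate rpool/ROOT/oi_151a8
zfs clone rpool/ROOT/openindiana/usr@migrate rpool/ROOT/oi_151a8/usr

# Re-apply the "local" attributes which cloning did not replicate
# (keep the bootable dataset itself GRUB-compatible):
zfs set compression=gzip-9 rpool/ROOT/oi_151a8/usr

# Mount the new BE and point pkg at the mountpoint:
beadm mount oi_151a8 /mnt
pkg -R /mnt image-update
beadm activate oi_151a8
```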
Finally, proper mounting of hierarchical roots requires modifications to some system SMF methods. Patches and complete scripts are provided along with this article, though I hope that one day they will be integrated into illumos-gate or the OI distribution (issue number #4352), and manual tweaks on individual systems will no longer be required.
The fs-root script (in the earlier solution) or the replacement fs-root-zfs script (in the later one) introduces optional console logging (enabled by touching /.debug_mnt in the root of a BE), and enhances the case of ZFS-mounted root and /usr filesystems by making sure that the mountpoints of sub-datasets of the root filesystem are root-based and not something like /a/usr (for all child datasets), and by mounting /usr in overlay mode (zfs mount -O; this takes care of issue number #997 at least for the rootfs components) – too often have mischiefs like these two left an updated system unbootable and remotely inaccessible. It also verifies that the mounted filesystem is "sane" (a /usr/bin directory exists), and with that in place restarts (if online) or clears (if in the maintenance state) the networking SMF services such as svc:/network/iptun:default. The SMF method scripts for the latter rely on /usr, and these services are dependencies of filesystem/root (see issue number #4361). Doing the service restart after making sure /usr is available seems like the "cleanest" and most effective solution.
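The restart-or-clear logic can be sketched roughly as follows (kick_service is a hypothetical helper name; the actual method script inlines equivalent checks):

```shell
#!/bin/sh
# Once /usr is verified to be sane, give dependent network services a kick:
# restart them if they are already online, or clear them if their earlier
# start attempt (without /usr) has left them in the maintenance state.
kick_service() {
    fmri="$1"
    state="$(svcs -H -o state "$fmri" 2>/dev/null)"
    case "$state" in
        online)      svcadm restart "$fmri" ;;
        maintenance) svcadm clear "$fmri" ;;
        *)           : ;;  # not installed or not yet started: nothing to do
    esac
}
kick_service svc:/network/iptun:default
```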
The fs-usr script deals with the setup of dump, and the patch is minor (verify that dumpadm exists, in case the sanity of /usr was previously overestimated). For non-ZFS root filesystems in the global zone, the script takes care of re-mounting the /usr filesystems read-write according to /etc/vfstab, and does some other tasks.
While the described patches (see fs-root-zfs.patch for the new solution, or the reference fs-splitroot-fix.patch for the earlier solution) are not strictly required (i.e. things can work if you are super-careful about empty mountpoint directories and proper mountpoint attribute values, the system does not unexpectedly or by your mistake reboot while you are in mid-procedure, or you use legacy mountpoints and fix up /etc/vfstab in each new BE), they do greatly increase the chances of successful and correct boot-ups in the general case with dynamically-used boot environments, shared datasets and occasional untimely reboots. Also, some networking initialization scripts (notably NWAM) do expect /usr and maybe even /var to be mounted before they run, while the existing filesystem methods (which would mount /usr) happen to depend on them. However, physical:default does run successfully (most of the time, missing just the cut command, which can be replaced by a ksh93 builtin implementation).
bootfs children or shared datasets to mount
There are several ways to specify which datasets should be mounted as part of the dedicated or shared split-root hierarchy. In the context of the descriptions below, the "bootfs children" are filesystem datasets contained within the root filesystem instance requested for the current boot via GRUB (explicitly, or defaulting to the value of the ZFS pool's bootfs attribute).
"Legacy" filesystem datasets with
mountpoint=legacy which are explicitly specified in the
/etc/vfstab file inside this
bootfs. This allows to pass mount-time options (such as the overlay mount, before it was enforced by the fixed
rpool/ROOT/oi_151a8/usr - /usr zfs - no - rpool/SHARED/var/adm - /var/adm zfs - yes -
A drawback of this method for
bootfs children is that the file must be updated after each cloning or renaming of the boot environment to match the actual ZFS dataset full name.
For datasets with explicitly set mountpoint paths (and, for the new fs-root-zfs method, a canmount value other than "off"), mounting happens automatically: for /usr as a step in the filesystem/root service, for others as a step in filesystem/minimal. The bootfs children should use canmount=noauto, because after BE cloning the rpool would provide multiple datasets with the same mountpoints, causing errors (conflicts) of automatic mounts during pool imports. Setting canmount=off for such datasets with un-fixed old service method implementations in place would log errors due to the inability to zfs mount such datasets; however, for datasets other than /usr, the return codes are not checked, so this should not cause boot failures.
The existing filesystem methods can use /etc/vfstab to locate over a dozen paths for mounting (backed by any of the supported filesystem types), many of which are not used in default installations. Those which might be used in practice with ZFS include /tmp; these blocks in the method scripts also include logic to mount such child datasets of the current bootfs if they exist and a corresponding path was not explicitly specified in /etc/vfstab.
Extensions added by me into the fixed scripts (earlier solution), or provided as the new fs-root-zfs method, allow mounting such paths (except /var) also from a number of other locations as "shared" datasets – if they were not found as children of the current bootfs.
For possibly "shared" datasets, other than the explicitly specified short list (above), the legacy filesystem methods only offer the call to "zfs mount -a" from filesystem/local (way after the "single-user" milestone). This implies specified (non-"legacy") mountpoint paths and canmount=on; other datasets are not mounted automatically.
Extensions provided as the new fs-root-zfs method allow mounting datasets with such attribute values from $rpool/SHARED (where the $rpool name is determined from the currently mounted root filesystem dataset). This ensures availability of the active shared datasets as part of the split-root filesystem hierarchy early in boot. In particular, following the "auto-mounting" requirements allows using datasets with a specified mountpoint path and canmount=off as non-mounting "containers" from which the shared datasets inherit the parent container's path automatically.
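For illustration (the dataset names follow the $rpool/SHARED convention above; the particular children are an assumed example):

```
# A non-mounting "container": it defines the inheritable mountpoint base
# but is never mounted itself...
zfs create -o mountpoint=/var -o canmount=off rpool/SHARED/var
# ...while its children inherit /var/* paths and, with the default
# canmount=on (canmount is not inheritable), do mount automatically:
zfs create rpool/SHARED/var/adm     # mounts at /var/adm
zfs create rpool/SHARED/var/crash   # mounts at /var/crash
```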
Below you can find a screenshot with examples of the non-legacy datasets, both children of the root and shared ones. There is no example of a "legacy" dataset passed through /etc/vfstab, because I can't contrive a rational case where that would be useful today.
The examples below assume that your currently installed and configured OS resides in rpool/ROOT/openindiana and you want to relocate it into rpool/ROOT/oi_151a8 with a hierarchy of compressed sub-datasets for system files (the examples use variables to allow easy adaptation of the procedure to different realities), while shared files like crash dumps will reside in a hierarchy under rpool/SHARED.
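For instance, such variables (the names here are hypothetical, chosen only to make the commands re-usable) could be defined once in the shell before pasting the commands:

```shell
#!/bin/sh
# Hypothetical variable names for re-usable command examples:
RPOOL=rpool            # name of the root pool
OLDBE=openindiana      # currently active boot environment
NEWBE=oi_151a8         # new split-root boot environment
# A dataset path is then composed like this:
echo "$RPOOL/ROOT/$NEWBE/usr"   # prints rpool/ROOT/oi_151a8/usr
```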
For the oi_151a8 release and several releases before it, the system-provided scripts did not change, so the full scripts can be the easier choice to download: fs-root-zfs, fs-root and fs-minimal. As described above, the fs-root-zfs script includes all the logic needed to detect and mount the local ZFS-based root filesystem hierarchy (and skips any non-ZFS filesystems and mountpoints under them), while the existing method scripts are just slightly fixed to expect that the paths they try to manage may have already been mounted. Also, unlike the earlier existing scripts, the fs-root-zfs script explicitly mounts the shared datasets ($rpool/SHARED) early in system initialization, to ensure that the complete root filesystem hierarchy is available to other methods, such as the network initialization scripts.
For other releases and distributions it may be worthwhile to get the patches as fs-root-zfs.patch and apply them.
It was recently discovered that NWAM network auto-configuration does not work with a split-root config based on the earlier modifications of the filesystem method scripts.
Tracing the system scripts has shown that a substantial part of them depends on availability of