THIS ARTICLE IS A WORK IN PROGRESS.
You may consult it for ideas, but I strongly suggest you do not blindly follow it yet. Some of the scripts are not even the newest of their kind; I am searching my archives for their latest versions.
Like my other setup options, the one described here is an "advanced" variant which may be cumbersome to set up, has its benefits and possibly drawbacks, and is not "required" for every usage scenario (just for some).
This article describes two related setups: the first one may be used by itself, and the second one builds on the first.
Please keep in mind that this page and the example services are at this time posted from my notes and home-brewed hacks. In particular, the SMF integration can be remade in a prettier manner (i.e. each pool managed by a separate instance, etc.). I publish these "as is", and they will likely need customization for your systems. Maybe I (or other contributors) will later remake these pieces into a fine generalized redistributable work of art.
There may be several reasons to pursue such a setup. I had a few of my own for building it in the first place:
zdb that the "Deferred delete" list of blocks was getting shorter every time. And by having the pool in a service (with a timeout and a "catapult button", as seen below), I could disable the import attempt of this pool during a particular boot if I needed to use the machine rather than have it loop to clean up that pool. It took a couple of weeks overall...
rdsk device associated with the zvol, but I was told that a more generic solution is to use iSCSI and make a loopback sort of mount. However, with the default setup this creates a deadlock: networking starts after the filesystems, and is needed to import the iSCSI-served pool.
dedup enabled for some of my experiments.
Overall, one can point out several typical variants:
Some of the steps below assume the following environment variables:
### Set the names for "physical" (or main for data) and "virtual" (stored in zvol on physical) pools
:; POOL_PHYS=pool
:; POOL_VIRT=dcpool
Again, note that at this moment the sample scripts and instructions are from a custom installation at my rig. Generalization may follow later. For now, a curious reader will have to adapt the instructions.
Phase 1: ZFS pool as an SMF service
So, the first phase is rather trivial...
:; wget -O /lib/svc/method/mount-main-pool \
    http://wiki.openindiana.org/download/attachments/27230301/mount-main-pool
:; mkdir -p /var/svc/manifest/network/iscsi
:; wget -O /var/svc/manifest/network/iscsi/mount-main-pool.xml \
    http://wiki.openindiana.org/download/attachments/27230301/mount-main-pool.xml
Don't mind the "iscsi" part of the naming – this is historical due to the second phase of this setup.
Edit the method script. This file, as it is now, is tuned for my installation, and too much is hardcoded.
Script logic: the main data pool named pool contains a /pool/tmp directory (or automountable child dataset). The method script checks whether this directory exists. If it does not, the pool is considered not imported and can be imported on start: the script waits for listing and status to complete and logs the results, then mounts all ZFS filesystems (note: not only those from this pool), and only then completes. If the directory does exist, the pool is considered imported and can be exported on stop (the script loops until the export succeeds).
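The decision logic described above can be sketched as a pair of small shell functions. This is only an illustration, not the published method script: the MARKER_DIR variable and the function names are hypothetical, and the sketch prints the chosen action instead of calling zpool.

```shell
#!/bin/sh
# Sketch of the start/stop decision only; not the published method script.
# MARKER_DIR is a hypothetical variable; the article's script tests /pool/tmp.
MARKER_DIR="${MARKER_DIR:-/pool/tmp}"

# Succeeds when the marker directory exists, which is taken to mean
# that the pool is already imported and mounted.
pool_is_imported() {
    [ -d "$MARKER_DIR" ]
}

# Prints the action a method script would take for "start" or "stop".
decide_action() {
    case "$1" in
        start)
            if pool_is_imported; then echo "skip-import"; else echo "import"; fi
            ;;
        stop)
            if pool_is_imported; then echo "export"; else echo "skip-export"; fi
            ;;
        *)
            echo "usage: decide_action start|stop" >&2
            return 2
            ;;
    esac
}
```

The marker test is only reliable if the directory cannot appear on the root filesystem by accident, which is what the immutable mountpoint is for.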
In order to protect the tested directory from bogusly appearing on the root filesystem (of the rpool) you can use an immutable mountpoint (detailed below).
The script includes several anti-import precautions besides disabling the service: the service depends on the non-existence of the file /etc/zfs/noimport-pool; a delay-file /etc/zfs/delay-pool can contain the timeout in seconds or just exist (in which case the delay defaults to 600 seconds); and an automatic lock-file prevents subsequent imports of pools that cannot complete and hang or crash your system.
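As an illustration of the delay-file convention, here is a minimal sketch of how such a file might be parsed. The function name read_delay is hypothetical, and returning 0 when the file is absent is my assumption (the text only defines the behavior when the file exists).

```shell
#!/bin/sh
# Sketch: work out the start delay (in seconds) from a delay-file.
#   read_delay FILE [DEFAULT]
# - file absent            -> 0 (assumption: no delay requested)
# - file holds a number    -> that number
# - file empty or non-numeric -> DEFAULT (600 unless overridden)
read_delay() {
    delay_file="$1"
    default_delay="${2:-600}"
    if [ ! -e "$delay_file" ]; then
        echo 0
        return 0
    fi
    # Take the first line and strip whitespace before validating it.
    value=$(head -n 1 "$delay_file" 2>/dev/null | tr -d '[:space:]')
    case "$value" in
        ''|*[!0-9]*) echo "$default_delay" ;;
        *)           echo "$value" ;;
    esac
}
```

A method script could then pause with something like `sleep "$(read_delay /etc/zfs/delay-pool)"` before attempting the import, which leaves time to log in and disable the service.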
Also note that here the import is done without a cachefile and with an alternate root (even if it is / by default). For larger pools made of many vdevs, you can speed up the imports by using an alternate cachefile=/etc/zfs/zpool-main.cache or something like that, just not the default one.
You can also touch /etc/zfs/noautomount-$POOL in order to avoid auto-mounting of filesystem datasets (zfs mount -a) at the end of the routine; the pool is initially imported without automounting anything at all.
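To make the import options concrete, the sketch below prints (rather than runs) the commands such a routine might issue. The plan_import function and the ZFS_ETC variable are hypothetical; the zpool import flags -N (do not mount), -o cachefile=... and -R (alternate root) are standard, but the exact invocation in the published script may differ, so check it against zpool's manual page on your release.

```shell
#!/bin/sh
# Sketch: print the import commands instead of executing them.
# ZFS_ETC is a hypothetical variable pointing at the flag-file directory.
ZFS_ETC="${ZFS_ETC:-/etc/zfs}"

#   plan_import POOL [CACHEFILE]
# CACHEFILE defaults to "none" (import without a cachefile).
plan_import() {
    pool="$1"
    cachefile="${2:-none}"
    # -N: import without mounting anything; -R /: alternate root of "/".
    echo "zpool import -N -o cachefile=$cachefile -R / $pool"
    # Mount datasets afterwards unless the user created the opt-out file.
    if [ ! -e "$ZFS_ETC/noautomount-$pool" ]; then
        echo "zfs mount -a"
    fi
}
```

For example, `plan_import pool /etc/zfs/zpool-main.cache` shows the variant with a non-default cachefile. Printing the plan first fits the spirit of revising steps before running them.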
You might want to add different options and/or logic to your taste.
TODO: Replace hardcoding with config-file and/or SMF attribute modifiable configuration.
Revise the manifest file. It currently sets a dependency on filesystem/local; you might want something else (such as svc:/network/ssh:default) so that you have some time to log in and disable the pool-importing service. If revising dependencies, make sure to avoid loops (SMF commands should help here).
The service also depends on the absence of the lock-files /etc/zfs/.autolock.pool (created and removed by the method script around import attempts) and /etc/zfs/noimport-pool (maintained by the user to optionally disable auto-import); the pool part in these filenames (or rather the complete filenames, as synthesized by default) should match what is defined for the service in the method script.
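The lock-file idea can be sketched as a wrapper that holds the lock only while an import attempt is in flight. The with_autolock name is hypothetical, and I am assuming that the lock is removed whenever the command returns (with any exit status), so that it lingers only when the attempt hangs or the system crashes; the published script may behave differently.

```shell
#!/bin/sh
# Sketch: run a command while holding a lock-file.
#   with_autolock LOCKFILE COMMAND [ARGS...]
# If the lock already exists, an earlier attempt never finished, so refuse.
with_autolock() {
    lock_file="$1"
    shift
    if [ -e "$lock_file" ]; then
        echo "lock $lock_file exists, refusing to run: $*" >&2
        return 1
    fi
    : > "$lock_file" || return 1
    "$@"
    status=$?
    # Reached only if the command returned; a hang or crash leaves the lock.
    rm -f "$lock_file"
    return "$status"
}
```

With the manifest's dependency on the absence of the lock-file, SMF itself would refuse to start the service after a crashed attempt; the check inside the wrapper merely duplicates that guard for manual runs.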
It also defines zones as dependent services so that these resources hosted on the data pool are only started when it is mounted; you might also want to add nfs/server, or set them to the optional_all type of dependency, if your rpool also hosts some zones and/or files and can do so without a present data pool.
Install the SMF wrapping scripts:
:; svccfg import /var/svc/manifest/network/iscsi/mount-main-pool.xml
This creates the (disabled) service for main-pool importing, which calls the script above as the method script.
Remove the pool in question from the auto-import database by exporting it (NOTE: this unshares and unmounts all its filesystems in the process, and will block or fail on any active users, over-mounted filesystem nodes, used volumes, etc.):
:; POOL=$POOL_PHYS
:; zpool export $POOL
As a result, this pool should no longer be cached in /etc/zfs/zpool.cache for faster and automated imports at OS startup.
Protect the mountpoint from method script's test failures:
:; df -k /$POOL
:; ls -la /$POOL
### Make sure that the pool is exported and its mountpoint directory does not exist or is empty
:; mkdir /$POOL
:; /bin/chmod S+ci /$POOL
The immutable mountpoint cannot be written to even by root, such as when an untimely zfs mount would try to create subdirectories without mounting the pool's root dataset first and break our setup.
Enable the service, which should mount your pool; you can monitor the progress and ultimately the pool status in the service log:
### Not done before, so you have time to revise the steps instead of blind copy-pasting ;)
:; chmod +x /lib/svc/method/mount-main-pool
### Temp-enable while we are testing
:; svcadm enable -t mount-main-pool
:; tail -f /var/svc/log/*mount-main-pool*log
### If all is ok, you may want to enable the service to start at boot... or maybe not.
:; mkdir /$POOL/tmp
:; svcadm enable mount-main-pool
At this point, your data pool is imported not blindly by the OS, but by your service, which you can disable in case of problems (at the moment this may require booting into a livecd to create the block-file /etc/zfs/noimport-pool, which would cancel the service startup, if for some reason the automatic creation and clearing of a block-file around the start/stop calls does not help).
Perhaps more importantly, you now have the main pool wrapped as an SMF resource on which other services can depend (or not depend) for orderly startup. If this pool takes very long to import and/or can fail in the process, it does not delay the startup of other services (like ssh), and you can monitor the state of the import as an SMF status with svcs or numerous remote-management tools.
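Monitoring the import as an SMF state could look like the polling loop below. It is a sketch under assumptions: the wait_for_online name and the SVCS override variable are hypothetical (the override only exists so that the loop can be exercised without SMF), while `svcs -H -o state` is the standard way to print just the state column.

```shell
#!/bin/sh
# Sketch: poll an SMF service until it is online, failed, or we give up.
# SVCS is a hypothetical override for the svcs command (useful for testing).
SVCS="${SVCS:-svcs}"

#   wait_for_online FMRI [TRIES] [PAUSE_SECONDS]
# Returns 0 when online, 1 on maintenance/disabled, 2 on timeout.
wait_for_online() {
    fmri="$1"
    tries="${2:-60}"
    pause="${3:-5}"
    while [ "$tries" -gt 0 ]; do
        state=$("$SVCS" -H -o state "$fmri" 2>/dev/null)
        case "$state" in
            online)
                return 0
                ;;
            maintenance|disabled)
                echo "$fmri is $state" >&2
                return 1
                ;;
        esac
        tries=$((tries - 1))
        sleep "$pause"
    done
    echo "timed out waiting for $fmri" >&2
    return 2
}
```

For instance, `wait_for_online mount-main-pool 120 10` would wait up to about 20 minutes, which may suit a pool that is slow to import.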
Phase 2: iSCSI target (server)
Here we set up the zvol, shared over iSCSI, which will store the "virtual" ZFS pool, named below dcpool for historical reasons (it was deduped inside and compressed outside on my test rig, so I hoped to compress only the unique data written).
TODO: find my notes on setup of the server – unfortunately, the HomeNAS itself is not currently available to look at... but it was (IIRC) not much different from the usual COMSTAR iSCSI setup (with stmf). Enable services, create a backing store for a LU implemented as a zvol, allow localhost and/or remote hosts to access it.
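Until those notes turn up, the usual COMSTAR sequence can be sketched as a printed plan. This is not the original setup: the plan_iscsi_target function, the zvol name and size, and the LU GUID placeholder are all hypothetical. The commands themselves (zfs create -V, stmfadm create-lu, stmfadm add-view, itadm create-target, svcadm enable -r) are standard illumos tools; the GUID is printed by create-lu and has to be copied into add-view by hand.

```shell
#!/bin/sh
# Sketch: print a generic COMSTAR target setup; nothing is executed.
#   plan_iscsi_target ZVOL SIZE [LU_GUID]
# ZVOL is a dataset name such as pool/dcpool-vol (hypothetical example).
plan_iscsi_target() {
    zvol="$1"
    size="$2"
    lu_guid="${3:-<LU-GUID>}"   # placeholder until create-lu reports the GUID
    cat <<EOF
zfs create -V $size $zvol
svcadm enable -r svc:/system/stmf:default
stmfadm create-lu /dev/zvol/rdsk/$zvol
stmfadm add-view $lu_guid
svcadm enable -r svc:/network/iscsi/target:default
itadm create-target
EOF
}
```

An add-view without host or target groups exposes the LU to every initiator; restricting it to localhost or specific hosts would need host groups, which this sketch leaves out.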
One possible caveat is that the iSCSI server services should be made dependent on the main-pool-import service created above (assuming that it holds the zvol). If there are several physical pools, and others serve iSCSI too, a separate instance (or full service made as a replica) of iscsi/target may be in order, to wrap just the creation/teardown and sharing/unsharing of target(s) on the SMFized pool – see iscsi-lun-dcpool for an example (NOTE: hardcoded values would need adaptation for your systems).
Phase 3: iSCSI initiator (client)
This phase involves wrapping of the iSCSI-mounted pool as an SMF service. Indeed, some readers who simply use remote iSCSI pools might start reading the article here.
First, there is the initiator part: the networking client. There is really no magic here; this service was needed just for peace of mind about not conflicting with system services (i.e. over OS upgrades) while I create specific dependency setups. It uses the same system logic as iscsi/initiator for the actual work. Possibly, only one of the two needs to be enabled at a time.
:; wget -O /lib/svc/method/iscsi-initiator-dcpool \
    http://wiki.openindiana.org/download/attachments/27230301/iscsi-initiator-dcpool
:; wget -O /var/svc/manifest/network/iscsi/initiator-dcpool.xml \
    http://wiki.openindiana.org/download/attachments/27230301/initiator-dcpool.xml
Revise the files. The script contains a callout to the system's standard /lib/svc/method/iscsi-initiator wrapped with 10-second sleeps (after start and before stop).
The manifest declares a dependency on networking (at least
loopback) and on
On a system which serves the volume (real loopback-import) you'd also add a dependency on the iSCSI target service (or its instance or replica dedicated to serving this particular volume).
Install the service:
:; svccfg import /var/svc/manifest/network/iscsi/initiator-dcpool.xml
This creates a (disabled) instance of
Enable the service:
:; svcadm enable initiator-dcpool
This should allow the system to use iSCSI and find the defined (elsewhere) targets.
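For readers who have not yet defined the targets, a typical send-targets discovery setup can be sketched as a printed plan. The plan_iscsi_initiator function and the default portal 127.0.0.1:3260 (a loopback target on the standard iSCSI port) are assumptions for illustration; the iscsiadm and devfsadm invocations are the standard ones, but review them against your release before running anything.

```shell
#!/bin/sh
# Sketch: print a generic initiator discovery setup; nothing is executed.
#   plan_iscsi_initiator [PORTAL]
# PORTAL defaults to a loopback target on the standard iSCSI port.
plan_iscsi_initiator() {
    portal="${1:-127.0.0.1:3260}"
    cat <<EOF
iscsiadm add discovery-address $portal
iscsiadm modify discovery --sendtargets enable
devfsadm -i iscsi
iscsiadm list target -S
EOF
}
```

The last printed command lists the discovered targets with their LUNs, which is a quick way to confirm that the zvol-backed LU is visible before trying to import the pool.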
Second, prepare the mountpoint (also protected from modifications with immutability):
:; POOL=$POOL_VIRT
:; df -k /$POOL
:; ls -la /$POOL
### Make sure that the pool is exported and its mountpoint directory does not exist or is empty
:; mkdir /$POOL
:; /bin/chmod S+ci /$POOL
Third, set up the import of the pool over iSCSI (assuming that the COMSTAR initiator has been set up to query the needed target servers, and now zpool knows where to find iSCSI-backed vdevs):
:; wget -O /lib/svc/method/iscsi-mount-dcpool \
    http://wiki.openindiana.org/download/attachments/27230301/iscsi-mount-dcpool
:; wget -O /var/svc/manifest/network/iscsi/mount-dcpool.xml \
    http://wiki.openindiana.org/download/attachments/27230301/mount-dcpool.xml
Revise the files. The script is currently hardcoded to import or export a dcpool, subject to the absence of the /etc/zfs/noimport-dcpool block-file, and treats the presence of the /dcpool/export directory as a sign that the pool is already imported. If the file /etc/zfs/delay.dcpool exists and contains a number, the startup of the service is delayed by this number of seconds; if the file exists and is empty (or not a number), the delay is 600 seconds (10 minutes). The import bypasses the default cachefile, but otherwise is not tweaked. The export loops until successful.
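The "loops until successful" export can be sketched as a generic retry helper. The retry_until_success name and its attempt cap are hypothetical additions; a cap of 0 reproduces the unbounded loop described above.

```shell
#!/bin/sh
# Sketch: retry a command until it succeeds.
#   retry_until_success PAUSE_SECONDS MAX_TRIES COMMAND [ARGS...]
# MAX_TRIES of 0 means "retry forever", as the article's export loop does.
retry_until_success() {
    pause="$1"
    max_tries="$2"
    shift 2
    attempt=1
    until "$@"; do
        if [ "$max_tries" -gt 0 ] && [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$pause"
    done
    return 0
}
```

A stop method might call `retry_until_success 5 0 zpool export dcpool`. Keep in mind that an unbounded loop can stall shutdown for as long as something keeps the pool busy, so a cap may be preferable on machines that must power off promptly.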
The manifest declares dependencies on iscsi/initiator-dcpool and on networking (loopback here, you may want physical for remote mounts), and a reverse-dependency on the block-file.
Install the service:
:; svccfg import /var/svc/manifest/network/iscsi/mount-dcpool.xml
This creates a (disabled) instance of
Enable the service:
:; svcadm enable mount-dcpool
:; tail -f /var/svc/log/*mount-dcpool*log
Finally, I also had a watchdog service to actively monitor the viability of the "virtual" pool, as things tended to lock up once in a while, either due to internetworking faults or the experimental server problems. But that was too much of an in-house hack to publish at the moment (and relied on some currently proprietary status-testing methods).