Live Migration of OpenZFS datasets with Zmotion
Key Numbers
• 930,000 customers worldwide
• 1,100 employees
• 18M hosted applications
• 3.5M domain names
• 220,000 servers
• Network: 32 PoPs (Points of Presence), 4 Tbps bandwidth
• 17 datacenters: Western Europe, North America, Central Europe
ZFS @ OVH
• Every flavor of ZFS, mostly OpenZFS on illumos and ZoL
• Various I/O workloads: virtualization, web, email, databases, logs, backups…
• Designed our own dual-headed ZFS HA server
• Intensive use of custom ZFS properties to store configuration items
• Atypical usage, such as OpenZFS over/under Ceph
Why do we migrate?
• Fragmented storage farms because of continuous allocations/deallocations of zpools
• Heavily fragmented and aged zpools
• Hardware issues or limits
• Switch to new stack (ZFS/OS)
• ZFS bugs (it happens)
Let’s migrate!
[Diagram: INTERNET, WEB FRONTENDS, NFS over the OVH BACKBONE, ZFS SRC Filer, ZFS DST Filer]
1st try: « naive » approach
Timeline:
• SRC: snapshot source dataset
• DST: create destination dataset
• ZFS send/receive
• SRC: shut down NFS service IP (DOWNTIME begins)
• ZFS incremental send/receive
• DST: bring up NFS service IP (DOWNTIME ends)
Result: all NFS clients crashed on the frontends. NFS had to be unmounted and remounted.
Why did the NFS clients crash?
• « stale NFS file handle » error
• The NFS ID (fh3_fsid) exposed by the server changed after the service IP migration
[Diagram: WEB FRONTENDS mount over NFS. ZFS SRC Filer exposes NFS ID (fh3_fsid) 0x23456789; ZFS DST Filer exposes NFS ID (fh3_fsid) 0x98765432]
[Diagram: kernel file system layers: NFS (fh3_fsid), VFS (fsid), ZFS (fsid_guid)]
fh3_fsid (NFS) -> fsid (VFS) -> fsid_guid (ZFS)
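The chain above can be illustrated with a small Python sketch. It assumes the packing that illumos appears to use at mount time: the low 32 bits of the 56-bit fsid_guid go into the first fsid word, and the remaining bits, shifted left by 8, are combined with the filesystem type index in the second word. Treat that layout as an assumption to check against your kernel source. The function names and ZFS_FSTYPE_INDEX are hypothetical, and the two guids are only example values.

```python
from typing import Tuple

MASK_56 = (1 << 56) - 1  # fsid_guid is assumed to fit in 56 bits


def vfs_fsid(fsid_guid: int, fstype_index: int) -> Tuple[int, int]:
    """Pack a ZFS fsid_guid into the two 32-bit words of a VFS fsid."""
    if fsid_guid & ~MASK_56:
        raise ValueError("fsid_guid must fit in 56 bits")
    val0 = fsid_guid & 0xFFFFFFFF
    val1 = (((fsid_guid >> 32) << 8) | (fstype_index & 0xFF)) & 0xFFFFFFFF
    return (val0, val1)


def handle_is_stale(client_fsid: Tuple[int, int],
                    server_fsid: Tuple[int, int]) -> bool:
    """A client handle stops matching once the server-side fsid differs."""
    return client_fsid != server_fsid


ZFS_FSTYPE_INDEX = 5  # hypothetical value, differs per system

src = vfs_fsid(0x48A20330327752, ZFS_FSTYPE_INDEX)
dst = vfs_fsid(0x2360EB4E33DBE, ZFS_FSTYPE_INDEX)
print(handle_is_stale(src, dst))  # different guids, so the handle is stale
```

Because the NFS file handle embeds the fsid, a destination dataset with a different fsid_guid yields handles the clients no longer recognize, which is what the later attempts work around.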
2nd try: « skeet-shoot » fsid_guid in RAM
Get source fsid_guid with DTrace

    fbt::zfs_ioc_dataset_list_next:entry
    {
        self->zfs_cmd = args[0];
    }

    fbt::dsl_dataset_fast_stat:entry
    /self->zfs_cmd != NULL/
    {
        printf("zc_name:-%s- guid:-%#lx-\n",
            stringof(self->zfs_cmd->zc_name), args[0]->ds_fsid_guid);
    }

    fbt::zfs_ioc_dataset_list_next:return
    /self->zfs_cmd != NULL/
    {
        self->zfs_cmd = NULL;
    }

    # zc_name:-foo/t- guid:-0x48a20330327752-
Set the source fsid_guid on the destination with MDB

    # Get memory address where fsid_guid is stored
    fbt::dsl_dataset_sync:entry
    {
        printf("fsid_guid:-%a- address:-%a-\n",
            args[0]->ds_fsid_guid, &args[0]->ds_fsid_guid);
    }
    # fsid_guid:-2360eb4e33dbe- address:-0xffffff025a70e620-

    # mdb -kw
    Loading modules: [ unix genunix dtrace zfs nfs … ]
    > 0xffffff025a70e620/J
    0xffffff025a70e620: 2360eb4e33dbe
    > 0xffffff025a70e620/Z 48a20330327752
    0xffffff025a70e620: 0x2360eb4e33dbe = 0x48a20330327752
Timeline:
• SRC: snapshot source dataset
• DST: create destination dataset
• ZFS send/receive
• SRC: get source fsid_guid with DTrace
• SRC: shut down NFS service IP on src (DOWNTIME begins)
• ZFS incremental send/receive
• DST: set destination fsid_guid in RAM with MDB, then zfs umount/mount to update the VFS ID
• DST: bring up NFS service IP on dest (DOWNTIME ends)
3rd try’s the charm: « YAZP! » (Yet Another ZFS Property)
New ZFS property: fsid_guid
root@src_server# zfs get fsid_guid foo/t
NAME   PROPERTY   VALUE              SOURCE
foo/t  fsid_guid  25231704771932250  -
root@dst_server# zfs create -o fsid_guid=25231704771932250 foo/t
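The hand-off between the two commands above could be scripted roughly as follows. This Python sketch only parses text and builds an argument list; it runs nothing. The helper names are hypothetical, and the fsid_guid property only exists on systems carrying the patch. Note that the property is reported in decimal, whereas the DTrace output from the 2nd try was hexadecimal.

```python
from typing import List


def parse_fsid_guid(zfs_get_output: str, dataset: str) -> int:
    """Extract the decimal fsid_guid value from `zfs get fsid_guid` output."""
    for line in zfs_get_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == dataset and fields[1] == "fsid_guid":
            return int(fields[2])
    raise ValueError(f"no fsid_guid found for {dataset}")


def create_command(dataset: str, fsid_guid: int) -> List[str]:
    """Build the destination-side create command (patched ZFS assumed)."""
    return ["zfs", "create", "-o", f"fsid_guid={fsid_guid}", dataset]


# Sample text copied from the source-side output shown on the slide.
sample = """NAME   PROPERTY   VALUE              SOURCE
foo/t  fsid_guid  25231704771932250  -
"""

guid = parse_fsid_guid(sample, "foo/t")
print(" ".join(create_command("foo/t", guid)))
# prints: zfs create -o fsid_guid=25231704771932250 foo/t
print(hex(guid))  # hexadecimal form, comparable with DTrace/MDB output
```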
Zmotion timeline:
• SRC: snapshot source dataset
• SRC: zfs get fsid_guid on source dataset
• DST: zfs create -o fsid_guid destination dataset
• mbuffered ZFS send/receive
• incremental mbuffered ZFS send/receive … n
• SRC: shut down NFS service IP on src
• DST: bring up NFS service IP on dest
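The sequence above can be sketched as a dry-run plan in Python. The code only builds a list of steps and executes nothing. Several things here are assumptions rather than the OVH implementation: the snapshot names (zm0, zm1, …), the mbuffer network mode (-I on the receiver, -O on the sender) and port, the use of zfs receive -F into the pre-created dataset, and a last incremental pass after the source service IP is shut down (borrowed from the earlier timelines). The service IP steps are site-specific and left as notes.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (side it runs on, shell command or note)


def zmotion_plan(dataset: str, fsid_guid: int, rounds: int,
                 dst_host: str = "dst_server", port: int = 9090) -> List[Step]:
    """Return the ordered steps of a Zmotion-style migration (dry run)."""
    recv = f"mbuffer -I {port} | zfs receive -F {dataset}"

    def incremental(i: int) -> List[Step]:
        # One incremental pass: new snapshot, then send the delta since the previous one.
        return [
            ("src", f"zfs snapshot {dataset}@zm{i}"),
            ("dst", recv),
            ("src", f"zfs send -i @zm{i - 1} {dataset}@zm{i}"
                    f" | mbuffer -O {dst_host}:{port}"),
        ]

    steps: List[Step] = [
        ("src", f"zfs get fsid_guid {dataset}"),
        ("dst", f"zfs create -o fsid_guid={fsid_guid} {dataset}"),
        ("src", f"zfs snapshot {dataset}@zm0"),
        ("dst", recv),
        ("src", f"zfs send {dataset}@zm0 | mbuffer -O {dst_host}:{port}"),
    ]
    for i in range(1, rounds + 1):
        steps += incremental(i)
    steps.append(("src", "# shut down NFS service IP (site-specific)"))
    steps += incremental(rounds + 1)  # assumed last pass once writes have stopped
    steps.append(("dst", "# bring up NFS service IP (site-specific)"))
    return steps


plan = zmotion_plan("foo/t", 25231704771932250, rounds=2)
for side, command in plan:
    print(f"{side}: {command}")
```

Keeping the plan as data makes it easy to review or log before anything touches a production filer; the repeated incremental passes shrink the delta left to transfer during the switch.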
Tiny hack but HUGE benefits!
• Zmotion is a combination of the fsid_guid patch and zfs send/receive orchestration
• Thousands of datasets already Zmotioned
• Makes ZFS a bit more « distributable »
• ZoL is not affected: the Linux NFS stack already provides an fsid parameter (the fsid= export option)
Availability
• fsid_guid ZFS property patch
• illumos gate #6333
/ovh
/6333
Questions?
Thank you!
OVH Storage Team Francois Lesage @storagebits