Efficient Data Replication with ZFS

This year, I have been working with one of our clients on a typical research-oriented server setup: a few compute servers mounting shared storage from a single server over NFS, a common and well-tested configuration. What set this project apart was the size of the storage. When our team became engaged with this client, they were using ten assorted storage servers based on Linux and FreeNAS. rsync replicated the data between these servers, and an elaborate scheme was in place to ensure that each dataset was housed on at least two different servers. All of the storage servers were outdated and out of warranty, so the client agreed to procure new hardware and build a new setup from scratch.

Following the example of the Research Computing team, Ubuntu was selected as the base operating system for both the compute and the storage servers, deployed on commodity Supermicro hardware resold by Colfax. The client considered cost-effectiveness the decisive factor. To achieve maximum storage density, the client opted for a single 60-drive primary storage server running the ZFS file system. ZFS brings all the advantages of a copy-on-write file system, along with features such as instant copy, snapshots, flexible volume management, built-in NFS sharing, and error resilience and correction.

Below I discuss the SEND/RECEIVE feature of ZFS, which makes it easy to replicate large volumes of data efficiently.
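As a rough illustration of what SEND/RECEIVE replication involves, here is a minimal Python sketch that pipes `zfs send` on the primary server into `zfs receive` on a backup server over SSH. It assumes passwordless SSH to the backup host and that the snapshots already exist. The dataset, host, and snapshot names (`tank/data`, `backup01`, `snap1`, `snap2`) are hypothetical, and this is not necessarily how the setup described here was configured. Without a previous snapshot it sends the full dataset; with one, it uses `-i` to send only the blocks that changed between the two snapshots.

```python
import shlex
import subprocess
from typing import List, Optional, Tuple


def build_send_receive(
    dataset: str,
    new_snap: str,
    target_host: str,
    target_dataset: str,
    prev_snap: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Return (send_cmd, receive_cmd) argument lists for one replication pass.

    All names passed in are hypothetical examples supplied by the caller.
    """
    send_cmd = ["zfs", "send"]
    if prev_snap is not None:
        # Incremental stream: only the delta between prev_snap and new_snap
        send_cmd += ["-i", f"{dataset}@{prev_snap}"]
    send_cmd.append(f"{dataset}@{new_snap}")

    # -F rolls the target back to its latest snapshot before receiving,
    # which discards any local changes on the backup copy (destructive).
    # ssh joins its trailing arguments into one remote command string,
    # so build that string with proper shell quoting.
    remote = shlex.join(["zfs", "receive", "-F", target_dataset])
    receive_cmd = ["ssh", target_host, remote]
    return send_cmd, receive_cmd


def replicate(send_cmd: List[str], receive_cmd: List[str]) -> None:
    """Run the equivalent of `send_cmd | receive_cmd` without a shell."""
    sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive_cmd, stdin=sender.stdout)
    # Close the parent's copy so the sender gets SIGPIPE if the receiver exits early
    sender.stdout.close()
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    if send_rc != 0 or recv_rc != 0:
        raise RuntimeError(f"replication failed: send={send_rc}, receive={recv_rc}")
```

A first run would call `build_send_receive("tank/data", "snap1", "backup01", "backup/data")` for a full copy; later runs would pass `prev_snap` so that only the changes since the last replicated snapshot cross the network, which is what makes this approach cheaper than rescanning the whole tree with rsync.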
