Messing around with FreeBSD 10 lately, I came to love ZFS for its simplicity and expandability. So I thought: what about ZFS on Linux? In this article, I'll explain how to set up the ZFS filesystem on Linux, with zpool command examples.
ZFS combines a volume manager and a filesystem, and offers several advanced features.
This is the first part in a series of articles on ZFS.
In this article, we'll provide a high-level introduction to ZFS, explain how to install ZFS on Linux, create a ZFS pool, and walk through several zpool commands.
1. Introduction to ZFS
The following are some of the features of ZFS filesystem:
- Protection against data corruption
- Support for high storage capacities
- Efficient data compression
- Filesystem snapshots
- Copy-on-write clones
- RAID-Z support
- Integrity checking
- Automatic repair and native NFSv4 ACL support
ZFS was originally developed by Sun Microsystems for the Solaris platform. After acquiring Sun Microsystems in 2010, Oracle made many improvements to the ZFS filesystem.
ZFS has recently been gaining popularity on Linux as the port has become more stable.
The ZFS on Linux port is produced by the Lawrence Livermore National Laboratory (LLNL).
ZFS on Linux is a kernel module that you can download, compile and install. You do not have to patch or recompile your kernel.
You can download the source packages for your OS distribution from the ZFS on Linux project site.
2. Install ZFS on Linux
In this article, we'll be installing ZFS on a CentOS server. However, apart from the installation steps, the zfs commands mentioned below are the same on almost all Linux distributions.
Execute the following yum commands to install ZFS on Red Hat / CentOS:
# yum localinstall --nogpgcheck https://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
# yum localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el6.noarch.rpm
# yum install kernel-devel zfs
Please ensure that all dependencies are met. The installation most commonly fails when the GCC compiler is missing; in that case, install GCC before installing ZFS.
If you are not sure which packages you need, you can also install the whole "Development Tools" group with: yum groupinstall "Development Tools"
Ensure that the ZFS modules are loaded using the lsmod command as shown below:
# lsmod | grep zfs
zfs                  1188621  0
zcommon                45591  1 zfs
znvpair                81046  2 zfs,zcommon
zavl                    6900  1 zfs
zunicode              323051  1 zfs
spl                   264548  5 zfs,zcommon,znvpair,zavl,zunicode
We have added a few disks to this server (/dev/sdb through /dev/sdf) to test the ZFS functionality.
# ls -l /dev/sd*
brw-rw----. 1 root disk 8,  0 Jul 15 15:52 /dev/sda
brw-rw----. 1 root disk 8,  1 Jul 15 15:52 /dev/sda1
brw-rw----. 1 root disk 8,  2 Jul 15 15:52 /dev/sda2
brw-rw----. 1 root disk 8,  3 Jul 15 15:52 /dev/sda3
brw-rw----. 1 root disk 8, 16 Jul 16 10:57 /dev/sdb
brw-rw----. 1 root disk 8, 32 Jul 16 10:57 /dev/sdc
brw-rw----. 1 root disk 8, 48 Jul 16 10:58 /dev/sdd
brw-rw----. 1 root disk 8, 64 Jul 16 11:27 /dev/sde
brw-rw----. 1 root disk 8, 80 Jul 16 11:27 /dev/sdf
3. Create a zpool
The zpool command is used to configure storage pools in ZFS. A storage pool is a collection of devices that provides physical storage and data replication for ZFS datasets.
The following command creates a zpool named mypool:
# zpool create -f mypool raidz sdb sdc sdd sde sdf
In the above example:
- create stands for creating a new pool.
- The -f option ignores disk partition labels, since these are new disks.
- raidz is the RAID level. RAID-Z is a variation of RAID-5 that allows better distribution of parity and eliminates the "RAID-5 write hole" (data and parity inconsistency after a power loss).
- A raidz group can have single, double, or triple parity, meaning it can sustain one, two, or three disk failures respectively without losing any data. Data and parity are striped across all disks within a raidz group.
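For example, to sustain two simultaneous disk failures instead of one, the same pool could be created with double parity. This is a sketch of the variant, not a command from the original setup:

```shell
# raidz2 = double parity: any two of the five disks can fail
# without data loss (raidz3 would similarly allow three failures)
zpool create -f mypool raidz2 sdb sdc sdd sde sdf

# Confirm the pool layout; the vdev will show up as raidz2-0
zpool status mypool
```

Note that double parity costs the capacity of one additional disk compared to raidz1.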
Next, verify the status of the zpool that we just created.
# zpool status
  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors
Once the pool is created, running df -h shows that the newly created pool is automatically mounted on its mountpoint.
# df -h
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vglocal-rootlv   14G  2.4G   11G  18% /
tmpfs                       939M     0  939M   0% /dev/shm
/dev/sda1                   504M   46M  433M  10% /boot
mypool                      3.9G     0  3.9G   0% /mypool
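Besides df, the zpool list command gives a pool-level summary of size, allocation, and health:

```shell
# One-line summary per pool: name, total size, allocated space,
# free space, and health state (e.g. ONLINE)
zpool list mypool
```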
4. Create a Mirrored Pool
To create a mirrored pool, use the zpool create command with the mirror keyword.
If any disk in a mirror group fails, the other disk still holds the data. As soon as the failed disk is replaced, the contents are mirrored back (a process known as resilvering) to the newly replaced disk.
# zpool create -f mypool mirror sdb sdc mirror sdd sde
Next, verify the status of the mirrored zpool that we just created:
# zpool status -v
  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0

errors: No known data errors
5. Zpool Import and Export
There are cases when you may need to migrate a ZFS pool between systems.
ZFS makes this possible by exporting a pool from one system and importing it to another system.
To export a pool, use the zpool export command; to import it, use the zpool import command, as shown in the following example:
# zpool export mypool
# zpool import mypool
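If you don't remember the pool name on the destination system, running zpool import with no arguments scans the attached devices and lists exported pools. Importing via persistent device names is also common practice; the -d directory below is a suggestion, not part of the original example:

```shell
# List exported pools that are available for import
zpool import

# Import using stable /dev/disk/by-id names, which do not change
# across reboots the way sdb/sdc device names can
zpool import -d /dev/disk/by-id mypool
```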
6. View I/O stats of the ZFS Pool
To view the zpool I/O statistics, use the zpool iostat command as shown below:
# zpool iostat -v mypool
              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
mypool       147K  4.95G      0      0     33    252
  mirror      54K  3.97G      0      0     10     84
    sdb         -      -      0      0    536    612
    sdc         -      -      0      0    282    612
  mirror      93K  1008M      0      0     23    168
    sdd         -      -      0      0    288    696
    sde         -      -      0      0    294    696
----------  -----  -----  -----  -----  -----  -----
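The statistics above are averages since the pool was imported. To watch live activity instead, append an interval in seconds and zpool iostat will keep printing until interrupted:

```shell
# Report per-device I/O statistics every 5 seconds (Ctrl-C to stop)
zpool iostat -v mypool 5
```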
7. Delete a ZFS pool
To destroy a pool, use the zpool destroy command as shown below. Note that this permanently deletes the pool and all data stored on it:
# zpool destroy mypool
8. Replace Corrupted disk in ZFS pool
To replace a disk after a failure or corruption, use the following command. In this example, disk sde is replaced with sdf:
# zpool replace mypool sde sdf
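After the replace, ZFS resilvers the data onto the new disk. You can watch the progress with zpool status:

```shell
# While the resilver is running, the "scan:" line of the output shows
# progress (e.g. "resilver in progress" with a completion estimate)
zpool status mypool
```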
9. Expand ZFS Pool with new Disk
To expand the zpool by adding a new disk, use the zpool add command as given below:
# zpool add -f mypool sde
10. Add a Spare Disk to ZFS Pool
You can also add a spare device to the ZFS pool using the command below.
A failed disk is then automatically replaced by the spare device, and the administrator can replace the failed disk at a later time.
Please note that you can also share the spare device among multiple ZFS pools.
# zpool add -f mypool spare sde
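A spare that is not currently in use can be taken back out of the pool with zpool remove:

```shell
# Detach the unused spare device sde from the pool
zpool remove mypool sde
```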
11. Check your pool for errors
Checking ZFS File System Integrity
No fsck equivalent exists for ZFS. The fsck utility has traditionally served two purposes: file system repair and file system validation.
File System Repair
With traditional file systems, the way in which data is written is inherently vulnerable to unexpected failure causing file system inconsistencies. Because a traditional file system is not transactional, unreferenced blocks, bad link counts, or other inconsistent file system structures are possible. The addition of journaling does solve some of these problems, but can introduce additional problems when the log cannot be rolled back. The only way for inconsistent data to exist on disk in a ZFS configuration is through hardware failure (in which case the pool should have been redundant) or when a bug exists in the ZFS software.
File System Validation
In addition to performing file system repair, the fsck utility validates that the data on disk has no problems. Traditionally, this task requires unmounting the file system and running the fsck utility, possibly taking the system to single-user mode in the process. This scenario results in downtime that is proportional to the size of the file system being checked. Instead of requiring an explicit utility to perform the necessary checking, ZFS provides a mechanism to perform routine checking of all inconsistencies. This feature, known as scrubbing, is commonly used in memory and other systems as a method of detecting and preventing errors before they result in a hardware or software failure.
Controlling ZFS Data Scrubbing
Whenever ZFS encounters an error, either through scrubbing or when accessing a file on demand, the error is logged internally so that you can obtain a quick overview of all known errors within the pool.
Explicit ZFS Data Scrubbing
The simplest way to check data integrity is to initiate an explicit scrubbing of all data within the pool. This operation traverses all the data in the pool once and verifies that all blocks can be read. Scrubbing proceeds as fast as the devices allow, though the priority of any I/O remains below that of normal operations. This operation might negatively impact performance, though the pool’s data should remain usable and nearly as responsive while the scrubbing occurs. To initiate an explicit scrub, use the zpool scrub command.
# zpool scrub tank
The status of the current scrubbing operation can be displayed by using the zpool status command. For example:
# zpool status -v tank
  pool: tank
 state: ONLINE
 scrub: scrub completed after 0h7m with 0 errors on Tue Feb  2 12:54:00 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0

errors: No known data errors
Only one active scrubbing operation per pool can occur at one time.
You can stop a scrubbing operation that is in progress by using the -s option. For example:
# zpool scrub -s tank
In most cases, a scrubbing operation should be allowed to run to completion to ensure data integrity. Stop a scrubbing operation at your own discretion if system performance is impacted by the operation.
Performing routine scrubbing guarantees continuous I/O to all disks on the system. Routine scrubbing has the side effect of preventing power management from placing idle disks in low-power mode. If the system is generally performing I/O all the time, or if power consumption is not a concern, then this issue can safely be ignored.
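If routine scrubbing makes sense in your environment, a simple way to schedule it is with cron. The schedule, path, and pool name below are illustrative assumptions:

```shell
# Example /etc/crontab entry: scrub mypool every Sunday at 2:00 AM
0 2 * * 0 root /sbin/zpool scrub mypool
```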