How to Setup ZFS Filesystem on Linux with zpool Command Examples

Messing around with FreeBSD 10 lately, I came to love ZFS for its simplicity and expandability. So I thought, what about Linux with ZFS? In this article, I'll explain how to set up a ZFS filesystem on Linux, with zpool command examples.

 

ZFS combines a volume manager and a filesystem, and offers several advanced features.

This is the first part in a series of articles on ZFS.

In this article, we'll provide a high-level introduction to ZFS, explain how to install ZFS on Linux, create a ZFS pool, and walk through several zpool commands.

1. Introduction to ZFS

The following are some of the features of ZFS filesystem:

  • Protection against data corruption
  • Support for high storage capacities
  • Efficient data compression
  • Filesystem snapshots
  • Copy-on-write clones
  • RAID-Z support
  • Integrity checking
  • Automatic repair and support for native NFSv4 ACLs

ZFS was originally developed by Sun Microsystems for the Solaris platform. In 2010, Oracle acquired Sun Microsystems and has since made many improvements to the ZFS filesystem.

ZFS has recently been gaining popularity on Linux as the port has become more stable.

The ZFS on Linux port is produced by the Lawrence Livermore National Laboratory (LLNL).

 

ZFS on Linux is a kernel module that you can download, compile and install. You do not have to patch or recompile your kernel.

You can download the source packages for your respective OS distribution from here.

2. Install ZFS on Linux

In this article, we’ll be installing ZFS on a CentOS server. However, the zfs commands mentioned below are the same on almost all Linux distributions; only the installation part differs.

Execute the following yum commands to install ZFS on Red Hat / CentOS.

# yum localinstall --nogpgcheck https://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

# yum localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el6.noarch.rpm

# yum install kernel-devel zfs

Please ensure all the dependencies are met. One dependency where the installation commonly fails is the GCC compiler. In that case, install GCC before installing ZFS.

 

If you are not sure which packages you need, you can also install the whole “Development Tools” group with yum groupinstall “Development Tools”.
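For example, the following should pull in the compiler toolchain before installing ZFS (exact package names may vary slightly between releases):

# yum install gcc make
# yum groupinstall "Development Tools"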

 

Ensure that the ZFS modules are loaded using the lsmod command as shown below:

# lsmod | grep zfs
zfs                  1188621  0
zcommon                45591  1 zfs
znvpair                81046  2 zfs,zcommon
zavl                    6900  1 zfs
zunicode              323051  1 zfs
spl                   264548  5 zfs,zcommon,znvpair,zavl,zunicode
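If lsmod shows nothing, the module may simply not be loaded yet. On a typical setup you can load it manually and check again:

# modprobe zfs
# lsmod | grep zfs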

We have added a few disks to this server (/dev/sdb through /dev/sdf) to test the ZFS functionality.

# ls -l /dev/sd*
brw-rw----. 1 root disk 8,  0 Jul 15 15:52 /dev/sda
brw-rw----. 1 root disk 8,  1 Jul 15 15:52 /dev/sda1
brw-rw----. 1 root disk 8,  2 Jul 15 15:52 /dev/sda2
brw-rw----. 1 root disk 8,  3 Jul 15 15:52 /dev/sda3
brw-rw----. 1 root disk 8, 16 Jul 16 10:57 /dev/sdb
brw-rw----. 1 root disk 8, 32 Jul 16 10:57 /dev/sdc
brw-rw----. 1 root disk 8, 48 Jul 16 10:58 /dev/sdd
brw-rw----. 1 root disk 8, 64 Jul 16 11:27 /dev/sde
brw-rw----. 1 root disk 8, 80 Jul 16 11:27 /dev/sdf

3. Create a zpool

The zpool command is used to configure storage pools in ZFS. A storage pool is a collection of devices that provides physical storage and data replication for ZFS datasets.

The following command creates a RAID-Z zpool named mypool from five disks.

# zpool create -f mypool raidz sdb sdc sdd sde sdf

In the above example:

  • create stands for creating a new pool.
  • The -f option forces pool creation, ignoring any existing disk partition labels, since these are new disks.
  • raidz is the RAID level. RAIDZ is a variation of RAID-5 that allows for better distribution of parity and eliminates the “RAID-5 write hole” (data and parity inconsistency after a power loss).
  • A raidz group can have single, double, or triple parity, meaning it can sustain one, two, or three disk failures respectively without losing any data. Data and parity are striped across all disks within a raidz group. An example of the double- and triple-parity variants is shown right after this list.
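For example, if you wanted double or triple parity instead of single parity, you could use raidz2 or raidz3 when creating the pool (these are alternatives to the command above, shown here with the same example disks):

# zpool create -f mypool raidz2 sdb sdc sdd sde sdf
# zpool create -f mypool raidz3 sdb sdc sdd sde sdf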

Next, verify the status of the zpool that we just created.

# zpool status
  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors

Once the pool is created, if you do df -h, you will see that the newly created pool is mounted automatically on its mount point.

# df -h
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vglocal-rootlv   14G  2.4G   11G  18% /
tmpfs                       939M     0  939M   0% /dev/shm
/dev/sda1                   504M   46M  433M  10% /boot
mypool                      3.9G     0  3.9G   0% /mypool
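By default the pool is mounted at /<poolname>. If you prefer a different location, you can change the mountpoint property of the pool's top-level dataset, for example (/data is just an example path):

# zfs set mountpoint=/data mypool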

4. Create a Mirrored Pool

To create a mirrored pool, use the zpool create command with the mirror keyword as shown below.

If any disk in a particular mirror group fails, the other disk still holds the data. As soon as the failed disk is replaced, the contents are mirrored back (also known as resilvering) to the newly replaced disk.

# zpool create -f mypool mirror sdb sdc mirror sdd sde

Next, verify the status of the mirrored zpool that we just created:

# zpool status -v
  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0

errors: No known data errors
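As a quick sanity check, zpool list summarizes the pool's size, allocation and health in one line:

# zpool list mypool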

5. Zpool Import and Export

There are some cases when you may need to migrate a ZFS pool between systems.

ZFS makes this possible by exporting a pool from one system and importing it to another system.

To export a pool, use the zpool export command; to import it on the other system, use the zpool import command, as shown in the following example:

# zpool export mypool

# zpool import mypool
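If you don't remember the pool name on the destination system, running zpool import with no arguments lists the pools available for import. The -d option points it at a specific device directory (the by-id path below is one common choice):

# zpool import
# zpool import -d /dev/disk/by-id mypool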

6. View I/O stats of the ZFS Pool

To view the zpool I/O statistics, use the zpool iostat command as shown below:

# zpool iostat -v mypool
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
mypool       147K  4.95G      0      0     33    252
  mirror      54K  3.97G      0      0     10     84
    sdb         -      -      0      0    536    612
    sdc         -      -      0      0    282    612
  mirror      93K  1008M      0      0     23    168
    sdd         -      -      0      0    288    696
    sde         -      -      0      0    294    696
----------  -----  -----  -----  -----  -----  -----
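Like iostat, zpool iostat also accepts an interval in seconds, in which case it keeps printing updated statistics until interrupted. For example, to refresh every 5 seconds:

# zpool iostat -v mypool 5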

7. Delete a ZFS pool

To destroy a pool, use the zpool destroy command as shown below:

# zpool destroy mypool

8. Replace Corrupted disk in ZFS pool

To replace a disk after a failure or corruption, use the following command:

# zpool replace mypool sde sdf
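After the replace, ZFS resilvers the data onto the new disk. You can watch the resilver progress with zpool status:

# zpool status mypool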

9. Expand ZFS Pool with new Disk

To expand the zpool by adding a new disk, use the zpool add command as shown below:

# zpool add -f mypool sde
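Note that this adds the disk as a new top-level vdev with no redundancy of its own. If the pool is built from mirrors, you would typically add another mirror pair instead (disk names below are placeholders):

# zpool add -f mypool mirror sde sdf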

10. Add a Spare Disk to ZFS Pool

You can also add a spare disk to a ZFS pool using the command below, which adds a spare device to the pool.

A failed disk is then automatically replaced by the spare device, and the administrator can replace the failed disk at a later time.

Please note that you can also share the spare device among multiple ZFS pools.

# zpool add -f mypool spare sde
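If the spare is no longer needed, it can be detached from the pool again with zpool remove:

# zpool remove mypool sde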

 

11. Check your pool for errors

Checking ZFS File System Integrity

No fsck utility equivalent exists for ZFS. This utility has traditionally served two purposes, those of file system repair and file system validation.

File System Repair

With traditional file systems, the way in which data is written is inherently vulnerable to unexpected failure causing file system inconsistencies. Because a traditional file system is not transactional, unreferenced blocks, bad link counts, or other inconsistent file system structures are possible. The addition of journaling does solve some of these problems, but can introduce additional problems when the log cannot be rolled back. The only way for inconsistent data to exist on disk in a ZFS configuration is through hardware failure (in which case the pool should have been redundant) or when a bug exists in the ZFS software.

File System Validation

In addition to performing file system repair, the fsck utility validates that the data on disk has no problems. Traditionally, this task requires unmounting the file system and running the fsck utility, possibly taking the system to single-user mode in the process. This scenario results in downtime that is proportional to the size of the file system being checked. Instead of requiring an explicit utility to perform the necessary checking, ZFS provides a mechanism to perform routine checking of all inconsistencies. This feature, known as scrubbing, is commonly used in memory and other systems as a method of detecting and preventing errors before they result in a hardware or software failure.

Controlling ZFS Data Scrubbing

Whenever ZFS encounters an error, either through scrubbing or when accessing a file on demand, the error is logged internally so that you can obtain a quick overview of all known errors within the pool.

Explicit ZFS Data Scrubbing

The simplest way to check data integrity is to initiate an explicit scrubbing of all data within the pool. This operation traverses all the data in the pool once and verifies that all blocks can be read. Scrubbing proceeds as fast as the devices allow, though the priority of any I/O remains below that of normal operations. This operation might negatively impact performance, though the pool’s data should remain usable and nearly as responsive while the scrubbing occurs. To initiate an explicit scrub, use the zpool scrub command.

# zpool scrub tank

The status of the current scrubbing operation can be displayed by using the zpool status command. For example:

# zpool status -v tank
  pool: tank
 state: ONLINE
 scrub: scrub completed after 0h7m with 0 errors on Tue Feb  2 12:54:00 2010
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0

errors: No known data errors

Only one active scrubbing operation per pool can occur at one time.

You can stop a scrubbing operation that is in progress by using the -s option. For example:

# zpool scrub -s tank

In most cases, a scrubbing operation started to ensure data integrity should continue to completion. Stop a scrubbing operation at your own discretion if system performance is impacted by the operation.

Performing routine scrubbing guarantees continuous I/O to all disks on the system. Routine scrubbing has the side effect of preventing power management from placing idle disks in low-power mode. If the system is generally performing I/O all the time, or if power consumption is not a concern, then this issue can safely be ignored.
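If you want routine scrubbing without having to remember to run it by hand, one common approach is to schedule it from cron. For example, the following crontab entry (the schedule and pool name here are just an illustration) scrubs mypool every Sunday at 2 AM:

0 2 * * 0 /sbin/zpool scrub mypool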

 
