Deploying GlusterFS

GlusterFS is a powerful network/cluster file system that runs in user space and uses FUSE to hook into the VFS layer of the operating system. GlusterFS is itself a file system, but it stores its data on tried and tested on-disk file systems such as ext3, ext4 and XFS. It can easily scale up to petabytes of storage, all of which is available to the user under a single mount point [1].

In this tutorial I'll walk you through installing and configuring a GlusterFS storage pool consisting of four servers. Each server will export one brick, and the volumes we create will consist of those four bricks.

Terms.

brick - A locally attached filesystem (e.g. xfs on top of LVM) that is part of a volume.
client - The machine which mounts the volume.
server - The machine which hosts the filesystem in which data will be stored.
volume - A network-accessible file system, made up of one or more bricks, which can be mounted using the native GlusterFS client, NFS, CIFS, etc.

Installing and starting GlusterFS.
 
1. Download the package for your distribution from http://download.gluster.org/pub/gluster/glusterfs/LATEST/
2. Install GlusterFS on all servers using the following commands:

On RHEL/CentOS:
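Assuming the downloaded RPMs are in the current directory (exact package names vary slightly between releases), something along these lines should work:

  rpm -Uvh glusterfs*.rpm

  # or, if a Gluster yum repository is configured:
  yum install glusterfs glusterfs-fuse glusterfs-server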


On Debian/Ubuntu:
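Likewise, assuming the downloaded .deb packages are in the current directory:

  dpkg -i glusterfs*.deb

  # or, from the distribution repositories:
  apt-get install glusterfs-server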


3. Start the GlusterFS daemon on all servers:
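The service name differs between distributions; on RHEL/CentOS the daemon is glusterd, while the Debian/Ubuntu package typically installs it as glusterfs-server:

  # RHEL/CentOS
  service glusterd start
  chkconfig glusterd on

  # Debian/Ubuntu
  service glusterfs-server start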


Preparing Bricks.
 
1. On each server create an LVM logical volume, format it with XFS and mount it. It's important to mention that every brick's mount point must be unique throughout the entire storage pool.
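Here is a minimal sketch for gnode1, assuming a spare disk at /dev/sdb and a 100 GB brick; adjust the device, size and names on each server (gnode2a on gnode2, and so on):

  # /dev/sdb and the 100G size are illustrative
  pvcreate /dev/sdb
  vgcreate vg_bricks /dev/sdb
  lvcreate -L 100G -n gnode1a vg_bricks
  mkfs.xfs /dev/vg_bricks/gnode1a
  mkdir -p /mnt/bricks/gnode1a
  mount /dev/vg_bricks/gnode1a /mnt/bricks/gnode1a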


Creating a trusted storage pool.
 
A trusted storage pool consists of the storage servers that will comprise the volume, in other words it is a trusted network of storage servers. When you start the first server, the storage pool consists of that server alone. To add additional storage servers to the storage pool, you can use the probe command from a storage server that is already trusted. The GlusterFS service must be running on all storage servers that you want to add to the storage pool.

In our case, to create a trusted storage pool of four servers, we add the other three servers to the pool from server gnode1:

1. Probe the servers you want to add to the storage pool.
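From gnode1, probe each of the remaining servers:

  gluster peer probe gnode2
  gluster peer probe gnode3
  gluster peer probe gnode4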


Note: Do not probe the local host itself (gnode1).

Verify the peer status from the first server using the following command:
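  gluster peer status

The output should list gnode2, gnode3 and gnode4 as connected peers.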


Now that we have a trusted storage pool consisting of four servers, let's create the volumes that we can actually use to store files later on.

Configuring GlusterFS Volumes.
 
There are three types of volumes:
Distributed - Distributes files throughout the cluster.
Replicated - Replicates data across two or more nodes in the cluster.
Striped - Stripes files across multiple nodes in the cluster.

I'll demonstrate how to setup and use all three of them in the following sections.

Configuring GlusterFS Distributed Volumes.
 
Distributed volumes distribute files throughout the cluster. You can use distributed volumes to scale storage in archival environments where short periods of downtime during disk swaps are acceptable.
Keep in mind that a disk failure in a distributed volume can result in a serious loss of data, since directory contents are spread randomly across the bricks in the cluster.

To configure a distributed volume perform the following on only one server, in this case gnode1:

1. Create the volume using the following command:
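Assuming the brick mount points created earlier, and using gvola as the volume name (the same name appears in the migration example later in this tutorial):

  # bricks follow the /mnt/bricks/<host>a naming scheme used above
  gluster volume create gvola transport tcp \
      gnode1:/mnt/bricks/gnode1a \
      gnode2:/mnt/bricks/gnode2a \
      gnode3:/mnt/bricks/gnode3a \
      gnode4:/mnt/bricks/gnode4a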


You can optionally display the volume information using the following command:
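  gluster volume info gvola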


2. Start the volume using the following command:
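  gluster volume start gvola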


Configuring GlusterFS Replicated Volumes.
 
Distributed replicated volumes replicate (mirror) data across two or more nodes in the cluster. You can use distributed replicated volumes in environments where high availability and high reliability are critical. Distributed replicated volumes also offer improved read performance in most environments.

To configure a four node replicated volume with a two-way mirror perform the following on only one server, in this case gnode1:

1. Create the volume using the following command:
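A brick can belong to only one volume, so if the "a" bricks are already in use by gvola you would create a second set of bricks for this volume; the volume name gvolr and the "r" brick paths below are purely illustrative. With four bricks and replica 2, GlusterFS creates two mirrored pairs:

  # gvolr and the "r" brick paths are illustrative
  gluster volume create gvolr replica 2 transport tcp \
      gnode1:/mnt/bricks/gnode1r \
      gnode2:/mnt/bricks/gnode2r \
      gnode3:/mnt/bricks/gnode3r \
      gnode4:/mnt/bricks/gnode4r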


2. Start the volume using the following command:
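  gluster volume start gvolr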


Configuring GlusterFS Striped Volumes.
 
Distributed striped volumes stripe data across two or more nodes in the cluster. For best results, you should use distributed striped volumes only in high concurrency environments accessing very large files.

To configure a four node striped volume perform the following on only one server, in this case gnode1:

1. Create the volume using the following command:
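Again with an illustrative volume name (gvols) and a separate set of bricks; stripe 4 spreads each file across all four bricks:

  # gvols and the "s" brick paths are illustrative
  gluster volume create gvols stripe 4 transport tcp \
      gnode1:/mnt/bricks/gnode1s \
      gnode2:/mnt/bricks/gnode2s \
      gnode3:/mnt/bricks/gnode3s \
      gnode4:/mnt/bricks/gnode4s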


2. Start the volume using the following command:
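  gluster volume start gvols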


Using the Volumes.
 
Now that we have an available volume, comprising four bricks on four servers, let's mount it on a client machine using native GlusterFS.
The Gluster Native Client is a FUSE-based client running in user space, and it is the recommended method for accessing volumes when all of the clustering features of GlusterFS have to be utilized.

On RedHat-based distributions run:
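  # package names may vary slightly between releases
  yum install glusterfs glusterfs-fuse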


On Debian-based distributions run:
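  apt-get install glusterfs-client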


Now we're ready to mount the volume:
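Using gnode1 as the mount source and /mnt/gvola as an illustrative mount point on the client:

  # /mnt/gvola is an arbitrary mount point on the client
  mkdir -p /mnt/gvola
  mount -t glusterfs gnode1:/gvola /mnt/gvola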


It's worth mentioning that any peer can be referenced as the mount source; in this case I chose gnode1.

We can also use NFS to export the volume from any server in the pool and then mount that export:
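Gluster's built-in NFS server speaks NFSv3 over TCP, so the mount options below force that; the mount point is again illustrative:

  # Gluster's NFS export is NFSv3 over TCP
  mkdir -p /mnt/gvola-nfs
  mount -t nfs -o vers=3,mountproto=tcp gnode1:/gvola /mnt/gvola-nfs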


Expanding Volumes
 
You can expand volumes, as needed, while the cluster is online and available. For example, you might want to add a brick to a distributed volume, thereby increasing the distribution and adding to the capacity of the GlusterFS volume.
Similarly, you might want to add a group of bricks to a distributed replicated volume, increasing the capacity of the GlusterFS volume.
When expanding distributed replicated and distributed striped volumes, you need to add a number of bricks that is a multiple of the replica or stripe count. For example, to expand a distributed replicated volume with a replica count of 2, you need to add bricks in multiples of 2 (such as 4, 6, 8, etc.).

To expand a volume perform the following:

1. On the first server in the cluster, probe the server to which you want to add the new brick using the following command:
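Assuming a hypothetical fifth server, gnode5, prepared with a brick in the same way as the others:

  # gnode5 is a hypothetical new server
  gluster peer probe gnode5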


2. Add the brick using the following command:
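Adding the brick on gnode5 to the distributed volume gvola (the brick path simply follows the naming scheme used earlier):

  gluster volume add-brick gvola gnode5:/mnt/bricks/gnode5a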


3. Check the volume information using the following command:
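  gluster volume info gvola

The new brick should now appear in the brick list.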


4. Re-balance the volume to ensure that all files are visible at the mount point:
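  gluster volume rebalance gvola start

  # check the progress with:
  gluster volume rebalance gvola status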


Shrinking Volumes
 
You can shrink volumes, as needed, while the cluster is online and available. For example, you might need to remove a brick that has become inaccessible in a distributed volume due to hardware or network failure.
Data residing on the brick that you are removing will no longer be accessible at the Gluster mount point. Note however that only the configuration information is removed - you can continue to access the data directly from the brick, as necessary.

To shrink a volume perform the following:

1. Remove the brick using the following command:
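Continuing the example, this removes the brick that was added on gnode5 (on newer GlusterFS releases remove-brick is a start/commit operation, or requires the force keyword; the one-shot form below matches older 3.x releases):

  gluster volume remove-brick gvola gnode5:/mnt/bricks/gnode5a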


2. Check the volume information using the following command:
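  gluster volume info gvola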


3. Re-balance the volume to ensure that all files are visible at the mount point:
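  gluster volume rebalance gvola start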


Migrating Volumes
 
You can migrate the data from one brick to another, as needed, while the cluster is online and available.

To migrate the data in gnode3:/mnt/bricks/gnode3a to gnode4:/mnt/bricks/gnode4a in the volume gvola:
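The volume and brick names come straight from the example above; the same volume/brick pair is repeated in the pause, abort, status and commit commands that follow:

  gluster volume replace-brick gvola gnode3:/mnt/bricks/gnode3a gnode4:/mnt/bricks/gnode4a start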


Note: You need to have the FUSE package installed on the server on which you are running the replace-brick command for the command to work.

To pause the data migration run:
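  gluster volume replace-brick gvola gnode3:/mnt/bricks/gnode3a gnode4:/mnt/bricks/gnode4a pause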


To abort the data migration run:
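  gluster volume replace-brick gvola gnode3:/mnt/bricks/gnode3a gnode4:/mnt/bricks/gnode4a abort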


To check the data migration status execute:
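  gluster volume replace-brick gvola gnode3:/mnt/bricks/gnode3a gnode4:/mnt/bricks/gnode4a status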


The status command shows the file currently being migrated along with the running total of files migrated. Once the migration is finished, it displays "Migration complete".

To commit the data migration:
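  gluster volume replace-brick gvola gnode3:/mnt/bricks/gnode3a gnode4:/mnt/bricks/gnode4a commit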


The commit command completes the migration of data to the new brick.

Resources:

[1] http://www.gluster.org/community/documentation/