I was interested in setting up persistent shared storage for some docker containers on my little Pi cluster. My research led me to GlusterFS using replicas. I used these sources:
- Gluster Docs
- Tutorial: Create a Docker Swarm with Persistent Storage Using GlusterFS
- Making docker swarm redundant
- How to Setup Raspberry Pi SSH Keys for Authentication
Gluster’s docs describe the product as “a scalable, distributed file system that aggregates disk storage resources from multiple servers into a single global namespace.” What this means for me is a shared persistent storage space to save data and config files for my docker containers. Any changes to the mounted glusterfs drive are replicated to all the peer bricks. This way it doesn’t matter which cluster node a container runs on; every node has access to the same data.
I added three Samsung Plus 32GB USB 3.1 Flash Drives to FrankenPi for some additional high speed storage to host the gluster data. My host names are: picluster1, picluster2, picluster3. The operating system is Raspberry Pi OS. I’m using the Buster version released 2020-08-20.
- Update your software with the apt command:
sudo apt update && sudo apt upgrade -y
- Install and configure GlusterFS on each server within the swarm.
sudo apt install software-properties-common glusterfs-server -y
- Start and enable GlusterFS with the commands:
sudo systemctl start glusterd
sudo systemctl enable glusterd
- If you haven’t already done so, you should generate an SSH key on each node and copy its public key to the other nodes.
See: How to Setup Raspberry Pi SSH Keys for Authentication
Use these sections:
Generating SSH Keys on Linux based systems
Copying the Public Key using SSH Tools
Verify the results by trying to connect to the other nodes. You should be able to log in without a password once the keys are installed, which makes life simpler when moving between nodes.
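If you'd rather not test each login by hand, a small loop can do it. This is only a sketch: check_passwordless is a name I made up, and it assumes the host names used in this article. BatchMode=yes tells ssh to fail instead of prompting for a password, so a node that still needs a password shows up as a failure.

```shell
# Hypothetical helper (not from the sources above): report which nodes
# accept key-based login. Returns non-zero if any node fails.
check_passwordless() {
  failed=0
  for host in "$@"; do
    # BatchMode=yes disables password prompts; ConnectTimeout avoids long hangs.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "$host: ok"
    else
      echo "$host: key login failed"
      failed=1
    fi
  done
  return "$failed"
}

# Usage, from picluster1:
#   check_passwordless picluster2 picluster3
```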
- Probe the nodes from the master node only (picluster1). This adds the other nodes as Gluster peers. The second command verifies that the peers are listed as part of the storage pool. Both commands need root privileges, so run them from a root shell (sudo su) and exit it afterwards:
gluster peer probe picluster2;gluster peer probe picluster3;
gluster pool list
UUID Hostname State
02c0656a-0637-4889-a185-264b56d6b24f picluster3 Connected
2a5c475a-23de-455b-a6a9-45f08e855b3c picluster2 Connected
bcb63335-b76f-459e-8195-352f1c0e719d localhost Connected
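If you want to check the pool from a script instead of reading the table, something like the following might work. It is a sketch that assumes the output keeps the three-column layout shown above, with the state in the last column; count_connected is a hypothetical name.

```shell
# Hypothetical helper: read `gluster pool list` output on stdin and print
# how many entries are in the Connected state (the header line is skipped).
count_connected() {
  awk 'NR > 1 && $NF == "Connected" { n++ } END { print n + 0 }'
}

# Usage (as root, on picluster1):
#   gluster pool list | count_connected
```

On my three-node cluster I would expect this to print 3 once every peer has joined.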
- Format the USB drives on each cluster node.
Start by checking the device names with the lsblk command so that you format the correct device:
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 29.9G 0 disk
mmcblk0 179:0 0 29.7G 0 disk
├─mmcblk0p1 179:1 0 256M 0 part /boot
└─mmcblk0p2 179:2 0 29.5G 0 part /
The mmcblk0 device is the micro SD card with the OS partitions, so sda is my target device. If you have other storage devices connected to one of the cluster nodes, like I do, the results will be different (sdb, sdc, etc.). You should repeat the same command on each node to prevent a horrible mistake.
Format the CORRECT device:
sudo mkfs.xfs -i size=512 /dev/sda -f
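As an extra guard against formatting the wrong device, you could wrap the format in a check that refuses any device with something mounted on it. This is a sketch, not part of the original procedure: safe_to_format is a made-up name, and it only catches mounted devices, so you still need to read the lsblk output yourself.

```shell
# Hypothetical guard: succeed only if neither the device nor any of its
# partitions has a mount point. `lsblk -n -o MOUNTPOINT DEV` prints one
# line per device/partition, blank when nothing is mounted there.
# Note: a device that does not exist also yields no mount points, so this
# check does not replace looking at the lsblk listing.
safe_to_format() {
  dev="$1"
  mounted=$(lsblk -n -o MOUNTPOINT "$dev" | tr -d '[:space:]')
  if [ -n "$mounted" ]; then
    echo "refusing: $dev or one of its partitions is mounted" >&2
    return 1
  fi
  return 0
}

# Usage:
#   safe_to_format /dev/sda && sudo mkfs.xfs -i size=512 /dev/sda -f
```

With the layout shown above, the SD card (mmcblk0) would be rejected because its partitions are mounted at /boot and /.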
- Create a mount point for the USB drives on each node. Increment the number at the end so that you can tell the devices apart. The storage devices are not interchangeable, unless you re-format and start over:
sudo mkdir /gluster/bricks/1 -p
- Add the device parameters to /etc/fstab to make it permanent, then mount it. Make sure you increment the number appropriately on each node:
sudo su
echo '/dev/sda /gluster/bricks/1 xfs defaults 0 0' >> /etc/fstab
mount -a
exit
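One thing to watch: running the echo line twice leaves a duplicate entry in /etc/fstab. A small helper can make the step safe to repeat. This is a sketch; append_once is a hypothetical name, and it only detects exact duplicate lines.

```shell
# Hypothetical helper: append LINE to FILE unless an identical line is
# already there. grep -x matches whole lines, -F treats LINE literally.
append_once() {
  file="$1"
  line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (from a root shell, adjusting the brick number per node):
#   append_once /etc/fstab '/dev/sda /gluster/bricks/1 xfs defaults 0 0'
```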
- Create the volume across the cluster. This is only run from one node, picluster1, in my use case:
sudo gluster volume create rep_swarm_vol replica 3 picluster1:/gluster/bricks/1 picluster2:/gluster/bricks/2 picluster3:/gluster/bricks/3 force
Let me break down the command:
sudo – Assume root privileges
gluster volume create – Initialize a new gluster volume
rep_swarm_vol – Arbitrary volume name
replica 3 – The type of pool and number of peers in the storage pool.
picluster1:/gluster/bricks/1 – The host and storage path for each peer.
force – It’s required for the version of Gluster supported by the Raspberry Pi.
You can vary the number of replicas if you have a different number of nodes. The host and path must reflect your setup if you’ve deviated from what was given above.
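If your node count differs, a short function can assemble the command for you. This is a sketch under the assumptions of this article: brick paths follow the /gluster/bricks/N numbering used above, in the order the hosts are given, and build_create_cmd is a made-up name. It only prints the command so you can review it before running it.

```shell
# Hypothetical helper: print a `gluster volume create` command for the
# given volume name and hosts, one replica per host. Brick N is assumed
# to live at /gluster/bricks/N on the Nth host, as in this article.
build_create_cmd() {
  vol="$1"
  shift
  n=0
  bricks=""
  for host in "$@"; do
    n=$((n + 1))
    bricks="$bricks $host:/gluster/bricks/$n"
  done
  echo "gluster volume create $vol replica $n$bricks force"
}

# Usage: review the printed command, then run it with sudo on one node.
#   build_create_cmd rep_swarm_vol picluster1 picluster2 picluster3
```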
- Start the volume. This is run on one node:
sudo gluster volume start rep_swarm_vol
- Create a mount point for the replicated gluster volume on each node, add it to fstab to make it permanent, and mount the volume. These commands need root, so start a root shell first:
sudo su
mkdir -p /mnt/glusterfs
echo 'localhost:/rep_swarm_vol /mnt/glusterfs glusterfs defaults,_netdev,backupvolfile-server=localhost 0 0' >> /etc/fstab
mount.glusterfs localhost:/rep_swarm_vol /mnt/glusterfs
- Set permissions and exit the root shell. I use the docker group, but root:root will probably work for most uses:
chown -R root:docker /mnt/glusterfs
exit
- Verify all is well. The df -h command should show the backend storage brick as well as the replicated storage volume, like the /dev/sda and localhost:/rep_swarm_vol lines below:
Filesystem Size Used Avail Use% Mounted on
/dev/root 29G 4.5G 24G 17% /
devtmpfs 1.8G 0 1.8G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 17M 1.9G 1% /run
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/mmcblk0p1 253M 54M 199M 22% /boot
/dev/sda 30G 408M 30G 2% /gluster/bricks/1
tmpfs 385M 0 385M 0% /run/user/1000
localhost:/rep_swarm_vol 30G 717M 30G 3% /mnt/glusterfs
You should be able to create new files in the /mnt/glusterfs directory and they’ll show up in the same directory on each node.
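To run the df check on every node without eyeballing the table, a helper along these lines could be used. It is a sketch: mounts_present is a hypothetical name, and it assumes the mount point is the last column of the df output, as in the listing above.

```shell
# Hypothetical helper: read `df` output on stdin and succeed only if every
# mount point given as an argument appears in the last column.
mounts_present() {
  df_out=$(cat)
  for mp in "$@"; do
    printf '%s\n' "$df_out" |
      awk -v m="$mp" '$NF == m { ok = 1 } END { exit !ok }' || {
        echo "missing mount: $mp" >&2
        return 1
      }
  done
}

# Usage on picluster1 (adjust the brick number on the other nodes):
#   df -h | mounts_present /gluster/bricks/1 /mnt/glusterfs && echo "mounts ok"
```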
For example, say you want persistent storage for a MySQL database. In your docker YAML files, you could add a section like this:
- type: bind