These are my notes on installing and configuring GFS from source RPMs. I haven’t worked with GFS in over a year, so I don’t know if this information is still accurate, but I’m posting it anyway in the hope that someone out there will find it useful. When following these instructions, your best bet is to run each command on each node before moving on to the next step (unless otherwise specified, or unless you know what you’re doing).
1.) Get the GFS and perl-Net-Telnet SRPMs from Red Hat.
ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/
ftp://ftp.redhat.com/pub/redhat/linux/updates/enterprise/3ES/en/RHGFS/SRPMS/
2.) Install the perl-Digest-HMAC and perl-Digest-SHA1 RPMs.
3.) Build and install the perl-Net-Telnet SRPM.
rpmbuild --rebuild perl-Net-Telnet-3.03-2.src.rpm
rpm -Uvh /usr/src/redhat/RPMS/noarch/perl-Net-Telnet-3.03-2.noarch.rpm
4.) Each GFS node needs to be running clock synchronization software to prevent unnecessary inode timestamp updates (which, according to the manual, will impact performance severely), so you need to download and install the NTP RPM.
rpm -Uvh ntp-4.1.2-4.EL3.1.i386.rpm
5.) Sync up your clock for the first time:
ntpdate 10.25.1.36
6.) Add the following lines to /etc/ntp.conf.
restrict pool.ntp.org mask 255.255.255.255 nomodify notrap noquery
server pool.ntp.org
7.) Start ntpd.
/etc/init.d/ntpd start
8.) Verify that ntpd is syncing with your NTP server(s). When you do this, make sure that your jitter values are in the low single digits. They should definitely not be 4000 (which means that NTP is not working at all).
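If you would rather not eyeball the jitter column, a small filter can flag the bad peers for you. This is only a rough sketch: the function name high_jitter_peers and the 10 ms default limit are made up for illustration, and it assumes jitter is the last column of the peer listing. It reads the output of the ntpq command for this step on stdin.

```shell
# Print "peer jitter" for every peer whose jitter exceeds a limit.
# Reads `ntpq -p` output on stdin and skips its two header lines.
# Assumption: jitter is the last column; 10 ms is an arbitrary default.
high_jitter_peers() {
    awk -v max="${1:-10}" 'NR > 2 && ($NF + 0) > max { print $1, $NF }'
}
```

Usage would be something like `ntpq -p | high_jitter_peers 10`. No output means every peer is under the limit.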
ntpq -p
9.) Make sure you have the kernel, kernel-smp, and kernel-source RPMs installed.
rpm -q kernel kernel-smp kernel-source
10.) Install the GFS SRPM.
rpm -Uvh GFS-6.0.2-25.src.rpm
11.) Check your current kernel version.
uname -a
12.) Open /usr/src/redhat/SPECS/gfs-build.spec and look for a line that starts with %define KERNEL_EXTRAVERSION. You may need to change this to match the “extraversion” of your kernel. You should also look for the line that says %define buildhugemem 1 and set it to 0 (unless you have a machine with more than 16 GB of memory and the hugemem kernel installed).
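To work out the extraversion without guessing, something like the sketch below may help. It takes a kernel release string (as printed by uname -r) and strips the base version and any smp or hugemem suffix, since the kernel-source package carries no such suffix. The function name kernel_extraversion is made up, and the exact form the spec file expects (for example, with or without a leading dash) is an assumption, so compare the result with the value already in the spec before changing anything.

```shell
# Derive the "extraversion" from a kernel release string.
# Example input: 2.4.21-40.ELsmp (the release part of `uname -r`).
# Assumption: the spec wants the part after the first dash, minus the
# smp/hugemem flavor suffix. Check against the existing %define line.
kernel_extraversion() (
    release=$1
    extra=${release#*-}      # drop the base version up to the first dash
    extra=${extra%smp}       # drop the SMP flavor suffix, if present
    extra=${extra%hugemem}   # drop the hugemem flavor suffix, if present
    printf '%s\n' "$extra"
)
```

You would call it as `kernel_extraversion "$(uname -r)"`.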
13.) Create a new SRPM with all the changes you made.
rpmbuild -bs /usr/src/redhat/SPECS/gfs-build.spec
14.) Build the GFS RPMs. Don’t forget to use the --target i686 option, or the SMP modules will not be installed.
rpmbuild --rebuild --target i686 /usr/src/redhat/SRPMS/GFS-6.0.2-25.src.rpm
15.) Install the GFS RPMs.
rpm -Uvh /usr/src/redhat/RPMS/i686/*6.0.2-25.i686.rpm
16.) Try manually loading the GFS modules into the kernel. If the modules are loaded successfully, you should see them (along with all the other loaded kernel modules) in the output of lsmod.
depmod -a
modprobe pool
modprobe lock_gulm
modprobe gfs
lsmod
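Rather than scanning the lsmod listing by hand, the check can be scripted. The following is a minimal sketch (the function name missing_gfs_modules is hypothetical) that reads lsmod output on stdin and prints whichever of the three modules loaded above are absent.

```shell
# Print each of pool, lock_gulm and gfs that is NOT in the lsmod listing.
# Reads `lsmod` output on stdin; the first line is the column header.
missing_gfs_modules() (
    loaded=$(awk 'NR > 1 { print $1 }')
    for mod in pool lock_gulm gfs; do
        # -x forces a whole-line match so "gfs" does not match "gfs_foo"
        printf '%s\n' "$loaded" | grep -qx "$mod" || printf '%s\n' "$mod"
    done
)
```

Run it as `lsmod | missing_gfs_modules`. Empty output means all three modules are loaded.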
17.) At this point, the clustering software is installed and simply needs to be configured. Now you need to create three config files (cluster.ccs, fence.ccs, and nodes.ccs) for the cluster configuration system (CCS). These files should be placed in a temporary directory by themselves on one node (I used /root/cluster). This is a fairly straightforward process, so I won’t repeat what chapter 6 of the Red Hat GFS Administrator’s Guide already covers in detail.
18.) Once the CCS files have been created on one of the nodes, you should probably run a syntax check on them.
ccs_tool test /root/cluster
19.) Next, you need to create a “cluster configuration archive” (CCA) from the CCS files, and write it to a “cluster configuration archive device” (which is just a fancy name for a partition that all nodes have access to). A pool volume can be used for this, but I had the luxury of a 2.5TB iSCSI storage array, so I just created a 2MB partition on that. Use the ccs_tool command to create the archive on the storage device of your choice. Note that ccs_tool writes these files in its own raw format, so there’s no need to format the partition. Also note that I was unable to create new CCS archives without having valid DNS records for all nodes.
ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1
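Since archive creation depends on every node name resolving, it may be worth checking name resolution up front. Here is a rough sketch: unresolved_nodes and the RESOLVE_CMD override are names invented for this example, and the default lookup uses getent hosts, which consults /etc/hosts as well as DNS, so a pass here is a hint rather than proof that DNS itself is correct.

```shell
# Print every node name that fails to resolve.
# RESOLVE_CMD is a hypothetical hook for swapping in another lookup
# command; by default `getent hosts NAME` is used.
unresolved_nodes() (
    for node in "$@"; do
        ${RESOLVE_CMD:-getent hosts} "$node" >/dev/null 2>&1 \
            || printf '%s\n' "$node"
    done
)
```

For example, `unresolved_nodes node1 node2 node3` (with your own node names) prints nothing when every name resolves.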
20.) Tell Red Hat’s init scripts where to find the CCS archive.
echo "CCS_ARCHIVE=\"/dev/iscsi/bus0/target0/lun0/part1\"" >/etc/sysconfig/gfs
21.) Now start the ccs daemons.
service ccsd start
22.) Start the lock_gulm server daemons.
service lock_gulmd start
23.) Create the GFS filesystems from one node.
gfs_mkfs -p lock_gulm -t Cluster1:gfs1 -j 8 /dev/iscsi/bus0/target0/lun0/part2
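For reference, -p selects the locking protocol, -t takes the lock table name in ClusterName:FSName form (the cluster name has to match the one in cluster.ccs), and -j sets the number of journals, which needs to be at least one per node that will mount the filesystem. The sketch below only assembles and prints the command so you can look it over before running it. The helper name gfs_mkfs_command is made up for illustration.

```shell
# Print (do not run) a gfs_mkfs command line for the lock_gulm protocol.
# Arguments: cluster name, filesystem name, journal count, block device.
gfs_mkfs_command() (
    cluster=$1 fsname=$2 journals=$3 device=$4
    case $journals in
        ''|*[!0-9]*) echo "journal count must be a number" >&2; exit 1 ;;
    esac
    printf 'gfs_mkfs -p lock_gulm -t %s:%s -j %s %s\n' \
        "$cluster" "$fsname" "$journals" "$device"
)
```

Calling `gfs_mkfs_command Cluster1 gfs1 8 /dev/iscsi/bus0/target0/lun0/part2` prints the same command used in this step.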
24.) Add your GFS filesystems to /etc/fstab with a fstype of gfs.
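As an illustration of what such an entry could look like, the sketch below prints an fstab line for a given device and mount point. The mount options ("defaults") and the two trailing zeros (no dump, no boot-time fsck) are assumptions, as is the helper name gfs_fstab_line, so adjust them to your setup.

```shell
# Print an /etc/fstab entry for a GFS filesystem.
# Arguments: block device, mount point.
# Assumption: "defaults 0 0" is a reasonable starting point.
gfs_fstab_line() {
    printf '%s %s gfs defaults 0 0\n' "$1" "$2"
}
```

For instance, `gfs_fstab_line /dev/iscsi/bus0/target0/lun0/part2 /mnt/volume >> /etc/fstab` would append an entry for the filesystem created in the previous step.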
25.) Mount your GFS filesystems. Note that there is a known bug in some versions of GFS (related to mounting shared volumes) where node hostnames must be unique in the first 8 characters.
service gfs start
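If you want to know in advance whether the hostname bug could bite you, you can check your node names for clashes in their first 8 characters. A minimal sketch follows (colliding_prefixes is a hypothetical name):

```shell
# Print each 8-character prefix that is shared by two or more hostnames.
colliding_prefixes() (
    for name in "$@"; do
        printf '%s\n' "$name" | cut -c1-8
    done | sort | uniq -d
)
```

For example, `colliding_prefixes webnode01a webnode01b db1` prints webnode0, because the first two names are identical in their first 8 characters.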
26.) Congratulations, you have a cluster! At this point, you should test moving files from individual nodes to the shared volume. All other nodes in the cluster should immediately be able to see these files. You might also try comparing the md5 sum of the file before it was moved to the md5 sum of the file after it was moved, just to make sure nothing weird is going on.
From node 1:
md5sum file.tgz
cp file.tgz /mnt/volume
From node 2:
md5sum /mnt/volume/file.tgz
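When both copies are visible from the same node, the comparison can be done in one go. The helper below is just a sketch (same_md5 is a made-up name). It succeeds only if both files are readable and their MD5 sums match.

```shell
# Return 0 if both files exist and have the same MD5 checksum.
same_md5() (
    a=$(md5sum < "$1" | awk '{ print $1 }')
    b=$(md5sum < "$2" | awk '{ print $1 }')
    # An empty checksum means the file could not be read
    [ -n "$a" ] && [ "$a" = "$b" ]
)
```

For example: `same_md5 file.tgz /mnt/volume/file.tgz && echo "checksums match"`.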
That’s it! For more info, RTFM! ;-)
Caveats
1.) I encountered a problem due to network latency in which iSCSI sessions were not consistently being established before the GFS scripts tried to access the volumes. The ultimate solution was to first disable the init scripts…
chkconfig --level 0123456 iscsi off
chkconfig --level 0123456 ccsd off
chkconfig --level 0123456 lock_gulmd off
chkconfig --level 0123456 gfs off
…then add the following to /etc/rc.local.
sleep 45
service iscsi start
service ccsd start
service lock_gulmd start
service gfs start
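A fixed sleep works, but a possible refinement is to poll until the iSCSI device node actually shows up before starting the cluster daemons. This is an untested sketch, not what I ran: wait_for_path is a hypothetical helper, and it assumes the device node only appears once the iSCSI session has been established.

```shell
# Wait until a path exists, checking once per second.
# Arguments: path, timeout in seconds (default 45).
# Returns 0 as soon as the path exists, 1 if the timeout expires first.
wait_for_path() (
    path=$1 timeout=${2:-45} waited=0
    while [ ! -e "$path" ]; do
        [ "$waited" -ge "$timeout" ] && exit 1
        sleep 1
        waited=$((waited + 1))
    done
    exit 0
)
```

In /etc/rc.local this could go right after `service iscsi start`, for example `wait_for_path /dev/iscsi/bus0/target0/lun0/part1 45 || echo "iSCSI device not found" >&2`, before ccsd, lock_gulmd and gfs are started.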
2.) The fence_apc fencing method does not officially support the APC switch I was using, but I came up with a workaround (which can be found on the bug report I submitted to Red Hat). This workaround was successful with the fence_apc script from version 6.0.0-1.2, but not with the one from 6.0.2-25. When upgrading, I needed to copy over the old fence_apc script (on the master lock server only):
cp /usr/src/redhat/SOURCES/gfs-build/bedrock/fence/agents/apc/fence_apc.pl /sbin/fence_apc
I just want to say these notes helped me to get things going nicely, Thanks a lot!
Question, why is there no fenced? I have been trying to figure out why manual fencing is not working properly and that seems to be part of the problem. As far as I can tell fenced is supposed to create a /tmp/fence_manual.fifo. Without fenced there is no fifo and no way to manually fence. Any thoughts? Oh, and yes, I already tried to talk my boss into buying an APC, might as well try to teach pigs to sing.
Hmm, I don’t remember anything about ‘fenced.’ I could be wrong, but I think lock_gulmd does the fencing. It just launches a Perl script for your particular switch type. If you don’t have an APC, I’m pretty sure there’s a way to write a script for whatever switch you have (assuming it’s manageable via telnet/ssh/whatever).
Hey, I’m trying to do this now, and this is what I get:
[jason@tf1 SRPMS]$ uname -a
Linux tf1.localdomain 2.4.21-40.ELsmp #1 SMP Thu Feb 2 22:22:39 EST 2006 i686 i686 i386 GNU/Linux
[jason@tf1 SRPMS]$ rpmbuild -bs /usr/src/redhat/SPECS/gfs-build.spec
error: Failed build dependencies:
kernel-source = 2.4.21-40.ELsmp is needed by GFS-6.0.2.30-0
[jason@tf1 SRPMS]$
How do I get around this, given that I’m running an SMP kernel?
[jason@tf1 SRPMS]$ rpm -qa | grep -i kernel
kernel-2.4.21-4.EL
kernel-source-2.4.21-40.EL
kernel-pcmcia-cs-3.1.31-19
kernel-2.4.21-40.EL
kernel-smp-2.4.21-4.EL
kernel-utils-2.4-8.37.14
kernel-smp-2.4.21-40.EL
[jason@tf1 SRPMS]$