Upgrading Your Storage Scale Cluster, Part 1
In this two-part series, IBM Champion Jaqui Lynch shares the lessons she learned when upgrading from Spectrum Scale 5.1.7.1 to Storage Scale 5.2.1.1.
I recently decided it was time to upgrade my Spectrum Scale cluster from 5.1.7.1 to the latest level. After doing some research and planning, I thought it might be helpful to share some of the things I discovered. In this article I will discuss things you should be aware of prior to the upgrade. In part 2 I will cover the actual upgrade process.
Getting the Code
The first step was figuring out what the latest level was and then getting the filesets for the upgrade. This turned into a learning experience in how to find those filesets. I went to Fix Central and searched on Spectrum Scale; the highest level it showed was 5.1.9, but the only one for AIX was 5.1.7.1, which is the level I was already running. I finally figured out that to find the latest level you have to search on either GPFS or Storage Scale. The GPFS search returns a list of products to choose from: selecting GPFS gives you 3.2.1 to 3.5.0, and GPFS Advanced provides 4.1.0. If you scroll down you will find Spectrum Scale as an option, which provides 4.1.1 to 5.1.9, and finally, if you scroll to Storage Scale, you will find 5.1.8 to 5.2.1.
Storage Scale FAQ
Make sure to read the Storage Scale FAQ before trying any new commands. I had to open a case with IBM because I wanted to test the mmpstat command and could not find it. IBM told me to use mmcallhome to provide the information they wanted, but I could not find that command either (same issue as mmpstat). It turns out that code is not available on AIX or Windows. In the FAQ, look at Table 36, which lists the feature exceptions for AIX and Windows. I initially read this as a list of features that are only available on AIX and Windows; it is actually the opposite: the table lists the features that are not available on AIX and Windows, including the mmcallhome and mmpstat commands, which I had really wanted to use.
Handling Problems With Spectrum Scale
Prior to doing any major cluster changes or upgrading, I always take two snaps—a regular AIX snap and a Scale snap (gpfs.snap). I take one of each before and one after so that I can provide them to IBM if I have problems.
To run gpfs.snap, do the following (you can change /tmp to wherever you want to save the snap). On the primary node:
mkdir /tmp/snap_outdir
The above directory cannot be within a GPFS filesystem.
To collect gpfs.snap on all nodes with the default data, issue the following command:
gpfs.snap -a -d /tmp/snap_outdir
The -a flag directs gpfs.snap to collect data from all nodes in the cluster. This is the default, so you don’t have to specify it, but I usually do in case the default changes in the future. You can also list specific nodes if you don’t want the snap to include them all.
The snap should create a file something like:
/tmp/snap_outdir/all.???????.tar
I then rename the file so that it is clustername.pre.all.??????.tar. When I take the gpfs.snap after the upgrade, I change pre to post, and if I have to upload to IBM I add the case number to the front of the name. I recommend running a snap prior to the upgrade and another after, just in case.
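Putting those steps together, here is a minimal sketch of the pre-upgrade collection. The cluster name mycluster and the node names are just placeholders, and the -N flag, which lets you name specific nodes instead of using -a, is described in the gpfs.snap command reference listed below:
mkdir /tmp/snap_outdir
gpfs.snap -a -d /tmp/snap_outdir
# or, to limit collection to specific nodes: gpfs.snap -N node1,node2 -d /tmp/snap_outdir
for f in /tmp/snap_outdir/all.*.tar
do
  # prefix the cluster name and pre tag, keeping the original timestamped name
  mv "$f" /tmp/snap_outdir/mycluster.pre.$(basename "$f")
done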
As I mentioned, I also run an AIX snap before and after the upgrade. First, I remove previous snaps:
snap -r
Then I take a snap using:
snap -ac
The above creates the snap in /tmp/ibmsupt and includes system dumps. If you specify -Z it will suppress dumps. Always include dumps unless IBM says not to.
I create a directory to hold the snaps:
mkdir /software/snaphold
Then I find the snap I just created:
ls -l /tmp/ibmsupt/*snap.pax*
The file will end in either .Z or .gz depending on the OS version, so:
mv /tmp/ibmsupt/snap.pax.Z /software/snaphold/nodename.pregpfs521.snap.pax.Z
or
mv /tmp/ibmsupt/snap.pax.gz /software/snaphold/nodename.pregpfs521.snap.pax.gz
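Since the extension varies, a small loop can handle the rename for whichever file the snap produced; this is only a sketch, with the node name taken from hostname and the pregpfs521 tag matching the naming above:
for f in /tmp/ibmsupt/snap.pax.Z /tmp/ibmsupt/snap.pax.gz
do
  # move whichever file exists into the holding directory, keeping its extension
  [ -f "$f" ] && mv "$f" /software/snaphold/$(hostname).pregpfs521.$(basename "$f")
done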
For a VIO server, you log in as padmin and run snap with no parameters. Check the name of the snap in /tmp/ibmsupt and rename as needed before uploading. Most likely:
mv /tmp/ibmsupt/snap.pax.gz /software/snaphold/nodename.pregpfs521.snap.pax.gz
Then take the same snap after you upgrade Storage Scale, but instead of pre use post in the name.
If you have a case open, then upload the snaps into the case after adding the case number to the front of the name.
Planning Your Upgrades
My cluster is fairly simple: there are four AIX nodes. One is what I call the primary, two are filesystem managers and the fourth runs some specific application workloads. There also used to be multiple Linux nodes in the cluster, but they were removed as we no longer use them.
When planning for the upgrade, check the supported upgrade paths to confirm that upgrading directly to the new level is supported so that you can avoid interim upgrades. In my case I was able to go from 5.1.7.1 to 5.2.1.1 with no issues.
Then you need to decide whether you will be doing an online or offline upgrade. An online upgrade lets you upgrade one node at a time. During that time the filesystems are available to the other nodes. However, you must upgrade all the nodes as quickly as possible because some features in the newer version become available on each node as soon as the node is upgraded, while other features are not available until you upgrade all participating nodes.
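As a rough sketch, and assuming a node named nodeA (the actual fileset installation is covered in part 2), the per-node flow for an online upgrade looks something like this:
mmgetstate -a            # confirm the rest of the cluster is active
mmumount all -N nodeA    # unmount the GPFS filesystems on the node being upgraded
mmshutdown -N nodeA      # stop Storage Scale on that node only
# ... install the new Storage Scale filesets on nodeA ...
mmstartup -N nodeA       # bring the node back into the cluster
mmmount all -N nodeA     # remount the filesystems on that node
mmgetstate -a            # verify all nodes are active before moving to the next node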
There are some limitations on when you can do online upgrades. For protocol nodes it depends on what services are enabled and in use. If you have multiple protocol nodes with different protocols, then the safest way to do this may be an offline upgrade. For NFS protocol nodes you can perform an online upgrade one node at a time if there are different NFS versions on the protocol nodes. For SMB protocol nodes all nodes running the SMB service must have the same version installed at any time. There are other limitations depending on the version of Storage Scale you are upgrading from and to, and also on which protocols you are using and, of course, the base operating system versions.
In my case, all my nodes were AIX 7.3.2.2 running Scale 5.1.7.1. The two application nodes run Samba to share out files while the two filesystem manager nodes take care of the I/O. I decided to perform an offline upgrade, which involves shutting down the entire cluster. The upgrade process is very similar to the online procedure with all nodes being done at the same time while the cluster is shut down.
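The offline flow, again only as a sketch with the fileset installation left to part 2, is essentially the same sequence run cluster-wide:
mmumount all -a    # unmount all GPFS filesystems on every node
mmshutdown -a      # stop Storage Scale across the whole cluster
# ... install the new Storage Scale filesets on every node ...
mmstartup -a       # restart Storage Scale cluster-wide
mmmount all -a     # remount the filesystems
mmgetstate -a      # all nodes should report active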
Prior to upgrading make sure you are familiar with the following commands:
mmlslicense
mmlscluster
mmlsconfig
mmlsmount
mmlsmgr
mmgetstate
mmstartup
mmshutdown
mmlsfs
mmdf
mmmount
mmumount
mmdiag
mmhealth
mmlsnsd
mmchconfig release=LATEST
mmchfs Device -V full (or mmchfs Device -V compat)
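The last two commands in that list commit the cluster and the filesystems to the new level. As the reverting links in the references show, the procedure for going back differs depending on whether mmchconfig release=LATEST has been issued, so run these only after every node is upgraded and verified. Here is a sketch of the post-upgrade checks and finalization; gpfs0 is just a placeholder filesystem name:
mmdiag --version              # confirm the daemon level running on each node
mmgetstate -a                 # all nodes should be active
mmhealth cluster show         # check overall cluster health
# only once everything is stable on the new level:
mmchconfig release=LATEST     # commit the cluster configuration to the new level
mmchfs gpfs0 -V full          # enable new filesystem features (repeat for each filesystem)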
Up Next
In part 2 we will discuss performing the actual upgrade of Storage Scale. Prior to the upgrade I pre-document all the steps and then update that document as I go, saving any outputs. I also make sure I have the cluster well documented and take multiple backups. Finally, I use commands like mmhealth, lspath and errpt to check that the cluster I am upgrading is healthy and has no outstanding issues.
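For that pre-upgrade health check, these are the sorts of invocations I mean (samples only):
mmhealth cluster show    # overall component health for the cluster
mmhealth node show       # health of the local node
mmgetstate -a            # every node should be active
lspath                   # all MPIO paths should show Enabled
errpt | more             # review recent entries in the AIX error log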
References
Storage Scale FAQ
https://www.ibm.com/docs/en/STXKQY/pdf/gpfsclustersfaq.pdf?cp
Storage Scale Snap
https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=details-using-gpfssnap-command
https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=reference-gpfssnap-command
AIX Snap
https://www.ibm.com/support/pages/working-ibm-aix-support-collecting-snap-data
Storage Scale 5.2.1.1 Readme
https://www.ibm.com/support/pages/node/7170420
Storage Scale Upgrading
https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=upgrading
https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=upgrading-completing-upgrade-new-level-storage-scale#mignew
Supported Upgrade Paths
https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=upgrading-storage-scale-supported-upgrade-paths
mmmigratefs command
https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=reference-mmmigratefs-command
Reverting to Previous Storage Scale Levels
https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=rplss-reverting-previous-level-gpfs-when-you-have-not-issued-mmchconfig-releaselatest
https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=rplss-reverting-previous-level-gpfs-when-you-have-issued-mmchconfig-releaselatest
Upgrading non-protocol Nodes
https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=upgrading-storage-scale-non-protocol-linux-nodes
What’s New in Storage Scale 5.2.0?
https://www.spectrumscaleug.org/wp-content/uploads/2024/07/SSUG24ISC-Whats-new-in-Storage-Scale-5.2.0.pdf