Time to Look at System Maintenance and Replacing Equipment
Once a year, it’s important to review the hardware that’s installed and the levels of the software running on it. Staying as current as possible gives you the best security and performance, and it’s critical to be on hardware and software that is still in service. This means paying attention to the server and HMC hardware and infrastructure, OSes, VIO servers, and server and I/O firmware.
Get to Current Hardware
Each server, HMC and other piece of hardware has four dates associated with it: the announcement date, the general availability date, the withdrawal from marketing date and the service discontinued date. As of March 31, 2019, all POWER6 servers will have their service discontinued. This means that you will no longer be able to get service unless you sign an extended service contract, which is typically very expensive.
Additionally, on September 30, 2019 a number of POWER7 servers will also have their service discontinued. These servers include the 8202-E4B and E4C (P720), 8205-E6B and E6C (P740), 8231-E2C (P730), 8233-E8B (P750) and the 9117-MMB and MMC (P770). Many blades have also been removed from service. The full list is in the POWER6 and POWER7 withdrawals announcement listed under References. If you’re using any of these servers, now is a good time to migrate to POWER9 to get to a fully supported system with less expensive support.
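To make the date check concrete, here is a minimal Python sketch that flags inventory entries against the POWER7 models named above. The table holds only the models and the September 30, 2019 date given in this article; the function name, the status strings, and the choice to treat the date itself as already discontinued are assumptions for illustration.

```python
from datetime import date

# Service-discontinued date for the POWER7 models named in the article.
# The full list lives in the IBM withdrawal announcement, not in this table.
POWER7_END = date(2019, 9, 30)
SERVICE_DISCONTINUED = {
    "8202-E4B": POWER7_END, "8202-E4C": POWER7_END,  # P720
    "8205-E6B": POWER7_END, "8205-E6C": POWER7_END,  # P740
    "8231-E2C": POWER7_END,                          # P730
    "8233-E8B": POWER7_END,                          # P750
    "9117-MMB": POWER7_END, "9117-MMC": POWER7_END,  # P770
}

def service_status(model, today):
    """Classify a machine type-model string against the table above."""
    end = SERVICE_DISCONTINUED.get(model.upper())
    if end is None:
        return "not in table"
    # Assumption: service counts as discontinued on the listed date itself.
    if today >= end:
        return "discontinued"
    return f"supported until {end.isoformat()}"
```

A model that returns "not in table" has not been checked at all, so it still needs to be looked up in the withdrawal announcement.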
First Steps
The first step is to take an inventory of the equipment and what’s running on it. A significant amount of the information can be obtained using the HMCScanner if you are using HMCs (hardware management consoles). The HMCScanner also works with IVM, FSM (Flex manager) and SDMC if you’re using one of those. From the output you’ll get a list of all the servers that the HMC can see, including model and serial number, the server firmware levels, the HMC software level and the OS level for the LPARs and VIO servers.
If you don’t have an HMC, you will have to log in and get this information manually. I/O firmware levels are not provided by HMCScanner, so you will have to log in to get those. In AIX (or on a VIO server as root), use the lsmcode -A command to get that information.
Once you have the levels, you can start working on a plan. Start by determining what’s currently out of service or about to be. Next, look at any readmes associated with any required software updates. Depending on how far back the software is, this may require multiple staged updates. It’s highly likely that newer hardware will require updates to the software. It’s also important to cross-check all the levels to ensure they’ll work together and will be supported. This is where FLRT (fix level recommendation tool) or FLRT Lite can be used to figure out what updates need to be applied. FLRT Lite provides links to all the end of support dates for various hardware and software products.
You can then download the operating system updates and look at the readme documents. In the readme you will find either a change history listing all the changes incorporated in these fixes or a link to the firmware history, which lists the changes in every firmware release from your installed level to the one you’re going to. Once you have the list of potential levels, you must determine what combination to install. Typically, most people want to wait two to three months after a level is released before installing it unless it’s mandatory.
A typical order for updates would be the HMC, then the server firmware plus I/O firmware, then the VIO LPARs and finally the operating systems, but the readmes still need to be checked for interdependencies. When bringing in a new server, the HMC would be updated (or a new one brought in), the new server would be connected and its firmware updated, and new VIO servers would be installed. Then the LPARs can be migrated across using LPM (Live Partition Mobility) or by rezoning the storage to the new environment. For hardware, we are concerned about the service discontinued date; for software, it is the eoSPS (end of service pack support) date.
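As a planning aid, the typical order described above can be sketched as a small sort in Python. The component labels and the function name are hypothetical, and placing server firmware ahead of I/O firmware is an assumption, since the text groups the two together. The readmes remain the authority on interdependencies.

```python
# Typical update order from the text; earlier entries are applied first.
# Assumption: server firmware is ranked ahead of I/O firmware.
UPDATE_ORDER = ["hmc", "server_firmware", "io_firmware", "vios", "os"]

def order_updates(tasks):
    """Sort (component, description) pairs into the typical update order."""
    rank = {name: position for position, name in enumerate(UPDATE_ORDER)}
    # sorted() is stable, so tasks for the same component keep their input order.
    return sorted(tasks, key=lambda task: rank[task[0]])
```

For example, a plan entered as OS, HMC, VIO server and server firmware comes back with the HMC first and the OS last.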
New HMC or Upgrade Current HMC
As of October 31, 2018, all HMC software levels prior to v8.8.7.0 are no longer supported (eoSPS). V8.8.7.0 (all service packs) will be eoSPS as of August 31, 2019. This means the recommendation is to update your HMC to 9.1 M910, M911, M920 or M921. These are all supported through March 31, 2020.
In order to upgrade to version 9, your current HMC must be a supported HMC and must be at v8.8.6.0 SP1 or higher. Version 9 isn’t supported on older HMCs and also does not support any POWER6 or older servers. Per the readme the following HMCs are supported:
X86: 7042-CR7, 7042-CR8, 7042-CR9, 7042-OE1 and 7042-OE2
Open Power: 7063-CR1
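The two upgrade prerequisites above (a supported model and a minimum level of v8.8.6.0 SP1) can be expressed as a quick check. This is a rough Python sketch; the function name and the way the service pack is passed as a separate integer are assumptions, and it expects a four-part version string such as 8.8.6.0.

```python
# Models listed in the readme as supported for HMC version 9 (from the text).
SUPPORTED_V9_MODELS = {
    "7042-CR7", "7042-CR8", "7042-CR9", "7042-OE1", "7042-OE2",  # x86
    "7063-CR1",                                                  # POWER HMC
}
# Minimum starting level: v8.8.6.0 SP1, stored as (v, r, m, f, service pack).
MIN_LEVEL = (8, 8, 6, 0, 1)

def can_upgrade_to_v9(model, version, service_pack=0):
    """Return True if the model and current level meet the v9 prerequisites."""
    level = tuple(int(part) for part in version.split(".")) + (service_pack,)
    return model.upper() in SUPPORTED_V9_MODELS and level >= MIN_LEVEL
```

A False result only says one of the two conditions failed; the readme for the target level still has to be read for the remaining upgrade notes.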
If you have an older HMC that has to be replaced, I recommend going to the POWER HMC (7063-CR1) as it has a longer lifecycle than the x86 and virtual HMCs. It’s also much faster than the x86 HMCs. Previously, some of the Linux-only servers could not be attached to an HMC and had to be individually managed. Certain Linux-only servers (8335-GTH and GTX, or 9006-12P and 22P) can now be attached to the POWER HMC for some functions. It’s a subset of the HMC functions, but it provides for HMC-managed firmware updates and serviceability.
Please note that with the new HMC the classic GUI goes away. Additionally, if you have redundant HMCs then they both need to be upgraded. It is also recommended that your VIO servers are at v2.2.6.21 or higher if possible, although the new HMC will work with 2.2.3. The readme has a series of upgrade notes that covers these items plus some additional issues. It’s important to read this document for every level you plan to install.
As of version 8.8.7 the paths change for downloads from IBM. This is because the initial supported release for the POWER HMC was version 8.8.7 so it became important to differentiate between x86 and POWER HMC code.
Servers
Whether the server is new or not, IBM requires that you have a valid hardware maintenance agreement (HWMA) to download and install firmware and a valid software maintenance agreement (SWMA) to download and upgrade software. You can check entitlements on the IBM entitled software site under “my entitled software.” You’ll need the server model and serial number, e.g., 8205-e6d serial 06abcd1. For some products you’ll need all seven characters.
Server firmware levels can be checked using FLRT, or you can go to Fix Central and provide the server model and serial number, and it will list the firmware to be installed. To get the readme, click on the description; that file provides information on the minimum HMC, VIO and operating system levels required for this firmware release. It also provides information on how to install the firmware. The best way to install firmware is through the HMC interface.
VIO Servers and Other LPARs
All levels of the VIO server prior to 2.2.6 will be eoSPS as of 11/30/2019. All levels prior to 2.2.5 went eoSPS on 12/31/2018. It is critical you keep your VIO servers up to date. The highest current level of the VIO server is 2.2.6.32. The latest version that came out in November 2018 is v3.1, which is an AIX 7.2 tl03 based VIO server. If you are installing a new VIO server then I recommend installing v3.1. If you are upgrading a current VIO server then I recommend upgrading to v2.2.6.32 and then planning a v3.1 upgrade after March so that the upgrade process has time to settle in. Ensure you read the readme files as the upgrade process may be a multistep process depending on what level VIO server you are running today.
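The two VIO server cutoffs above lend themselves to a small lookup. The Python sketch below encodes only the dates given in this paragraph; the function name is hypothetical, and a None result means the level is not covered by these two cutoffs, not that it is supported indefinitely.

```python
from datetime import date

# (first level NOT affected, eoSPS date for every level below it), from the text.
VIOS_EOSPS = [
    ((2, 2, 5), date(2018, 12, 31)),
    ((2, 2, 6), date(2019, 11, 30)),
]

def vios_eosps(ioslevel):
    """Return the eoSPS date for an ioslevel string, or None if not listed."""
    level = tuple(int(part) for part in ioslevel.split("."))[:3]
    for threshold, eosps in VIOS_EOSPS:
        if level < threshold:
            return eosps
    return None
```

Fed with the ioslevel output from each VIO server, this gives a quick first pass at which ones need attention soonest.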
For AIX, all levels prior to v7.1 are eoSPS. AIX 7.1 levels prior to tl04 are also eoSPS, although tl04 (all service packs) goes eoSPS 12/4/2019. AIX 7.1 tl05 is supported until 4/30/2022. AIX 7.2 levels prior to tl01 went eoSPS 12/31/2018. Tl01 goes eoSPS 11/30/2019, tl02 is 10/31/2020 and tl03 is 9/30/2021. If you’re upgrading AIX, I highly recommend going to AIX v7.2 tl03 with the latest service pack. This will minimize upgrades moving forward.
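The AIX dates above can be kept in a small table for planning. This Python sketch holds only the levels that have a date in this paragraph, read as month/day/year; the function names are hypothetical. Levels the article describes as already eoSPS, and any level not mentioned here, are treated as unsupported, which is an assumption made for simplicity.

```python
from datetime import date

# eoSPS dates as printed in the article (month/day/year), keyed by (release, tl).
AIX_EOSPS = {
    ("7.1", 4): date(2019, 12, 4),
    ("7.1", 5): date(2022, 4, 30),
    ("7.2", 1): date(2019, 11, 30),
    ("7.2", 2): date(2020, 10, 31),
    ("7.2", 3): date(2021, 9, 30),
}

def aix_eosps(release, tl):
    """Return the eoSPS date for a release and technology level, or None."""
    return AIX_EOSPS.get((release, tl))

def is_supported(release, tl, today):
    """True only if the level is in the table and its eoSPS date has not passed."""
    eosps = aix_eosps(release, tl)
    # Assumption: anything missing from the table is treated as unsupported.
    return eosps is not None and today <= eosps
```

Because unknown levels come back as unsupported, the table needs extending before it is used for levels released after this article.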
What level am I at?
If the HMCScanner cannot identify the OS levels, you’ll need to do it manually. To figure out what levels you should install, determine the currently installed levels by doing the following as root:
oslevel -s
The above will show you the oslevel as something like:
7100-01-04-1216
This is AIX 7.1 tl01 sp4.
For a VIO server, run oslevel -s after entering oem_setup_env, and also run the following as padmin:
ioslevel
2.2.3.52
For a VIO server, oslevel -s will normally show something like 6100-09-05-1524, with ioslevel showing something like 2.2.3.52.
You can check the VIOS to NIM mapping table to find out how these should match.
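If you collect oslevel -s output from many LPARs, splitting the string into its parts makes it easier to compare against support dates. The Python sketch below assumes the four-field layout shown in the examples above; the function name and dictionary keys are hypothetical, and the last field is kept as an opaque build string because the article does not define it.

```python
def parse_oslevel(text):
    """Split an 'oslevel -s' string such as 7100-01-04-1216 into its parts."""
    base, tl, sp, build = text.strip().split("-")
    return {
        "release": f"{base[0]}.{base[1]}",  # 7100 -> 7.1, 6100 -> 6.1
        "tl": int(tl),                      # technology level
        "sp": int(sp),                      # service pack
        "build": build,                     # left uninterpreted
    }
```

For 7100-01-04-1216 this yields release 7.1, tl 1 and sp 4, matching the reading given earlier.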
lsmcode -A
The above will provide you with the firmware level for the server and also for any I/O attached to it, e.g., fibre cards, network cards and disks. These adapters will need updating regularly as IBM brings out mandatory updates from time to time. This applies especially to the 10Gb network and 8/16/32Gb fibre cards. For example, the 5735 dual port 8Gb fibre card has a mandatory update as of March 22, 2018; the level needs to be at 210301. I/O adapter firmware levels can be found at Fix Central. Typically you will need to download these from Fix Central and install them manually.
Before Starting
If you are updating an operating system, always run errpt (AIX) or the equivalent first. It’s important to check error logs prior to starting any update, as there is no point in trying to update a system that has problems. Then take a backup. For AIX or the VIO server this would be a mksysb to an external resource. The VIO server also requires that you do a viosbr backup first so that virtual resource definitions are saved. Then back up the HMC, which involves running a save upgrade data to the hard drive and then backing up the management console data. This second backup can be done to a USB key if you have one or to a remote FTP server. Use of the DVD is no longer supported and the new HMCs do not have DVD drives. Once the backups are complete you are ready to go.
Prior to starting any firmware updates you may want to use LPM to move any affected LPARs to a new server, especially if you’re going to have to power cycle the server for a deferred or disruptive update.
HMC, Server and I/O Firmware
If you have an HMC connected to your server, then firmware is very easy to install. There are three types of installs: concurrent, deferred and disruptive. Concurrent updates can be installed with no downtime. Deferred updates can be installed but will not be activated until the system is powered off and on. Disruptive updates require an immediate power off and on. The readme will tell you what kind of updates these are, and the HMC process will also flag the updates before you click on the final OK. I’ve seen times where it said updates were concurrent but there ended up being deferred updates, so it’s best to plan for a power cycle of the server regardless of what the readme says.
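The three install types and the advice to plan for a power cycle anyway can be captured in a small helper for change planning. This is only a Python sketch; the function name and the cautious flag are hypothetical, with the default following the recommendation above.

```python
# Impact of each firmware install type, as described in the text.
IMPACT = {
    "concurrent": "no downtime",
    "deferred": "installed live, activated at the next power off and on",
    "disruptive": "immediate power off and on",
}

def needs_outage_window(update_types, cautious=True):
    """Decide whether to book a power-cycle window for a set of updates."""
    kinds = {kind.lower() for kind in update_types}
    unknown = kinds - set(IMPACT)
    if unknown:
        raise ValueError(f"unknown update type(s): {sorted(unknown)}")
    # cautious=True plans a power cycle regardless of what the readme says.
    return cautious or bool(kinds & {"deferred", "disruptive"})
```

With cautious left at its default, every firmware update gets a window; setting it to False reserves a window only when the readme lists deferred or disruptive updates.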
When using the HMC to install firmware, you can also choose to have it install any I/O firmware at that time. It will catch some firmware but typically you will have to install any high-performance adapter firmware manually. Without an HMC, you need to download the updates and follow the manual install process which is always disruptive.
Summary
System maintenance has often been a process that gets pushed to the back of the priority list. None of us want to place a service call on a critical system only to find out it is out of support and we have to do an emergency upgrade or pay for support. With the number of server withdrawals and the associated software and HMC changes this is a good time to do an analysis of your current status and start budgeting for and planning those critical upgrades. In this article I have touched on the HMC, servers, VIO servers and client LPARs but don’t forget to include storage, switches and applications like Spectrum Scale or databases in your planning.
The key to a successful update is to have a well-planned update strategy and to be proactive and get ahead of the phasing out of equipment or software. It’s important to go through the readmes and incorporate their notes into the plan and to include steps in the plan for what to do when or if things go wrong.
IBM has provided a number of tools to make maintenance planning far simpler. These tools include HMCScanner, Fix Central, FLRT, FLRT Lite and FLRTVC. I recommend regularly running HMCScanner and then putting those levels into FLRT so you can find out about necessary upgrades before you have an issue. If you are running POWER7 or lower servers or your HMC is running software prior to v9 then it is time to update or replace them. Performing proactive replacements and upgrades can save significant time and money, especially if you’re able to consolidate into fewer servers and cores.
References
POWER6 and POWER7 withdrawals
http://www-01.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ssi/rep_ca/3/897/ENUS917-163/index.html&lang=en&request_locale=en
Fix Central
https://www-945.ibm.com/support/fixcentral/
Entitled Software
https://www.ibm.com/servers/eserver/ess/index.wss
FLRT (fix level recommendation tool)
http://www14.software.ibm.com/webapp/set2/flrt/home
FLRT Lite
http://www14.software.ibm.com/webapp/set2/flrt/liteHome
FLRT VC (FLRT Vulnerability checker)
http://www14.software.ibm.com/webapp/set2/flrt/vc
VIOS to NIM mapping table
http://www14.software.ibm.com/webapp/set2/sas/f/flrt/viostable.html
eoSPS Definition
http://www14.software.ibm.com/webapp/set2/sas/f/flrt/use.html#eosps