HMC and VIO Update Tips
IBM Champion Jaqui Lynch talks about HMC and VIO upgrade lessons she has learned in 2024 so far
2024 has been a busy year so far with many HMC and VIO updates. During that time, I have run into a few issues that I thought I would share so that others don’t have to open cases with IBM for the same things.
Environment
When working on a pair of VIO servers, it is easy to get confused as to which one you are on. To avoid this, I set the following in /etc/environment for vio1 and change it to vio2 when I am on vio2:
EDITOR=vi
PS1="vio1$: "
Then after I type in oem_setup_env, I type:
export PS1="vio1#: "
This makes it easy for me to distinguish which VIO I am working on. Since I copy and paste everything I do into documentation, it also makes it clear where I put the commands.
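If you would rather not hand-edit the name on each server, the same prompt strings can be built from the host name. This is only a sketch: the function name vio_prompt is made up for illustration, and it assumes that the host name returned by uname -n is the name you want to see in the prompt. Note that /etc/environment cannot run commands, so something like this would have to be typed or placed in a profile instead.

```shell
# Hypothetical helper: build a prompt such as "vio1$: " or "vio1#: ".
# $1 = name to show, $2 = "$" for padmin or "#" for root.
vio_prompt() {
    printf '%s%s: ' "$1" "$2"
}

# Example use in a root shell after oem_setup_env
# (assumes uname -n returns vio1 or vio2):
#   export PS1="$(vio_prompt "$(uname -n)" '#')"
```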
Using Snap With a VIO Server
Whenever I run into a problem on a VIO server I automatically take a snap on both VIO servers, and I upload the snaps as soon as I open the case. This saves me time, as IBM will nearly always ask for a snap when you report an issue.
Find details on using Testcase for a VIO here, and details for using snap here.
The first step for me is to clear out any old snaps—this is done as root:
# snap -r
Then as padmin you run the snap command:
$ snap
The snap output will be in /home/padmin and you will need to rename it to the case and VIO name before uploading it.
Typically this is done as follows:
mv /home/padmin/snap.pax.Z /home/padmin/TS<xxxxxxxxx>.VIOS_partition_name.snap.pax.Z
You can then upload it to IBM—I typically download it to my desktop and then use the upload option to upload it directly into the case, but you can also follow this procedure from the IBM Support pages.
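When snaps are taken on both VIO servers it is easy to mistype the new name, so the rename can be wrapped in a small helper. This is a sketch only: snap_target_name is a hypothetical function, and the case number in the example is a placeholder.

```shell
# Hypothetical helper: print the upload name for a snap file.
# $1 = IBM case number (TS followed by the case digits)
# $2 = VIOS partition name
snap_target_name() {
    printf '/home/padmin/%s.%s.snap.pax.Z\n' "$1" "$2"
}

# Example (the case number is a placeholder):
#   mv /home/padmin/snap.pax.Z "$(snap_target_name TS012345678 vio1)"
```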
Upgrades to VIO 4.1.0.10
I have been doing a lot of upgrades lately from VIO servers at 3.1.3.14 and higher. During that time, I have run into a couple of issues that you can avoid.
I always check errpt, and I also make sure the current running configuration has been saved into the profile in case someone made dynamic changes that were never saved. I had one instance where fibre adapters had been added dynamically but were not in the profile, and when I reactivated the VIO we lost them. Luckily, I had an HMCScanner report and had saved the output of “lsmap -all -npiv”, so I was able to resolve it quickly.
Prior to the upgrade I fully document the VIO server, noting how the SEA, etc., are configured and saving the output of at least the following commands:
As padmin:
ioslevel
lsmap -all -npiv
lsmap -all -npiv | grep vfchost
lsmap -all -npiv | grep fcs
lsmap -all -npiv | grep fcs | wc -l
lsmap -all | grep vhost
lsvopt
lspv -size
lspv
lsnports
lsrep
As root:
ifconfig -a
Note down all the settings such as IP address, gateway and subnet mask.
lsdev -C | grep Shared
lsattr -El ent6 | grep ent
lsattr -El ent5 | grep ent
In the commands above, change ent6 to the SEA adapter and ent5 to the aggregate adapter.
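Saving each command's output to its own file makes it easier to compare before and after the upgrade. The following is a minimal sketch, not a finished script: capture is a hypothetical helper, the /home/padmin/preupgrade directory is an assumption, and so is running the padmin commands from root through /usr/ios/cli/ioscli.

```shell
# Hypothetical helper: run one command and save its output to a file.
# $1 = output directory, $2 = label used in the file name, rest = command to run.
capture() {
    outdir=$1; label=$2; shift 2
    mkdir -p "$outdir" || return 1
    # Keep stderr as well, so a failing command is visible in the saved file.
    "$@" > "$outdir/$label.out" 2>&1
}

# Example, run as root after oem_setup_env (directory name and the use of
# /usr/ios/cli/ioscli to call padmin commands are assumptions):
#   capture /home/padmin/preupgrade ioslevel   /usr/ios/cli/ioscli ioslevel
#   capture /home/padmin/preupgrade lsmap_npiv /usr/ios/cli/ioscli lsmap -all -npiv
#   capture /home/padmin/preupgrade ifconfig   ifconfig -a
```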
I have also been updating VIO servers that have not been rebooted for more than two years, which is terrifying. After all the checks above I rewrote the boot image and bootlist, took a clone and a mksysb, then rebooted them before making any changes. That way I knew I was starting with a good VIO server.
The mksysb image for 4.1.0.10 came out in November 2023. I did multiple upgrades with no problem. Then in March I could not log in after the upgrade because the padmin password had expired. It turns out that the mksysb image changes maxage and maxexpired so that the password expires if it is more than three months old. This issue is now described in the IBM Support pages.
If you run into the problem after the upgrade, you can fix it from the HMC using:
viosvrcmd -m ServernameatHMC --id 1 -c "chuser -attr maxage=0 padmin"
viosvrcmd -m ServernameatHMC --id 1 -c "chuser -attr maxexpired=-1 padmin"
Change ServernameatHMC to the actual server name and the ID from 1 to whatever the LPAR ID is for the VIO you are having the issue with. There are now patches out that will avoid this problem and they are specific to the release you are upgrading from. So far, the list I have is:
3.1.3.14 patch
IJ50453m3a.240318.epkg.Z
3.1.3.21 patch
IJ50453m4a.240313.epkg.Z
3.1.4.10 patch
IJ50326m5a.240318.epkg.Z
3.1.4.21 patch
IJ50326m6b.240313.epkg.Z
3.1.4.31 patch
IJ50326s7a.240313.epkg.Z
You may have to request these from IBM. For 3.1.3.14 the patch will show in “emgr -P” as:
ios.cli.rte installp IJ50453m3a
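A quick way to confirm the patch is on before starting the upgrade is to search the emgr -P output for its label. This is a sketch under the assumption that the label is the last column of each line, as in the output above; has_ifix is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given ifix label appears in "emgr -P" output.
# Reads the emgr -P output on stdin; $1 = label to look for.
has_ifix() {
    # Compare as strings against the last column of every line.
    awk -v label="$1" '($NF "") == (label "") { found = 1 }
                       END { exit (found ? 0 : 1) }'
}

# Example, as root:
#   emgr -P | has_ifix IJ50453m3a && echo "padmin password ifix is installed"
```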
After the upgrade you still need to fix maxage and maxexpired, or three months from now you won’t be able to log in again. To do this, run the following as padmin:
chuser -attr maxage=0 padmin
chuser -attr maxexpired=-1 padmin
If you then look at /etc/security/user you will see the following in the padmin stanza:
padmin:
admin = false
default_roles = PAdmin,CacheAdm
core_path = on
core_pathname = /home/ios/logs
maxage = 0
maxexpired = -1
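To check the two values without reading the whole file, a stanza can be parsed with awk. This is a rough sketch: stanza_attr is a hypothetical helper, and it assumes the file is laid out like the stanza above, with a "name:" line followed by "attribute = value" lines.

```shell
# Hypothetical helper: print the value of one attribute from a stanza file.
# $1 = file, $2 = stanza (user) name, $3 = attribute name.
stanza_attr() {
    awk -v user="$2" -v attr="$3" '
        # A single word ending in ":" starts a new stanza.
        NF == 1 && $1 ~ /:$/ { in_stanza = ($1 == (user ":")); next }
        # Inside the wanted stanza, print the value of the wanted attribute.
        in_stanza && $1 == attr && $2 == "=" { print $3; exit }
    ' "$1"
}

# Example, as root:
#   stanza_attr /etc/security/user padmin maxage
#   stanza_attr /etc/security/user padmin maxexpired
```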
Before upgrading there are a couple more things you should do to avoid additional problems. The first is to make sure there are no NFS filesystems that are mounted. If there are, unmount them before starting.
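One way to spot NFS mounts is to filter the output of mount. The sketch below is an assumption about the output layout rather than a tested procedure: it expects the vfs column to read nfs, nfs3 or nfs4, with the mount point in the column just before it. nfs_mount_points is a hypothetical helper name.

```shell
# Hypothetical helper: list NFS mount points found in "mount" output on stdin.
# Assumes the vfs column is nfs, nfs3 or nfs4 and that the column
# immediately before it is the mount point.
nfs_mount_points() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^nfs[0-9]*$/) { print $(i - 1); break }
    }'
}

# Example, as root:
#   mount | nfs_mount_points
```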
There is also an issue with autoviosbr backups causing your upgrade to fail with the following error:
Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE /usr/ios/cli/ioscli/ at /usr/ios/sbin/viosupg.pl line 10846.
To avoid this run “viosbr -nobackup” as padmin before the upgrade.
The last issue I ran into with the viosupgrade seems to be specific to upgrading from 3.1.3.14. I got the following error on the viosupgrade:
Unrecognized escape \T passed through at /usr/ios/sbin/viosupg.pl line 969.
Welcome to viosupgrade tool.
.......
Migration upgrade request initiated.
lsfs: 0506-915 No record matching /home/padmin/N/A was found in /etc/filesystems.
Migration upgrade failed.
It turns out that the problem was caused by the extra system dump LVs I create because the default lg_dump is too small. I did not have this issue on any of my upgrades from later versions of the VIO. To get 3.1.3.14 to upgrade I had to remove those dump LVs from rootvg. I did this as follows:
sysdumpdev -P -p /dev/sysdumpnull
sysdumpdev -P -s /dev/sysdumpnull
rmlv lv_dumplv1
rmlv lv_dumplv2
After that the upgrade went through smoothly although I still saw the following message:
Unrecognized escape \T passed through at /usr/ios/sbin/viosupg.pl line 969.
At all other levels I did not have this problem.
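Before removing anything, it can help to list which LVs in rootvg are dump devices. The following is a small sketch that assumes the second column of lsvg -l rootvg is the LV type and that dump LVs show a type of sysdump; list_dump_lvs is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of sysdump-type LVs.
# Reads "lsvg -l rootvg" output on stdin.
list_dump_lvs() {
    awk '$2 == "sysdump" { print $1 }'
}

# Example, as root:
#   lsvg -l rootvg | list_dump_lvs
```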
emgr_check_ifixes
This is a new command in AIX 7.3, and it lets you check for fixes at IBM and even download them.
Check for fixes:
emgr_check_ifixes
Check for fixes and download them to /tmp/ifixes:
emgr_check_ifixes -D -P /tmp/ifixes
This command failed on every AIX 7.3 system I ran it on with:
ERROR: HTTP connection failed. Data cannot be extracted
ERROR: failed to download CRL, http status log in /tmp/ifix/crl.der
This is a known problem described on this IBM Support page.
There is now a combo fix for it called IJ49378m1d.240206.epkg.Z. You may have to request it from IBM. It is applied with emgr and no reboot is required.
To amuse myself I ran both flrtvc and emgr_check_ifixes to compare the results.
emgr_check_ifixes found the following on my patches system:
openssh_fix15    This won’t go on as fix16 is on
openssh_fix16    This was already on
openssl_fix40    This was already on
kernel_fix7      New
flrtvc found the following:
openssh_fix15    This won’t go on as fix16 is on
curl_fix3        This won’t go on as fix4 is on
kernel_fix7      New
It was strange that emgr_check_ifixes flagged patches that were already on and that it had identified as being on. The latest flrtvc is 0.8.8.
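If you save the fix names from each tool into a file, one name per line, the difference between the two lists can be produced mechanically. This is a sketch only: only_in_first is a hypothetical helper and the file names in the example are placeholders.

```shell
# Hypothetical helper: print names that are in the first file but not the second.
# Each file holds one fix name per line; only the first field is used.
# $1 = first file, $2 = second file.
only_in_first() {
    awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }   # load the second file
         !($1 in seen)       { print $1 }             # report the rest
    ' "$2" "$1"
}

# Example (file names are placeholders):
#   only_in_first emgr_check.txt flrtvc.txt    # flagged only by emgr_check_ifixes
#   only_in_first flrtvc.txt emgr_check.txt    # flagged only by flrtvc
```

With the two lists above, the first call would report openssh_fix16 and openssl_fix40, and the second would report curl_fix3.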
Other Patches and Updates
After the upgrade Java, SSH and SSL are all backlevel. They install at:
Java8 8.0.0.800
SSL 3.0.10.1001
SSH 8.1.112.2000
The latest patched levels are:
Java8 8.0.0.821
SSL 3.0.10.1002
SSH 9.2.112.2000
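Comparing levels like these by eye is error-prone, so a field-by-field numeric comparison can help. This is a minimal sketch, assuming levels are plain dotted numbers; level_lt is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if level $1 is lower than level $2.
# Levels are dotted numbers such as 8.0.0.800; each field is compared numerically.
level_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "[.]"); nb = split(b, y, "[.]")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0   # first differing field is lower
            if (x[i] + 0 > y[i] + 0) exit 1   # first differing field is higher
        }
        exit 1                                # equal levels are not lower
    }'
}

# Example, using the Java8 levels listed above:
#   level_lt 8.0.0.800 8.0.0.821 && echo "Java8 is backlevel"
```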
You can download the Java patches from Fix Central. You can also download SSH and SSL from the web download site.
You should also get the updated rpm, perl and python from the web download site.
rpm.rte 4.18.1.2003
After updating rpm, you should run updtvpkg.
perl.rte 5.34.1.6
python3.9.base 3.9.18.2
Finally, there are security patches you need to download from the security site and apply. As of today, these are:
curl_fix4.tar
openssh_fix16.tar
openssl_fix40.tar
sendmail_fix4.tar
kernel_fix7.tar
I have not yet put on kernel_fix7 so my emgr shows the following patches prior to that:
# emgr -P
PACKAGE                                                  INSTALLER   LABEL
======================================================== =========== ==========
oss.lib.libcurl                                          installp    46218ma
openssh.base.client                                      installp    92112ma
openssh.base.server                                      installp    92112ma
openssl.base                                             installp    301002sa
bos.net.tcp.sendmail                                     installp    IJ50428s1a
# emgr -l
ID  STATE LABEL      INSTALL TIME      UPDATED BY ABSTRACT
=== ===== ========== ================= ========== ======================================
1    S    46218ma    04/20/24 19:22:40            ifix for libcurl CVE
2    S    92112ma    04/20/24 19:23:04            ifix for openssh Jan CVEs
3    S    301002sa   04/20/24 19:23:47            ifix for openssl CVEs
4    S    IJ50428s1a 04/20/24 19:24:11            IJ50428 for AIX 7.3 TL2 SP1
Logging
By default, there is no real logging on the VIO servers, so I add logging. I create a filesystem called /usr/local/logs and create four files in it (mailog, syslog, infolog, messages). Then I add the following to /etc/syslog.conf and stop and restart syslogd.
#
mail.debug /usr/local/logs/mailog rotate size 2m files 10 compress
*.emerg /usr/local/logs/syslog rotate size 2m files 10 compress
*.alert /usr/local/logs/syslog rotate size 2m files 10 compress
*.crit /usr/local/logs/syslog rotate size 2m files 10 compress
*.err /usr/local/logs/syslog rotate size 2m files 10 compress
auth.notice /usr/local/logs/infolog rotate size 2m files 10 compress
*.info /usr/local/logs/messages rotate size 2m files 10 compress
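The log files need to exist before syslogd is restarted. Creating them can be sketched as below, assuming the /usr/local/logs filesystem is already created and mounted; make_log_files is a hypothetical helper name.

```shell
# Hypothetical helper: create the log directory (if missing) and the four log files.
# $1 = log directory; /usr/local/logs in the configuration above.
make_log_files() {
    mkdir -p "$1" || return 1
    for f in mailog syslog infolog messages; do
        touch "$1/$f" || return 1
    done
}

# Example, as root, followed by the stop and restart of syslogd:
#   make_log_files /usr/local/logs
#   stopsrc -s syslogd
#   startsrc -s syslogd
```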
Fibre Adapters
Some of the servers I work on have fibre ports that are not connected, or they have the 4-port 1Gb/10Gb network card where the 10Gb ports show as both ent and fcs ports. I had unconfigured them ages ago, but after the upgrade they came back as configured and I was getting link errors. So I unconfigured them again as follows:
As padmin:
rmdev -dev fcs4 -recursive -ucfg
rmdev -dev fcs5 -recursive -ucfg
chdev -dev fscsi4 -attr autoconfig=defined
chdev -dev fscsi5 -attr autoconfig=defined
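With more than a couple of unused ports, it may be easier to generate the commands and review them before pasting them in as padmin. The sketch below only prints the commands and runs nothing; print_unconfigure_cmds is a hypothetical helper, and it assumes fcsN and fscsiN share the same number, as they do in the commands above.

```shell
# Hypothetical helper: print (not run) the padmin commands for each adapter number.
# Arguments = adapter numbers, for example 4 5 for fcs4 and fcs5.
print_unconfigure_cmds() {
    for n in "$@"; do
        printf 'rmdev -dev fcs%s -recursive -ucfg\n' "$n"
    done
    for n in "$@"; do
        printf 'chdev -dev fscsi%s -attr autoconfig=defined\n' "$n"
    done
}

# Example:
#   print_unconfigure_cmds 4 5
```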
I also ran into an issue where certain fibre adapters in IBM Power Systems are incorrectly registered as targets instead of initiators in the SAN fabric. This is explained here.
It applies to the following adapters:
FC EN0Y & EN12; CCIN EN0Y
FC EN0F & EN0G; CCIN 578D
FC 5708 & 5270; CCIN 2B3B
FC EN1E & EN1F; CCIN 579A
FC EN1J & EN1K; CCIN 579C
FC EN1G & EN1H; CCIN 579B
And it shows in errpt as:
DC73C03A 1020150323 T S fscsi7 SOFTWARE PROGRAM ERROR
DC73C03A 1020150323 T S fscsi3 SOFTWARE PROGRAM ERROR
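To see which adapters are affected and how often, the errpt summary can be counted per resource. This is a sketch assuming the summary layout shown above, where the first column is the error identifier and the fifth is the resource name; count_errors_by_resource is a hypothetical helper name.

```shell
# Hypothetical helper: count errpt summary entries for one error ID, per resource.
# Reads "errpt" output on stdin; $1 = error identifier.
count_errors_by_resource() {
    # Compare identifiers as strings, then tally by the resource column.
    awk -v id="$1" '($1 "") == (id "") { n[$5]++ }
                    END { for (r in n) print r, n[r] }' | sort
}

# Example, as root:
#   errpt | count_errors_by_resource DC73C03A
```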
IBM had me do the following, but you should check with them first if you see this issue:
chdev -l fcs# -a sw_prli_rjt=yes -P
bosboot -ad /dev/ipldevice
reboot
Remirror
When you upgrade to VIO 4.1.0.10 you must have a spare disk that is cleared for it to do the upgrade on. Usually that means you break your VIO mirror of rootvg to do the upgrade. I normally wait a week and then remirror so that my rootvg is protected.
Summary
As you can see, I have had an interesting time with all these upgrades. I hope the tips and notes above help you to avoid some of the things I have run into. Also don’t forget to run an HMCScanner report before and after doing any updates. The latest version is 0.11.54 as of March 21, 2024.